
Help required to read binary file

 
Nina Milo
Greenhorn
Posts: 19
Hi,
I am trying to read a binary file in Java, which I get from a vendor who uses C++ to generate these files. So these files are of course in little endian format.
I have been successful in reading some parts of this file, but one part of the file has 32 bytes of char data. I am not sure how to read all 32 bytes of char. I know DataInputStream provides the method char readChar(), which reads 2 bytes of data. So, do I need to loop, reading 2 bytes 16 times and combining all those bytes? Also, do I need to re-order the bytes if I use readChar()?
I don't know if it's a good approach or not. Please provide any suggestions. Thanks for your time.





 
Henry Wong
author
Marshal
Posts: 21021
Nina Milo wrote:
... which I get from a vendor who uses C++ to generate these files. So these files are of course in little endian format.


Whether the format is little or big endian depends on the processor the program was compiled for (or on compile flags and libraries), not on the language. You can't conclude that it is little endian just because your vendor used C++. You need to confirm the format with your vendor.

Nina Milo wrote:
I have been successful in reading some parts of this file, but one part of the file has 32 bytes of char data. I am not sure how to read all 32 bytes of char.


The format of char should also be confirmed. In most cases, 32 bytes of char means 32 characters, as each char is 8-bit ASCII -- but regardless, confirm it.

You may also want to look into the java.nio.ByteBuffer class, as it has methods to set the byte order (endianness), if needed.

Henry
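
To illustrate the ByteBuffer suggestion above, here is a minimal sketch of switching the byte order before reading a value. The class and method names (ByteOrderSketch, readIntLittleEndian, readIntBigEndian) are made up for this example; ByteBuffer.wrap, order and getInt are the standard java.nio calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only.
class ByteOrderSketch {
    // Interprets the first four bytes of 'raw' as a little-endian 32-bit integer.
    static int readIntLittleEndian(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Reads the same bytes with ByteBuffer's default order, which is big-endian.
    static int readIntBigEndian(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00, 0x00, 0x00};
        System.out.println(readIntLittleEndian(raw)); // prints 1
        System.out.println(readIntBigEndian(raw));    // prints 16777216
    }
}
```

A ByteBuffer is always big-endian when it is created, regardless of the machine, so the order has to be set explicitly for little-endian data.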
 
Nina Milo
Greenhorn
Posts: 19
Thanks Henry.
It's confirmed that the file is in little endian format. As you said, the description of the 32-byte char field in the file is: ASCII character string representation of a
NULL-terminated string of 32 bytes. This field must consist of 31 numeric characters followed by a NULL char (0x00).

The file format:
#pragma pack(1)
struct DATA
{
    unsigned long  CODE;          // 32 bits
    unsigned short STACKER;
    unsigned long  COUNT;
    unsigned short MONTH;
    unsigned short DAY;
    unsigned short YEAR;
    unsigned short HOUR;
    unsigned short MINUTE;
    unsigned short SECOND;
    char           RUN_CODE[32];
};
#pragma pack

I have finished reading all of the data except the char field using DataInputStream.
For example, this is how I read the first field (32 bits long):


So, my question is: how do I read the 32-byte char data? Is it OK if I use the above approach, calling readUnsignedByte() 8 times and then shifting all the bytes? Please correct me if I am wrong.
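
The snippet from this post is not available. As a rough sketch of the byte-by-byte approach it describes, and not the poster's actual code, a little-endian unsigned field could be assembled from readUnsignedByte() calls as below. Note that a 32-bit field is four bytes, so it takes four reads, not eight. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
class UnsignedReadSketch {
    // Reads four bytes, least significant first, into a long.
    // A long is used because Java's int is signed.
    static long readUnsignedIntLE(DataInputStream in) throws IOException {
        long b0 = in.readUnsignedByte();
        long b1 = in.readUnsignedByte();
        long b2 = in.readUnsignedByte();
        long b3 = in.readUnsignedByte();
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    // Reads two bytes, least significant first, into an int (0..65535).
    static int readUnsignedShortLE(DataInputStream in) throws IOException {
        int b0 = in.readUnsignedByte();
        int b1 = in.readUnsignedByte();
        return b0 | (b1 << 8);
    }

    // Convenience wrapper so the sketch can be tried on an in-memory array.
    static long readUnsignedIntLE(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return readUnsignedIntLE(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

DataInputStream's own readInt() and readUnsignedShort() always assume big-endian input, which is why the bytes are combined by hand here.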
 
Ireneusz Kordal
Ranch Hand
Posts: 423
Nina Milo wrote:
So, my question is: how do I read the 32-byte char data? Is it OK if I use the above approach, calling readUnsignedByte() 8 times and then shifting all the bytes? Please correct me if I am wrong.

I guess that this is probably a string (chars) encoded in UTF-32LE format (LE means 'little endian').
If this is the case, read the whole data into a byte buffer (an array of bytes), then create a ByteArrayInputStream from this buffer,
and read the data from the ByteArrayInputStream using an InputStreamReader, passing 'UTF-32LE' to its constructor.
The conversion from UTF-32LE to Java's Unicode strings will be automatic.
Here is a small example:

You can use UTF-32BE for big endian.
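
The example from this post is not available. A minimal sketch of the idea might look like the following; it is not the poster's code, and the class and method names are hypothetical. It only applies if the field really is UTF-32, whereas the format description quoted earlier in the thread says the field is ASCII, one byte per character. It also assumes the JVM provides the UTF-32LE charset, which is not among the charsets every Java platform is required to support.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class Utf32Sketch {
    // Decodes little-endian UTF-32 bytes (four bytes per character) into a String.
    static String decodeUtf32LE(byte[] raw) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName("UTF-32LE"))) {
            int ch;
            while ((ch = reader.read()) != -1) {
                text.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```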
 
Rob Spoor
Sheriff
Posts: 20514
Nina Milo wrote:
#pragma pack(1)
struct DATA
{
    unsigned long  CODE;          // 32 bits
    unsigned short STACKER;
    unsigned long  COUNT;
    unsigned short MONTH;
    unsigned short DAY;
    unsigned short YEAR;
    unsigned short HOUR;
    unsigned short MINUTE;
    unsigned short SECOND;
    char           RUN_CODE[32];
};
#pragma pack

A few notes:
- unsigned long has no matching type in Java. A 32-bit unsigned value, as CODE is described above, always fits in a Java long. For a 64-bit unsigned long you can still use long if you know the value will not be larger than Long.MAX_VALUE; otherwise BigInteger is a better choice.
- A C char is one Java byte, so to read 32 C chars you simply need to read 32 Java bytes and store them. The index of the first NUL character ('\0', or simply 0) can then be used to construct a String:
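
The code from this post is not available. The sketch below shows one way to apply both notes to the whole record; it is an illustration rather than the poster's code. It assumes that unsigned long is 32 bits and unsigned short is 16 bits on the vendor's platform, that the struct is packed with no padding (54 bytes in total), and that the data is little endian. The names DataRecordSketch, parse and cString are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for one packed DATA record, for illustration only.
class DataRecordSketch {
    static final int RECORD_SIZE = 54; // 4 + 2 + 4 + 6 * 2 + 32 bytes, no padding

    long code;      // unsigned 32-bit, widened to long
    int stacker;    // unsigned 16-bit, widened to int
    long count;
    int month, day, year, hour, minute, second;
    String runCode;

    static DataRecordSketch parse(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, RECORD_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        DataRecordSketch r = new DataRecordSketch();
        // Masking turns the signed value ByteBuffer returns into the unsigned one.
        r.code = buf.getInt() & 0xFFFFFFFFL;
        r.stacker = buf.getShort() & 0xFFFF;
        r.count = buf.getInt() & 0xFFFFFFFFL;
        r.month = buf.getShort() & 0xFFFF;
        r.day = buf.getShort() & 0xFFFF;
        r.year = buf.getShort() & 0xFFFF;
        r.hour = buf.getShort() & 0xFFFF;
        r.minute = buf.getShort() & 0xFFFF;
        r.second = buf.getShort() & 0xFFFF;
        byte[] chars = new byte[32];
        buf.get(chars); // single bytes have no byte order to worry about
        r.runCode = cString(chars);
        return r;
    }

    // Builds a String from the bytes before the first NUL (or all of them if none).
    static String cString(byte[] chars) {
        int length = 0;
        while (length < chars.length && chars[length] != 0) {
            length++;
        }
        return new String(chars, 0, length, StandardCharsets.US_ASCII);
    }
}
```

With a DataInputStream, the same 32 bytes could be fetched with readFully(chars) before calling cString.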
 
Nina Milo
Greenhorn
Posts: 19
Thanks for all your responses. Rob, I followed your reply and was able to get the 32 bytes of char data.
The code snippet I used:


Now I have one more question, regarding the performance of read/write functions for a file. In practice, how does the performance of reading and writing ASCII files compare with binary files?
I felt that handling binary files is a little slower, because reading and reversing the bytes (endian format) will take more time than reading an ASCII file. Please clarify this.

Thanks.
 
Henry Wong
author
Marshal
Posts: 21021
Nina Milo wrote:
I felt that handling binary files is a little slower, because reading and reversing the bytes (endian format) will take more time than reading an ASCII file. Please clarify this.


Does it really matter? If the file is in the wrong format and needs extra processing, it's not like you have the option of skipping the processing.

BTW, looking at your code that reads in the characters and loads them into a string, I don't see any endian processing. It looks like the endianness of the file and of the machine (JVM) are the same.

Henry
 
Nina Milo
Greenhorn
Posts: 19
Thanks Henry, I appreciate your answer. The code snippet I have written has nothing to do with endian processing. I am working on something else which does require conversion from big endian to little endian and vice versa.
I wanted to know about I/O performance in general. Which is faster to read and write: ASCII or binary files?
 
Henry Wong
author
Marshal
Posts: 21021
Nina Milo wrote:
Thanks Henry, I appreciate your answer. The code snippet I have written has nothing to do with endian processing. I am working on something else which does require conversion from big endian to little endian and vice versa.
I wanted to know about I/O performance in general. Which is faster to read and write: ASCII or binary files?


In general, I/O is slower than the CPU. This is a weird thing to say because these are two different things, and arguably not even comparable. What is meant is that while I/O is being done, the CPU is waiting for the data, and the waiting is orders of magnitude slower. For example, in the time it waits for the bytes, the CPU could have spent hundreds, perhaps thousands, of cycles on each byte (if it didn't have to wait).

So... yes, reversing the byte order of each value does take more time (a few cycles per value). However, you'll probably not even notice it once you take into account the time needed to fetch the bytes from the file.

Henry
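
To give a sense of how little work the endian conversion involves, here is a small sketch of swapping the byte order of a 32-bit value. Integer.reverseBytes is the standard library method; swapManually is a hypothetical hand-written equivalent, shown only to make the individual shifts visible.

```java
// Hypothetical helper class, for illustration only.
class ByteSwapSketch {
    // Moves each of the four bytes to the mirrored position.
    static int swapManually(int v) {
        return (v >>> 24)
                | ((v >>> 8) & 0xFF00)
                | ((v << 8) & 0xFF0000)
                | (v << 24);
    }

    public static void main(String[] args) {
        int bigEndian = 0x12345678;
        System.out.println(Integer.toHexString(Integer.reverseBytes(bigEndian))); // prints 78563412
        System.out.println(Integer.toHexString(swapManually(bigEndian)));         // prints 78563412
    }
}
```

Short.reverseBytes and Long.reverseBytes do the same for 16-bit and 64-bit values.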
 