

JavaRanch » Java Forums » Java » Java in General

Help required to read binary file

Nina Milo
Greenhorn

Joined: Jul 29, 2008
Posts: 19
Hi,
I am trying to read a binary file in Java, which I get from a vendor who uses C++ to generate these files. So these files are of course in little-endian format.
I have been successful in reading some parts of this file, but one part of the file is 32 bytes of char data. I am not sure how to read all 32 bytes of char. I know DataInputStream provides the method char readChar(), which reads 2 bytes of data. So, do I need to loop, reading 2 bytes 16 times and adding up all those bytes? Also, do I need to reorder the bytes if using readChar()?
I don't know if it's a good approach or not. Please provide any suggestions. Thanks for your time.





Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    

Nina Milo wrote:
... which I get from a vendor who uses C++ to generate these files. So these files are of course in little-endian format.


Whether the format is little or big endian is more related to the processor that the program has been compiled for (or compile flags and libraries). You can't make the conclusion that it is little endian because your vendor used C++. You need to confirm the format with your vendor.

Nina Milo wrote:
I have been successful in reading some parts of this file, but one part of the file is 32 bytes of char data. I am not sure how to read all 32 bytes of char.


The format of char should also be confirmed. In most cases, 32 bytes of char means 32 characters, as each char is 8-bit ASCII, but regardless, confirm it.

Also, you may want to look into the java.nio.ByteBuffer class, as this class has methods to change the byte order (endianness), if needed.
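
As a minimal sketch of that suggestion: ByteBuffer.order(ByteOrder) selects the byte order used by getInt(), getShort() and the other multi-byte getters. The class name EndianDemo and the sample bytes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    /** Decodes the first four bytes of raw as an int in the given byte order. */
    static int readInt(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00, 0x00, 0x00};
        System.out.println(readInt(raw, ByteOrder.LITTLE_ENDIAN)); // 1
        System.out.println(readInt(raw, ByteOrder.BIG_ENDIAN));    // 16777216
    }
}
```

A freshly created ByteBuffer is always big-endian, so the order has to be set explicitly when the data is little-endian.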

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Nina Milo
Greenhorn

Joined: Jul 29, 2008
Posts: 19
Thanks Henry.
It's confirmed that the file is in little-endian format. As you suggested, I checked the 32-byte char field; its description in the file spec is: ASCII character string representation of a
NULL-terminated string of 32 bytes. This field must consist of 31 numeric characters followed by a NULL char (0x00).

The file format:
#pragma pack(1)
struct DATA
{
    unsigned long  CODE;        // 32 bits
    unsigned short STACKER;
    unsigned long  COUNT;
    unsigned short MONTH;
    unsigned short DAY;
    unsigned short YEAR;
    unsigned short HOUR;
    unsigned short MINUTE;
    unsigned short SECOND;
    char           RUN_CODE[32];
};
#pragma pack

I have completed reading all of the data except the char field using DataInputStream.
For example, this is how I read the first field (32 bits long):
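
A sketch of what such a read can look like (illustrative, not the code originally posted): each byte is read with readUnsignedByte() and shifted into place, least significant byte first. The class and method names are made up, and the sample bytes stand in for the vendor file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LittleEndianRead {
    /** Reads an unsigned 32-bit little-endian value; a long holds the full unsigned range. */
    static long readUInt32LE(DataInputStream in) throws IOException {
        long b0 = in.readUnsignedByte(); // least significant byte comes first
        long b1 = in.readUnsignedByte();
        long b2 = in.readUnsignedByte();
        long b3 = in.readUnsignedByte(); // most significant byte comes last
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    /** Reads an unsigned 16-bit little-endian value. */
    static int readUInt16LE(DataInputStream in) throws IOException {
        int lo = in.readUnsignedByte();
        int hi = in.readUnsignedByte();
        return lo | (hi << 8);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x34, 0x12};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample));
        System.out.println(readUInt32LE(in)); // 4294967295
        System.out.println(readUInt16LE(in)); // 4660 (0x1234)
    }
}
```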


So, my question is how do I read the 32-byte char data? Is it OK if I use the above approach, calling readUnsignedByte() 8 times and then shifting all the bytes? Please correct me if I am wrong.
Ireneusz Kordal
Ranch Hand

Joined: Jun 21, 2008
Posts: 423
Nina Milo wrote:
So, my question is how do I read the 32-byte char data? Is it OK if I use the above approach, calling readUnsignedByte() 8 times and then shifting all the bytes? Please correct me if I am wrong.

I guess that this is probably a string (chars) encoded in UTF-32LE format (LE means 'little endian').
If this is the case, read the whole data into a byte buffer (array of bytes), then create a ByteArrayInputStream from this buffer,
and read the data from the ByteArrayInputStream using an InputStreamReader, passing 'UTF-32LE' to its constructor.
The conversion from UTF-32LE to Java's Unicode strings will be automatic.
Here is a small example:
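
A minimal sketch of this approach (illustrative, not the code originally posted). It assumes the field really is UTF-32LE and that the JVM provides the "UTF-32LE" charset, which common JDKs do although the Java specification does not require it. The class and method names are made up, and the sketch only handles characters that fit in a single Java char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class Utf32Demo {
    /** Decodes UTF-32LE bytes into a Java String via an InputStreamReader. */
    static String decodeUtf32LE(byte[] data) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), "UTF-32LE")) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c); // read() returns one UTF-16 code unit at a time
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two characters, four bytes each, least significant byte first.
        byte[] data = {0x34, 0, 0, 0, 0x32, 0, 0, 0};
        System.out.println(decodeUtf32LE(data)); // 42
    }
}
```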

You can use UTF-32BE for big endian.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

Nina Milo wrote:
#pragma pack(1)
struct DATA
{
    unsigned long  CODE;        // 32 bits
    unsigned short STACKER;
    unsigned long  COUNT;
    unsigned short MONTH;
    unsigned short DAY;
    unsigned short YEAR;
    unsigned short HOUR;
    unsigned short MINUTE;
    unsigned short SECOND;
    char           RUN_CODE[32];
};
#pragma pack

A few notes:
- unsigned long has no matching type in Java. If you know the value will not be larger than Long.MAX_VALUE you can still use long, otherwise BigInteger is a better choice.
- a C char is one Java byte, so to read 32 C chars you simply need to read 32 Java bytes and store these. The index of the first NULL character ('\0' or simply 0) can then be used to construct a String:
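
A minimal sketch of the second note, assuming the field holds single-byte ASCII text as the file description states. The names CStringField and readCString are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class CStringField {
    /** Reads a fixed-length C char field and returns the text before the first NUL byte. */
    static String readCString(DataInputStream in, int fieldLength) throws IOException {
        byte[] raw = new byte[fieldLength];
        in.readFully(raw); // blocks until all bytes are read, or throws EOFException
        int end = 0;
        while (end < raw.length && raw[end] != 0) {
            end++; // stop at the first NUL, or at the end of the field
        }
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] field = new byte[32]; // zero-filled, so the text is NUL-terminated
        byte[] text = "20090101".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, field, 0, text.length);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(field));
        System.out.println(readCString(in, 32)); // 20090101
    }
}
```

Single bytes have no byte order, so this field needs no endian handling.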


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Nina Milo
Greenhorn

Joined: Jul 29, 2008
Posts: 19
Thanks for all your responses. Rob, I considered your reply and was able to get the 32 bytes of char data.
The code snippet I used:
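
A sketch of how the whole record could be decoded along those lines (illustrative, not the code originally posted). It assumes the pack(1) layout from the struct above, with 32-bit unsigned long and 16-bit unsigned short fields, all little-endian, for 54 bytes per record. The class DataRecord and its field names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class DataRecord {
    static final int SIZE = 4 + 2 + 4 + 6 * 2 + 32; // 54 bytes, no padding under pack(1)

    long code, count;   // unsigned 32-bit values widened to long
    int stacker, month, day, year, hour, minute, second; // unsigned 16-bit values widened to int
    String runCode;

    /** Parses one record from the first SIZE bytes of raw. */
    static DataRecord parse(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, SIZE).order(ByteOrder.LITTLE_ENDIAN);
        DataRecord r = new DataRecord();
        r.code = buf.getInt() & 0xFFFFFFFFL;   // mask to undo sign extension
        r.stacker = buf.getShort() & 0xFFFF;
        r.count = buf.getInt() & 0xFFFFFFFFL;
        r.month = buf.getShort() & 0xFFFF;
        r.day = buf.getShort() & 0xFFFF;
        r.year = buf.getShort() & 0xFFFF;
        r.hour = buf.getShort() & 0xFFFF;
        r.minute = buf.getShort() & 0xFFFF;
        r.second = buf.getShort() & 0xFFFF;
        byte[] text = new byte[32];
        buf.get(text);
        int end = 0;
        while (end < text.length && text[end] != 0) {
            end++; // RUN_CODE ends at the first NUL byte
        }
        r.runCode = new String(text, 0, end, StandardCharsets.US_ASCII);
        return r;
    }

    public static void main(String[] args) {
        ByteBuffer sample = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        sample.putInt(-1).putShort((short) 7).putInt(1000);
        sample.putShort((short) 3).putShort((short) 15).putShort((short) 2009);
        sample.putShort((short) 10).putShort((short) 30).putShort((short) 59);
        sample.put("123".getBytes(StandardCharsets.US_ASCII));
        DataRecord r = parse(sample.array());
        System.out.println(r.code + " " + r.year + " " + r.runCode); // 4294967295 2009 123
    }
}
```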


Now I have one more question, regarding the performance of read/write operations on a file. In practice, how does reading and writing ASCII files compare with binary files?
I felt that handling binary files is a little slower, because reading and reversing the bytes (endian format) will take more time compared to an ASCII file. Could you please clarify this?

Thanks.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    

Nina Milo wrote:
I felt that handling binary files is a little slower, because reading and reversing the bytes (endian format) will take more time compared to an ASCII file. Could you please clarify this?


Does it really matter? If the file is in the wrong format, and needs extra processing, it's not like you have the option to not do the processing?

BTW, looking at your code that reads in the characters and loads them into a string, I don't see any endian processing. It looks like the endianness of the file and of the machine (JVM) are the same.

Henry
Nina Milo
Greenhorn

Joined: Jul 29, 2008
Posts: 19
Thanks Henry. I appreciate your answer. The code snippet I have written has nothing to do with endian processing. I am working on something else which does require conversion from big endian to little endian and vice versa.
I wanted to know about I/O performance in general. Which is faster to read and write: ASCII or binary files?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    

Nina Milo wrote:
Thanks Henry. I appreciate your answer. The code snippet I have written has nothing to do with endian processing. I am working on something else which does require conversion from big endian to little endian and vice versa.
I wanted to know about I/O performance in general. Which is faster to read and write: ASCII or binary files?


In general, I/O is slower than the CPU. This is a weird thing to say because these are two different things, and arguably, not even comparable. What is meant by this is that when I/O is being done, the CPU is waiting for the data, and the waiting is orders of magnitude slower. For example, in the time spent waiting for the bytes, the CPU could have spent hundreds, perhaps thousands, of cycles on each byte (if it didn't have to wait).

So... yes, swapping the byte order of each multi-byte value does take more time (a cycle or two per value). However, you probably won't even notice it once you take into account the time to fetch the bytes from the file.
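
To give a sense of how small that extra work is, here is a minimal sketch using the JDK's Integer.reverseBytes and Short.reverseBytes methods. The class and helper names are made up for illustration.

```java
class ByteSwapDemo {
    /** Converts a 32-bit value between big-endian and little-endian byte order. */
    static int swap32(int value) {
        return Integer.reverseBytes(value);
    }

    /** Converts a 16-bit value between big-endian and little-endian byte order. */
    static short swap16(short value) {
        return Short.reverseBytes(value);
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", swap32(0x11223344));     // 44332211
        System.out.printf("%04X%n", swap16((short) 0x1234)); // 3412
    }
}
```

Each swap is plain in-memory arithmetic with no I/O involved, which is why its cost is hard to notice next to the file reads.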

Henry
 