Hello, I am confused by the way I have to read the data file. Here is the text from my instructions:
All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
Right now I can read everything this way, but I don't use any encoding. For the header I read it this way:
Then for the records I use this:
But nowhere did I use this 8 bit US-ASCII, and I don't know which character encoding it is. All I found is:
US-ASCII: Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1: ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8: Eight-bit UCS Transformation Format
UTF-16BE: Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE: Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
So here are my questions:
1) Which one is it? Is it US-ASCII (7 bits), ISO-8859-1 or UTF-8?
2) When I read the schema, which is a mix of int and char for the field name and field length: if I use a character encoding, should I also read the integers as chars and then convert them?
3) Then I cannot use BufferedReader and I have to read chars one by one?
4) They also say DataInputStream for the header, and in the DataInputStream Java definition they say:
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.
which means that it is machine dependent... I am all confused, because the header, the schema and the records are all written in the same file with the same encoding, I suppose.
Can you help me understand this point? Thank you - Lydie
Bonsoir Lydie, Welcome to JavaRanch and this forum!
1) Which one is it? is it US-ASCII ( 7 bits) or ISO-8859-1 or UTF-8???
There is no real difference between US-ASCII (7 bits) and the "8 bit US-ASCII" stated in the instructions. A byte (the smallest "normally" manageable memory unit) has 8 bits, so a 7-bit character occupies a full byte anyway, with the top bit left at zero. In English that extra bit is useless (and unused), just because English writers don't know the joy of using the weird accented letters (é, è, à, ç, ...) and other funny characters we use in French and other European languages, which ISO-8859-1 encodes with the 8th bit set. So, at the binary level, US-ASCII and ISO-8859-1 are compatible: every US-ASCII byte stands for the same character in ISO-8859-1. The same holds for UTF-8 as long as the text is pure ASCII.
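To make the compatibility concrete, here is a minimal sketch (the class and method names are made up for illustration) that encodes the same text with both charset names and compares the resulting bytes. It only uses String.getBytes(String charsetName), so no NIO is involved.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class AsciiVsLatin1 {
    // Encodes text with the named charset. The checked exception is wrapped
    // because US-ASCII and ISO-8859-1 must exist on every Java platform.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when both charsets produce exactly the same bytes for the text.
    static boolean sameBytes(String text) {
        return Arrays.equals(encode(text, "US-ASCII"), encode(text, "ISO-8859-1"));
    }

    public static void main(String[] args) {
        // Pure ASCII text: identical bytes under both charsets.
        System.out.println(sameBytes("Hotel Palace"));                 // prints true
        // An accented letter needs the 8th bit in ISO-8859-1.
        System.out.println(encode("\u00e9", "ISO-8859-1")[0] & 0xFF);  // prints 233
    }
}
```

As long as the data file only contains plain ASCII text, it should not matter which of the two names you pass.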
2) When I read the schema, which is a mix of int and char for the field name and field length: if I use a character encoding, should I also read the integers as chars and then convert them?
The provided file is a binary one. So you should read the expected primitives as such (readShort(), readInt(), ...), and text data as bytes (byte[]). Without using NIO (and its Charset class), as NIO is now forbidden in the latest versions of the instructions, you can convert a byte[] to a String using the String constructor that accepts a charset name as its second parameter. And to convert a String to a byte[], String.getBytes(String charsetName) looks perfect as well.
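As a rough sketch of that conversion (the class name, the method names and the 10-byte width are assumptions for illustration, not something from the instructions), a fixed-width text field could be decoded and encoded like this. It assumes short values are null terminated as the instructions say; if your file pads with spaces instead, trim the result rather than stopping at the first null.

```java
import java.io.UnsupportedEncodingException;

class FieldText {
    static final String CHARSET = "US-ASCII";

    // Decodes a fixed-width field, stopping at the first null byte if there is one.
    static String decode(byte[] field) {
        int len = 0;
        while (len < field.length && field[len] != 0) {
            len++;
        }
        try {
            return new String(field, 0, len, CHARSET);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // US-ASCII is always available
        }
    }

    // Encodes a value into a field of exactly 'width' bytes, padded with nulls.
    static byte[] encode(String value, int width) {
        byte[] raw;
        try {
            raw = value.getBytes(CHARSET);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        if (raw.length > width) {
            throw new IllegalArgumentException("value longer than field width");
        }
        byte[] field = new byte[width]; // new arrays are zero-filled
        System.arraycopy(raw, 0, field, 0, raw.length);
        return field;
    }

    public static void main(String[] args) {
        byte[] field = encode("Palace", 10);
        System.out.println(field.length);   // prints 10
        System.out.println(decode(field));  // prints Palace
    }
}
```

In real code the byte[] would come from something like readFully(byte[]) on the stream or file, with the width taken from the schema.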
3) Then I cannot use BufferedReader and I have to read char one by one
I'd avoid using any Reader (meant for reading *text*) with a binary file, and anyway you don't need one (see 2)).
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. (...) which means that it is machine dependent...
What do you mean exactly? What I can tell you for sure (it's a question often asked about that part of the instructions) is that DataInputStream and DataOutputStream are format-compatible with RandomAccessFile, which you'll probably prefer to both of them (you'll have to read from the file but also write to it, so RAF looks handier). Regards, Phil. [ April 30, 2004: Message edited by: Philippe Maquet ]
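If you want to convince yourself of that compatibility, a small experiment along these lines may help. It is only a sketch: the class name, the temporary file and the two values are invented for the demonstration and have nothing to do with the real data file layout.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RafCompat {
    // Writes an int and a short with DataOutputStream, then reads them back
    // with RandomAccessFile from the same temporary file.
    static int[] roundTrip(int magic, int fieldLength) {
        try {
            File file = File.createTempFile("raf-demo", ".db");
            file.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeInt(magic);
                out.writeShort(fieldLength);
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                return new int[] { raf.readInt(), raf.readShort() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = roundTrip(257, 64);
        System.out.println(values[0]);  // prints 257
        System.out.println(values[1]);  // prints 64
    }
}
```

RandomAccessFile implements both DataInput and DataOutput, which is why the same readInt()/readShort() calls are available on it.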
ReBonjour Philippe! What you are telling me is that I can use "raf" even if the instructions say that the format used is DataInputStream (character encoding 8 bit US ASCII). Is that right? Sorry, I have a hard time with IO...
What you are telling me is that I can use "raf" even if the instructions say that the format used is DataInputStream
That's right. I suspect that this is just Sun's way of telling you that you do not have to worry about whether the datafile was created on a little endian machine while you are working on a big endian machine or vice versa. You know that the data file can be read / written with the standard Java classes. But you are free to use any other Java class that suits your needs. Regards, Andrew
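To illustrate the byte-order point, here is a small sketch (the class name is made up) showing that DataOutputStream always writes the high byte of an int first, whatever machine the code runs on:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ByteOrderDemo {
    // Returns the four bytes DataOutputStream produces for an int.
    static byte[] intBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // High byte first (big-endian), independent of the CPU's native order.
        System.out.println(Arrays.toString(intBytes(1)));  // prints [0, 0, 0, 1]
    }
}
```

DataInputStream and RandomAccessFile read ints the same way, so the file looks identical from any platform.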