I/O Misunderstanding (beta data file)

Ronnie Phelps
Ranch Hand

Joined: Mar 12, 2001
Posts: 329
I'm not ashamed to admit it, but low-level file I/O has always been my biggest Java phobia. I'm having a problem creating and reading my data file.
I use a DataOutputStream to write to the data file:

For the 4-byte numeric values I use:
writeInt(numericValue)
For the 2-byte numeric value for the number of fields in each record:
writeShort(numericValue)
For the 2-byte numeric length, in bytes, of the field name I use:
writeByte(fieldLength)
For the field name value I use:
writeUTF(fieldName)
And for the field length I use:
writeByte(fieldLength)
For the data section, I repeat using writeUTF(fieldName) for each field. Where the text doesn't fill the entire length of the field, I append null characters before writing, so that the text fills out the length of the field.
This seems to be okay, but if you think it's not, please feel free to comment. My header size comes out to be 70 bytes.

Now here is where my confusion begins. I use a RandomAccessFile to read the previously written data.

I position the file pointer at offset 70, which should be the offset of record zero.
Next I try to read only the first field of record zero, which should be 32 bytes long, using ras.readUTF(), and I get unexpected results.
Because the file pointer was at 70 and I read 32 bytes, I assume that it should now be at 102 (70 + 32), but instead it's at 121.
And when I try reading the first two fields, which are 32 and 64 bytes in length, the file pointer is at 232!

And when I print the results of readUTF I get exactly what I'm expecting.
What am I doing wrong? Am I using the wrong I/O classes?
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Hi Ronnie,
Have you looked into using scatters and gathers from NIO? They make this sort of thing very easy. If you like (or if others would), I can go over it here. However, I don't want to do it if it's not the direction you're interested in.
All best,
M, author
The Sun Certified Java Developer Exam with J2SE 1.4


Java Regular Expressions
Ronnie Phelps
Ranch Hand

Joined: Mar 12, 2001
Posts: 329
Max, I don't know what scatters and gathers are. But with only days left to finish the assignment, I'm kind of desperate now and I'm willing to try anything.
Ronnie Phelps
Ranch Hand

Joined: Mar 12, 2001
Posts: 329
I took a quick look at the API for GatheringByteChannel and ScatteringByteChannel but they both appear to only write and read bytes and I need to write my data section using "8-bit US ASCII" encoding. Can I still use these?
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Yes, absolutely.
M, author
The Sun Certified Java Developer Exam with J2SE 1.4
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Say you need to create a file that contains a String which can be 32 bytes (set field size), followed by a short that can only be 6 bytes (set field size), and you need it to be encoded in US-ASCII. How do you start?
Say your String is "Hello World", and your short is 99.

Now, assume you already have your FileChannel, fc, and you already have the position you want to write to: say, position 190 (in bytes) in the file. You calculated 190 because you want to overwrite record number 5, and 5*(32+6) == 190. You probably want to work out a locking scheme before calling this method, but I don't want to get into the threading here. The actual method itself looks something like the following.
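A minimal sketch of such a gathering write, under stated assumptions: the class name GatheringWriteSketch, the helper fixedField, and the choice to pad with zero bytes and to store the short as ASCII digits are all illustrative, not taken from the assignment. It also uses StandardCharsets and java.nio.file, which arrived after J2SE 1.4; on 1.4 you would use getBytes("US-ASCII") and obtain the channel from a RandomAccessFile.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatheringWriteSketch {
    static final int NAME_SIZE = 32;   // fixed width of the String field, in bytes
    static final int NUMBER_SIZE = 6;  // fixed width of the numeric field, in bytes

    /** Encodes text as US-ASCII into a buffer of exactly size bytes, zero-padded (an assumption). */
    static ByteBuffer fixedField(String text, int size) {
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        if (encoded.length > size) {
            throw new IllegalArgumentException("value does not fit in " + size + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.allocate(size); // allocate() zero-fills the buffer
        buffer.put(encoded);
        buffer.rewind(); // back to position 0; the limit stays at size
        return buffer;
    }

    /** Writes one record at the given byte position using a single gathering write. */
    static void writeRecord(FileChannel fc, long position, String name, short number)
            throws IOException {
        ByteBuffer[] fields = {
            fixedField(name, NAME_SIZE),
            fixedField(Short.toString(number), NUMBER_SIZE)
        };
        fc.position(position);
        long remaining = NAME_SIZE + NUMBER_SIZE;
        while (remaining > 0) {
            // write(ByteBuffer[]) drains the buffers in order and returns the bytes written
            remaining -= fc.write(fields);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".db");
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writeRecord(fc, 190, "Hello World", (short) 99);
            System.out.println(fc.size()); // 228 = 190 + 32 + 6
        }
        Files.delete(file);
    }
}
```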

That's it.
Of course, you'll probably want to write a method that creates the bytes for you, by taking in the String target and the size of the field as two parameters. And you'll probably want to overload that method so it accepts shorts as a target as well, but that's all just code, and not hard to write. I just wanted to convey the basic idea here. I'll leave reading as an exercise for the reader.
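The helper described above might look roughly like this. The names FieldBytes, toFieldBytes, and fromFieldBytes are hypothetical, and padding with zero bytes is an assumption; check your own assignment for the required pad character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FieldBytes {
    /** Encodes target as US-ASCII and pads it with zero bytes to exactly size bytes. */
    static byte[] toFieldBytes(String target, int size) {
        byte[] encoded = target.getBytes(StandardCharsets.US_ASCII);
        if (encoded.length > size) {
            throw new IllegalArgumentException(
                "value needs " + encoded.length + " bytes, field holds " + size);
        }
        return Arrays.copyOf(encoded, size); // copyOf pads the tail with zero bytes
    }

    /** Overload for shorts: stores the decimal digits as text in the field. */
    static byte[] toFieldBytes(short target, int size) {
        return toFieldBytes(Short.toString(target), size);
    }

    /** Decodes a field read back from the file, dropping the trailing zero padding. */
    static String fromFieldBytes(byte[] field) {
        int end = field.length;
        while (end > 0 && field[end - 1] == 0) {
            end--;
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] name = toFieldBytes("Hello World", 32);
        byte[] number = toFieldBytes((short) 99, 6);
        System.out.println(name.length + " " + number.length);
        System.out.println(fromFieldBytes(name) + " " + fromFieldBytes(number));
    }
}
```

Note the explicit (short) cast in the call: a bare int literal such as 99 would not match the short overload.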
HTH
M, author
The Sun Certified Java Developer Exam with J2SE 1.4
[ October 06, 2002: Message edited by: Max Habibi ]
Ronnie Phelps
Ranch Hand

Joined: Mar 12, 2001
Posts: 329
Thanks Max,
I did some research using your book and I have a better understanding of Java's NIO classes. And you are right: using scatters and gathers makes things a lot easier. It looks like smooth sailing from here.

But I have one more question. As I said before, I don't have much knowledge of I/O and charset encoding, so this might be kind of obvious. Before reading your last post I used getBytes() instead of getBytes("US-ASCII") and things seemed okay. I'm assuming that getBytes() worked because "US-ASCII" is the default for my machine. But I probably should use getBytes("US-ASCII") to make the application work the same regardless of what the default charset of the machine is. Is this true?
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Yes, that's absolutely true.
Good luck,
M, author
The Sun Certified Java Developer Exam with J2SE 1.4
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4445
    

I have had similar concerns (see http://www.coderanch.com/t/182088/java-developer-SCJD/certification/Charset-bit-US-ASCII-BETA ). First, regarding the required character encoding of "8 bit US-ASCII": in the Javadocs for Charset, "US-ASCII" is defined as 7-bit (not 8-bit) US-ASCII.
writeUTF() produces 8-bit output (a modified UTF-8), but it prepends two bytes representing the encoded length. This is OK for the schema information in the header (more on this below) but not OK for the required format in the data section. The requirements specify a fixed record length, so there is no need to prepend the field length in the record data itself. It seems to me that writeBytes() is the appropriate method to use when writing an actual record.
Coming back to the character encoding, the only other standard encoding that is 8-bit is ISO-8859-1. I know it's not US-ASCII, but it's 8-bit. Again, "US-ASCII" is 7-bit US-ASCII, according to the docs. Which one should we use?
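To make the 7-bit versus 8-bit difference concrete, here is a small sketch (the class name CharsetWidth is made up, and StandardCharsets is a later addition to the JDK than the version this thread targets). For plain ASCII text the two charsets give identical bytes; they only differ for characters above 127, which US-ASCII cannot represent and replaces with '?'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetWidth {
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Plain ASCII text encodes to the same bytes in both charsets.
        System.out.println(Arrays.equals(ascii("Hello World"), latin1("Hello World")));

        // U+00E9 (e with acute accent) is outside 7-bit ASCII.
        String accented = "\u00e9";
        System.out.println(ascii(accented)[0]);          // 63, the '?' replacement byte
        System.out.println(latin1(accented)[0] & 0xFF);  // 233, i.e. 0xE9
    }
}
```

So if the data file only ever holds characters in the 0 to 127 range, the choice between the two makes no difference to the bytes on disk.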
Ronnie,
I don't think that you need to do writeShort() for the field name length if you use writeUTF() for the field name. writeUTF() prepends the length of the string being written out, which is just what you need to comply with the specs for the header. If you use a hex editor to view the data file, you'll see that after writeShort(fieldNameLength) and writeUTF(fieldName), the field name length value will actually appear twice before the field name. Use either writeShort() + writeBytes() or just writeUTF().
Also, using writeByte() for the field length is wrong (assuming we have the same requirements, which specify a 2-byte numeric for the field length). You should use writeShort(), just as you did for the other 2-byte numerics.
If you do this, your header should come out with a length of 70.
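The length prefix that writeUTF() adds is easy to see by writing into a byte array instead of a file. This is only an illustrative sketch; the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

final class WriteUtfDemo {
    /** Bytes produced by writeUTF(s): a 2-byte length followed by the encoded text. */
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    /** Bytes produced by writeShort(length) + writeBytes(s). */
    static byte[] viaShortAndBytes(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(s.length());
            out.writeBytes(s); // writes the low byte of each char, no prefix
        }
        return bytes.toByteArray();
    }

    /** Bytes produced by writeBytes(s) alone. */
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBytes(s);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(viaWriteUTF("name").length);   // 6: two length bytes + four chars
        System.out.println(viaWriteBytes("name").length); // 4
        // For ASCII-only text the two header styles produce identical bytes.
        System.out.println(Arrays.equals(viaWriteUTF("name"), viaShortAndBytes("name")));
    }
}
```

The same prefix explains the reading side: readUTF() consumes the two length bytes plus the text, so the file pointer advances by more than the nominal field size whenever the field was written with writeUTF().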


Junilu - [How to Ask Questions] [How to Answer Questions]
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4445
    

Originally posted by Ronnie Phelps:
I'm assuming that getBytes() worked because "US-ASCII" is the default for my machine. But I probably should use getBytes("US-ASCII") to make the application work the same regardless of what the default charset of the machine is. Is this true?

IMO, yes, you should explicitly specify the charset. On my Windows NT4 Workstation, the default charset is "CP1252". I used the following to see the default charset:
System.out.println(new java.io.InputStreamReader(System.in).getEncoding());
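On newer JDKs the same check can be done with Charset.defaultCharset(), which was added in Java 5 and so was not available when this thread was written. The sketch below (the class name ExplicitCharset is hypothetical) prints the platform default and then encodes with an explicit charset, which gives the same bytes on every machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharset {
    /** Encodes with an explicit charset so the result does not depend on the platform default. */
    static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Varies by machine and JDK version; do not rely on it for file formats.
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // Always 11 bytes for this ASCII-only text, whatever the default is.
        System.out.println(encodeAscii("Hello World").length);
    }
}
```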
[ October 06, 2002: Message edited by: Junilu Lacar ]
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
<snip>Coming back to the character encoding, the only other standard encoding that is 8 bit is ISO-8859-1. I know it's not US-ASCII but it's 8-bit. Again, "US-ASCII" is 7-bit US-ASCII, according to the docs. Which one should we use?
<snip>
I caught that also, but I read it as a typo, given the various other things that were wrong with the test. I think they meant to say 7-bit.

P.S. On a side note, I'm surprised that no one has brought up using the FileLock object for file locks. It's not appropriate (IMO), but I'm surprised that no one has mentioned it.
All best,
M, author
The Sun Certified Java Developer Exam with J2SE 1.4
 