NX: Bodgitt and Scarper - data file access caveats???

 
Greenhorn
Posts: 13
Gurus,
I've read most of the posts here relating to reading and writing bytes to/from the data file. This is what I've come up with and I want to make sure that I'm not doing anything blatantly idiotic. First, I'll post the data file format and then my assumptions.
**** Data File Format Start ****
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record
Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block
Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
End of file
All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
**** Data File Format End ****
- for the numeric values, I should be using RandomAccessFile#readInt and #readShort
- the valid record flag should equal a string of "\u0000\u0000" and the delete field flag should equal a string of "\u8000"
- I should be using RandomAccessFile#readFully instead of #read when loading my byte[] objects
- When I convert the bytes I read into a String, I should do a new String(bytes,"US-ASCII") and a strObj.getBytes("US-ASCII") on writes
- "US-ASCII" is really 7 bit and I need 8 bit. Am I missing something here or do I need another encoding?
- I'm not sure of the best way to handle my delete flag writes, RandomAccessFile#writeChars("\u8000")???
- Even though it's been highly debated, I think I'll refrain from trimming the trailing spaces on values in the data file when I read them into memory.
- When reading in the field values, I'll have to loop through the chars and find the first null, everything before that will be my field value.
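
A minimal sketch of the header-reading assumptions listed above, in case it helps to see them together. The class and field names (SchemaReader, FieldInfo, Header) are hypothetical and not part of the assignment. It uses readUnsignedShort rather than readShort for the 2-byte lengths so a length can never come back negative; that is a design choice, and readShort behaves the same for small values.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all names here are illustrative only.
final class SchemaReader {

    /** One entry of the schema description section. */
    static final class FieldInfo {
        final String name;
        final int length;

        FieldInfo(String name, int length) {
            this.name = name;
            this.length = length;
        }
    }

    /** The parsed header: magic cookie, offset of record zero, field list. */
    static final class Header {
        final int magicCookie;
        final int recordZeroOffset;
        final List<FieldInfo> fields;

        Header(int magicCookie, int recordZeroOffset, List<FieldInfo> fields) {
            this.magicCookie = magicCookie;
            this.recordZeroOffset = recordZeroOffset;
            this.fields = fields;
        }
    }

    static Header readHeader(RandomAccessFile file) throws IOException {
        file.seek(0);
        int magicCookie = file.readInt();          // 4 byte numeric
        int recordZeroOffset = file.readInt();     // 4 byte numeric
        int fieldCount = file.readUnsignedShort(); // 2 byte numeric

        List<FieldInfo> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = file.readUnsignedShort();
            byte[] nameBytes = new byte[nameLength];
            // readFully either fills the whole array or throws EOFException;
            // read(byte[]) may return after filling only part of it.
            file.readFully(nameBytes);
            int fieldLength = file.readUnsignedShort();
            String name = new String(nameBytes, StandardCharsets.US_ASCII);
            fields.add(new FieldInfo(name, fieldLength));
        }
        return new Header(magicCookie, recordZeroOffset, fields);
    }
}
```

After readHeader returns, the file pointer should sit at the start of the data section, so comparing getFilePointer() with the stored offset is a cheap sanity check.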

Thanks a lot gang.
-Tim
 
author and jackaroo
Posts: 12199
Hi Tim,
Nice summation of so many discussions.

"US-ASCII" is really 7 bit and I need 8 bit. Am I missing something here or do I need another encoding?


Welcome to the wonderful world of clueless user specifications.
You have to make a design decision. Is it likely the user wants US-ASCII or an 8-bit format?
By the way: UTF-8 "uses all bits of an octet, but has the quality of preserving the full US-ASCII range: US-ASCII characters are encoded in one octet having the normal US-ASCII value, and any octet with such a value can only stand for an US-ASCII character, and nothing else." (from the UTF-8 RFC).
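
To make the encoding trade-off concrete, here is a small sketch comparing how three charsets decode the same bytes. ISO-8859-1 is included only as an example of a single-byte 8-bit charset; it is an assumption for illustration and is not named in the assignment. The class name EncodingDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is an illustrative assumption.
final class EncodingDemo {

    static String decodeAscii(byte[] raw) {
        // Bytes above 0x7F are not valid US-ASCII and decode to U+FFFD.
        return new String(raw, StandardCharsets.US_ASCII);
    }

    static String decodeLatin1(byte[] raw) {
        // Every byte value 0x00..0xFF maps to exactly one character.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    static String decodeUtf8(byte[] raw) {
        // Bytes 0x00..0x7F decode as US-ASCII; higher bytes belong to
        // multi-byte sequences.
        return new String(raw, StandardCharsets.UTF_8);
    }

    static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

For bytes in the 7-bit range all three decoders agree, which is the property the RFC quote describes. They only diverge once a byte has its high bit set: US-ASCII cannot round-trip such a byte, while a single-byte 8-bit charset can.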

I'm not sure of the best way to handle my delete flag writes


Have you considered converting 0x8000 into the equivalent short value, and reading and writing it that way?
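
A minimal sketch of that idea, assuming the flag occupies the first two bytes of each record. The class name RecordFlag, the constant names, and the recordOffset parameter are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; names and the recordOffset parameter are illustrative.
final class RecordFlag {

    static final short VALID = 0x0000;
    // 0x8000 does not fit in a signed short, so the cast is required;
    // the resulting value is -32768 and has the same 16-bit pattern.
    static final short DELETED = (short) 0x8000;

    static void markDeleted(RandomAccessFile file, long recordOffset) throws IOException {
        file.seek(recordOffset);
        file.writeShort(DELETED); // writes the bytes 0x80 0x00
    }

    static boolean isDeleted(RandomAccessFile file, long recordOffset) throws IOException {
        file.seek(recordOffset);
        return file.readShort() == DELETED;
    }
}
```

If you would rather compare against the literal 0x8000, read the flag with readUnsignedShort() instead, which returns an int in the range 0 to 65535.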

When reading in the field values, I'll have to loop through the chars and find the first null, everything before that will be my field value.


Presumably stopping if you reach the end of a field length without finding a null.
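
Something along these lines, as a rough sketch. It assumes the whole record has already been loaded into a byte[] and that US-ASCII is the chosen charset; FieldDecoder and decodeField are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the record bytes are already in memory.
final class FieldDecoder {

    /**
     * Decodes one fixed-length field starting at 'start'. The value ends at
     * the first null byte, or at the end of the field if no null is present.
     * Trailing spaces are kept as they are.
     */
    static String decodeField(byte[] record, int start, int fieldLength) {
        int limit = start + fieldLength;
        int end = start;
        while (end < limit && record[end] != 0) {
            end++;
        }
        return new String(record, start, end - start, StandardCharsets.US_ASCII);
    }
}
```

Scanning the bytes before decoding avoids building a String that contains the padding and then searching it a second time.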
***
This all sounds pretty good. It sounds like you have made a few design decisions to get to what you have written. Have you documented them?
Regards, Andrew
 
Ranch Hand
Posts: 555
Hi Timothy,
I agree with Andrew's points, especially this one:

Have you considered converting 0x8000 into the equivalent short value, and reading and writing it that way?



Best,
Vlad
 
Timothy Johnson
Greenhorn
Posts: 13
Gentlemen,
Thanks for your input. Yeah, my choices.txt is growing by leaps and bounds but I'm learning a lot in the process.

I'm still a little stumped on the encoding though... I saw one individual claim to have gotten a 91% using the default encoding, Philippe M. is a proponent of "US-ASCII", and Andrew seems to imply that "UTF-8" is the way to fly. Hmmmm....

As far as the delete flag is concerned... I could use RandomAccessFile.writeShort(Character.getNumericValue('\u8000')) but what's the real advantage over using RandomAccessFile.writeChars(new String("\u8000","UTF-8")) or even RandomAccessFile.writeChar(Character.getNumericValue('\u8000'))?
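
One way to compare the options is to look at the bytes each call produces. The sketch below uses DataOutputStream over a ByteArrayOutputStream as a stand-in for RandomAccessFile, on the assumption that both follow the same DataOutput contract for writeShort, writeChar and writeChars. FlagBytes and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo; DataOutputStream stands in for RandomAccessFile.
final class FlagBytes {

    interface Writer {
        void write(DataOutputStream out) throws IOException;
    }

    /** Runs one write call and returns the bytes it produced. */
    static byte[] capture(Writer writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writer.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] viaWriteShort() {
        return capture(out -> out.writeShort(0x8000));
    }

    static byte[] viaWriteChar() {
        return capture(out -> out.writeChar('\u8000'));
    }

    static byte[] viaWriteChars() {
        return capture(out -> out.writeChars("\u8000"));
    }
}
```

All three produce the same two bytes, 0x80 followed by 0x00, so the choice is mostly about readability: writeShort states directly that the flag is a 2-byte number. Two details to watch in the calls above: Character.getNumericValue returns the number a character represents as a digit (7 for '7', for example), not its 16-bit code, so a plain cast is the usual way to get the code; and String has no constructor that takes a String plus a charset name.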

Thanks fellas,
Tim
 