Reading the Datafile

Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Alright, I am very bad with this topic. I managed to read the data file, but it seems I can't get the position right. I have read the instructions many times but am still very confused. In case I have missed anything, could someone kindly explain the meaning of the following:

Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information

End of file

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.


SCJP, SCWCD, SCBCD, SCJD (In progress)
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The file has a header.
That header consists of a fixed part and a flexible part.
The fixed part consists of the few fields in the first section, 10 bytes in total (4 + 4 + 2).
The second part is flexible in that it's repeated for each field in the database.
The last 2 bytes of the first section tell you how many times that part is repeated.
Each field has a descriptor in that second part, which is 4 bytes plus the number of bytes given by the first 2 bytes of the descriptor.

After that follow the individual records, which each have a length equal to the total of all the field sizes as mentioned in the field descriptors combined, plus 2 bytes to indicate whether the record was deleted.

The first record is positioned at an offset from the start of the file which is indicated by the 2nd 4 byte block in the file header.
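
To make that layout concrete, here is a minimal sketch of reading the header with DataInputStream, which is the class the instructions name for the numeric formats. The class and method names (HeaderSketch, FieldInfo, Header, readHeader, sampleHeader) are made up for illustration, and the sample cookie value, field names and lengths are invented too; your own file will have different ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HeaderSketch {

    /** One schema entry: the field's name and its fixed length in bytes. */
    static final class FieldInfo {
        final String name;
        final int length;

        FieldInfo(String name, int length) {
            this.name = name;
            this.length = length;
        }
    }

    /** The fixed 10-byte part of the header plus one FieldInfo per field. */
    static final class Header {
        final int magicCookie;
        final int recordZeroOffset;
        final List<FieldInfo> fields = new ArrayList<>();

        Header(int magicCookie, int recordZeroOffset) {
            this.magicCookie = magicCookie;
            this.recordZeroOffset = recordZeroOffset;
        }
    }

    /** Reads the fixed part, then the repeating schema block once per field. */
    static Header readHeader(DataInputStream in) throws IOException {
        int magicCookie = in.readInt();           // 4 byte numeric
        int recordZeroOffset = in.readInt();      // 4 byte numeric
        int fieldCount = in.readUnsignedShort();  // 2 byte numeric
        Header header = new Header(magicCookie, recordZeroOffset);
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = in.readUnsignedShort();
            byte[] nameBytes = new byte[nameLength];
            in.readFully(nameBytes);              // n bytes of field name
            int fieldLength = in.readUnsignedShort();
            String name = new String(nameBytes, StandardCharsets.US_ASCII);
            header.fields.add(new FieldInfo(name, fieldLength));
        }
        return header;
    }

    /** Builds a small in-memory header with invented values, for the demo only. */
    static byte[] sampleHeader() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0x0101);   // invented magic cookie
        out.writeInt(30);       // 10 fixed bytes + (4 + 4) + (4 + 8) schema bytes
        out.writeShort(2);      // two fields per record
        for (String name : new String[] {"name", "location"}) {
            out.writeShort(name.length());
            out.writeBytes(name);
            out.writeShort(64); // invented field length
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Header header = readHeader(
                new DataInputStream(new ByteArrayInputStream(sampleHeader())));
        for (FieldInfo field : header.fields) {
            System.out.println(field.name + " : " + field.length + " bytes");
        }
    }
}
```

With a real file you would wrap a FileInputStream (or use RandomAccessFile, which has the same readInt and readUnsignedShort methods) instead of the in-memory sample.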


42
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Alright... need a little more help here. I still couldn't get the position right.



I believe it has something to do with locationInFile and input, but I just can't work out the right way to do it, or how I should interpret the data file format and apply it here.

[Andrew: put code between [code] and [/code] UBB tags]
[ July 13, 2006: Message edited by: Andrew Monkhouse ]
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11404
    

I agree that it is probably something to do with the locationInFile - how are you calculating that? You have not shown that particular code.

Regards, Andrew


The Sun Certified Java Developer Exam with J2SE 5: paper version from Amazon, PDF from Apress, Online reference: Books 24x7 Personal blog
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Hi Andrew, thanks for the reply!

Yes, that's precisely where I am stuck. I have no idea how to implement that logic here. I need someone to enlighten me, perhaps with some ideas to get me started.

What does the magic cookie value mean? And what about the 4 byte numeric offset to start of record zero?
[ July 14, 2006: Message edited by: Jeffery Lim ]
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The cookie is just a marker to identify the exact file type.
Many applications use such things to determine what type a specific file is (including operating systems sometimes).

The offset is the byte location of the first record in the file, handy for you to use when reading the file.
It's also a handy tool for use in checking whether the file headers are corrupt. If the first record doesn't start there, something is wrong and you can give an error.
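
As a rough illustration of both checks, the sketch below compares the cookie with an expected value and then walks the schema section to see whether it ends exactly at the stated offset. It assumes the data section follows the schema section with no gap, which is how the format description reads. The names HeaderCheckSketch, checkHeader and sample are hypothetical, and EXPECTED_COOKIE is an invented value; use whatever your own file actually contains.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class HeaderCheckSketch {

    /** Invented value: replace with the cookie found in your own data file. */
    static final int EXPECTED_COOKIE = 0x0101;

    /**
     * Returns the offset of record zero, or throws if the cookie is wrong or
     * the schema section does not end where the header says records start.
     */
    static int checkHeader(byte[] fileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fileBytes); // big-endian, like DataInputStream
        int cookie = buf.getInt();
        if (cookie != EXPECTED_COOKIE) {
            throw new IllegalArgumentException("Unexpected magic cookie: " + cookie);
        }
        int statedOffset = buf.getInt();
        int fieldCount = buf.getShort() & 0xFFFF;
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = buf.getShort() & 0xFFFF;
            buf.position(buf.position() + nameLength); // skip the field name
            buf.getShort();                            // skip the field length
        }
        if (buf.position() != statedOffset) {
            throw new IllegalArgumentException("Header ends at byte " + buf.position()
                    + " but record zero is stated to start at " + statedOffset);
        }
        return statedOffset;
    }

    /** Builds an 18-byte header with one field, for the demo only. */
    static byte[] sample(int cookie, int statedOffset) {
        byte[] name = "name".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(10 + 2 + name.length + 2);
        buf.putInt(cookie).putInt(statedOffset).putShort((short) 1);
        buf.putShort((short) name.length).put(name).putShort((short) 32);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println("Record zero starts at byte "
                + checkHeader(sample(EXPECTED_COOKIE, 18)));
    }
}
```

A truncated file would surface here as a BufferUnderflowException; a real implementation would probably want to turn that into a friendlier error.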
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Hmm... so basically I just use those 4 bytes as my locationInFile variable to get to the first record? And after that, it's just a matter of repeating the process for each record?

As for the rest of the header fields, I can't think of much use for them. Am I right?
[ July 15, 2006: Message edited by: Jeffery Lim ]
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The rest of the header tells you exactly how the data in each record is laid out: how many bytes each field takes, in what order the fields appear, and what each field is named.
You can use that for more validation and to determine the actual record size.
You might even use the field names as labels for your user interface elements if you wanted to.
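
Here is one way the record size and the position of record N might be worked out from the field lengths. It is a sketch only: RecordSketch, recordSize, readRecord and sampleData are made-up names, locationInFile just borrows the variable name used earlier in this thread, and the sample data pretends the header is 3 bytes long purely to keep the demo small. Field values are cut at the first null byte, as the format description says.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RecordSketch {

    static final int FLAG_BYTES = 2;
    static final int DELETED_FLAG = 0x8000;

    /** Record size = 2 byte flag + the sum of all field lengths from the schema. */
    static int recordSize(int[] fieldLengths) {
        int size = FLAG_BYTES;
        for (int length : fieldLengths) {
            size += length;
        }
        return size;
    }

    /** Byte position of a record: offset of record zero + recordNumber * recordSize. */
    static long locationInFile(int recordZeroOffset, int recordNumber, int[] fieldLengths) {
        return recordZeroOffset + (long) recordNumber * recordSize(fieldLengths);
    }

    /** Returns the field values of one record, or null if it is flagged as deleted. */
    static String[] readRecord(byte[] file, int recordZeroOffset,
                               int recordNumber, int[] fieldLengths) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.position((int) locationInFile(recordZeroOffset, recordNumber, fieldLengths));
        int flag = buf.getShort() & 0xFFFF;
        if (flag == DELETED_FLAG) {
            return null;
        }
        String[] values = new String[fieldLengths.length];
        for (int i = 0; i < fieldLengths.length; i++) {
            byte[] raw = new byte[fieldLengths[i]];
            buf.get(raw);
            int end = 0;
            while (end < raw.length && raw[end] != 0) {
                end++; // stop at the null terminator, if there is one
            }
            values[i] = new String(raw, 0, end, StandardCharsets.US_ASCII);
        }
        return values;
    }

    /** Three records with field lengths 4 and 6, after a pretend 3-byte header. */
    static byte[] sampleData() {
        ByteBuffer buf = ByteBuffer.allocate(3 + 3 * 12);
        buf.position(3);
        putRecord(buf, 0x0000, "Ab", "Hotel");
        putRecord(buf, DELETED_FLAG, "Old", "Gone");
        putRecord(buf, 0x0000, "Wxyz", "Inn");
        return buf.array();
    }

    private static void putRecord(ByteBuffer buf, int flag, String first, String second) {
        buf.putShort((short) flag);
        // Arrays.copyOf pads with zero bytes up to the fixed field length.
        buf.put(Arrays.copyOf(first.getBytes(StandardCharsets.US_ASCII), 4));
        buf.put(Arrays.copyOf(second.getBytes(StandardCharsets.US_ASCII), 6));
    }

    public static void main(String[] args) {
        int[] lengths = {4, 6};
        System.out.println(Arrays.toString(readRecord(sampleData(), 3, 2, lengths)));
    }
}
```

With a RandomAccessFile the same arithmetic would feed seek(locationInFile) before reading the flag and the fields.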
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11404
    

You might also want to consider that there is nothing in your Data class definition that is specific to Hotels (or to any data structure for that matter). You know that in your particular use case it will be used for hotels; however, the Data class you write could be used for any form of data: client records, billing records, anything you care to put in it. But to make a nice generic Data class that can handle any form of record, you would have to read the schema to determine the number and names of the fields.
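
A tiny sketch of what "generic" could look like in practice: the field names read from the schema are paired with the values of one record, so nothing in the code mentions hotels. GenericRecordSketch and toNamedRecord are hypothetical names, and the sample field names and values are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GenericRecordSketch {

    /** Pairs schema field names with one record's values, keeping schema order. */
    static Map<String, String> toNamedRecord(String[] fieldNames, String[] values) {
        if (fieldNames.length != values.length) {
            throw new IllegalArgumentException("Expected " + fieldNames.length
                    + " values but got " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            record.put(fieldNames[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Invented schema and values; a billing file would work the same way.
        String[] names = {"name", "location"};
        String[] values = {"Palace", "Smallville"};
        System.out.println(toNamedRecord(names, values));
    }
}
```

LinkedHashMap is used so the fields come back in the order the schema lists them, which matters if the names double as column headers in a user interface.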

Regards, Andrew
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Thanks Jeroen and Andrew. As I mentioned, I am really very bad at this. Nevertheless, I'm going to give it a try.
 