
Reading the Datafile

Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Alright, I am very bad with this topic. I managed to read the data file; however, it seems that I can't get the position right. I have read the instructions many times but am still very confused. In case I have missed anything, can someone kindly explain the meaning of:

Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information

End of file

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.


SCJP, SCWCD, SCBCD, SCJD (In progress)
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The file has a header.
That header consists of a fixed part and a flexible part.
The fixed part is the first section: three fields, 10 bytes in total (4 + 4 + 2).
The second part is flexible in that it's repeated for each field in the database.
The last 2 bytes of the first section tell you how many times that part is repeated.
Each field has a descriptor in that second part which is 4 bytes + a number of bytes as defined by the first 2 bytes of the descriptor.

After that follow the individual records, which each have a length equal to the total of all the field sizes as mentioned in the field descriptors combined, plus 2 bytes to indicate whether the record was deleted.

The first record is positioned at an offset from the start of the file which is indicated by the 2nd 4 byte block in the file header.
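To make the layout above concrete, here is a minimal sketch of reading the header with DataInputStream. The class name HeaderSketch, the sample field names, and the cookie value 0x0101 are made up for illustration and are not taken from any real assignment file. It also assumes the 2-byte values can be read as unsigned shorts, which the instructions do not state explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that parses the header described in the instructions. */
final class HeaderSketch {
    final int magicCookie;
    final int recordZeroOffset;
    final List<String> fieldNames = new ArrayList<>();
    final List<Integer> fieldLengths = new ArrayList<>();

    HeaderSketch(DataInputStream in) throws IOException {
        // Fixed part: 4 + 4 + 2 = 10 bytes.
        magicCookie = in.readInt();
        recordZeroOffset = in.readInt();
        int fieldCount = in.readUnsignedShort(); // assumption: unsigned
        // Flexible part: one descriptor per field.
        for (int i = 0; i < fieldCount; i++) {
            byte[] name = new byte[in.readUnsignedShort()];
            in.readFully(name);
            fieldNames.add(new String(name, StandardCharsets.US_ASCII));
            fieldLengths.add(in.readUnsignedShort());
        }
    }

    /** Header size in bytes: 10 fixed, plus 4 + name length per field. */
    int headerLength() {
        int total = 10;
        for (String name : fieldNames) {
            total += 4 + name.length();
        }
        return total;
    }

    /** Size of one record on disk: 2-byte flag plus all field lengths. */
    int recordLength() {
        int total = 2;
        for (int length : fieldLengths) {
            total += length;
        }
        return total;
    }

    /** Parses a header held in memory, wrapping the checked exception. */
    static HeaderSketch parse(byte[] bytes) {
        try {
            return new HeaderSketch(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a made-up header with two fields, purely for demonstration. */
    static byte[] sampleHeader() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0x0101);   // invented magic cookie
            out.writeInt(30);       // offset to record zero
            out.writeShort(2);      // number of fields
            out.writeShort(4);
            out.writeBytes("name");
            out.writeShort(64);
            out.writeShort(8);
            out.writeBytes("location");
            out.writeShort(64);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HeaderSketch header = parse(sampleHeader());
        System.out.println("fields: " + header.fieldNames);
        System.out.println("header length: " + header.headerLength());
        System.out.println("record length: " + header.recordLength());
    }
}
```

For a real file you would wrap a FileInputStream in the DataInputStream instead of the in-memory sample used here.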


42
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Alright... need a little more help here. I still couldn't get the position right.



I believe it has something to do with locationInFile and input, but I just couldn't work out the right way to do it, or how I should understand the data file format and apply it here.

[Andrew: put code between [code] and [/code] UBB tags]
[ July 13, 2006: Message edited by: Andrew Monkhouse ]
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11423
    

I agree that it is probably something to do with the locationInFile - how are you calculating that? You have not shown that particular code.

Regards, Andrew


The Sun Certified Java Developer Exam with J2SE 5: paper version from Amazon, PDF from Apress, Online reference: Books 24x7 Personal blog
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Hi Andrew, thanks for the reply!

Yes, that's precisely where I am stuck. I have no idea how to implement that logic here. I need someone to enlighten me, for example by giving me some ideas to start me off.

What does the magic cookie value mean? And the 4 byte numeric offset to start of record zero?
[ July 14, 2006: Message edited by: Jeffery Lim ]
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The cookie is just a marker to identify the exact file type.
Many applications use such things to determine what type a specific file is (including operating systems sometimes).

The offset is the byte location of the first record in the file, handy for you to use when reading the file.
It's also a handy tool for use in checking whether the file headers are corrupt. If the first record doesn't start there, something is wrong and you can give an error.
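As a rough sketch of how the offset can be used, the position of any record is the offset plus the record number times the on-disk record length (the 2-byte flag plus all field lengths). The class and method names below are hypothetical. The consistency check assumes the schema section ends exactly where record zero begins and that the file holds only whole records, which may not hold for every file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper for locating records once the header has been read. */
final class RecordLocator {
    /** Flag value that marks a deleted record, per the format description. */
    static final int DELETED_FLAG = 0x8000;

    /** Byte position of the given record (record numbers start at zero). */
    static long recordPosition(int recordZeroOffset, int recordLength, int recordNumber) {
        return recordZeroOffset + (long) recordNumber * recordLength;
    }

    /**
     * Sanity check under the assumptions stated above: the header ends at the
     * offset, and the data section is a whole number of records.
     */
    static boolean layoutLooksConsistent(int headerLength, int recordZeroOffset,
                                         int recordLength, long fileLength) {
        return headerLength == recordZeroOffset
                && fileLength >= recordZeroOffset
                && (fileLength - recordZeroOffset) % recordLength == 0;
    }

    /** Seeks to a record and reads its 2-byte flag as an unsigned value. */
    static boolean isDeleted(RandomAccessFile file, long position) throws IOException {
        file.seek(position);
        return file.readUnsignedShort() == DELETED_FLAG;
    }
}
```

With an offset of 30 and a record length of 130 (invented numbers), record 3 would start at byte 30 + 3 * 130 = 420. Reading the flag as unsigned matters because readShort() would return 0x8000 as a negative number.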
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Hmm... so basically I just use the 4 bytes for my locationInFile variable to get the first record? After that, is it just a matter of repeating the whole process?

As for the rest of the fields, I can't think of much use for them. Am I right?
[ July 15, 2006: Message edited by: Jeffery Lim ]
Jeroen T Wenting
Ranch Hand

Joined: Apr 21, 2006
Posts: 1847
The rest of the header tells you exactly what the data in the fields is: how many bytes each field takes, in what order the fields appear, and what each field is named.
You can use that for more validation and to determine the actual record size.
You might even use the field names as labels for your user interface elements if you wanted to.
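Once the field lengths are known, a record's bytes can be cut into fields and decoded. Here is a minimal sketch, assuming the record bytes have already been read into an array without the 2-byte flag, and that short values are null terminated as the format description says. FieldDecoder and the sample values are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that splits one record into its text fields. */
final class FieldDecoder {
    /** Decodes one fixed-length field, stopping at the first null byte. */
    static String decodeField(byte[] record, int start, int length) {
        int end = start;
        while (end < start + length && record[end] != 0) {
            end++;
        }
        return new String(record, start, end - start, StandardCharsets.US_ASCII);
    }

    /** Walks the record in schema order; there are no separators between fields. */
    static String[] decodeRecord(byte[] record, int[] fieldLengths) {
        String[] values = new String[fieldLengths.length];
        int position = 0;
        for (int i = 0; i < fieldLengths.length; i++) {
            values[i] = decodeField(record, position, fieldLengths[i]);
            position += fieldLengths[i];
        }
        return values;
    }

    public static void main(String[] args) {
        // An 8-byte field holding "Ritz" plus padding, then a full 5-byte field.
        byte[] record = "Ritz\0\0\0\0Paris".getBytes(StandardCharsets.US_ASCII);
        for (String value : decodeRecord(record, new int[] {8, 5})) {
            System.out.println("[" + value + "]");
        }
    }
}
```

A field that fills its full length has no terminator, which is why the loop also stops at the field boundary.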
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11423
    

You might also want to consider that there is nothing in your Data class definition that is specific to Hotels (or to any data structure for that matter). You know that for your particular use case you will be using it for hotels; however, the Data class you write could be used for any form of data: client records, billing records, anything you care to put in them. But in order to make a nice generic Data class that can handle any form of record, you would have to read the schema to determine the number and names of the fields.

Regards, Andrew
Shan Jun Hao
Ranch Hand

Joined: May 23, 2006
Posts: 39
Thanks Jeroen and Andrew. Well, as I mentioned, I am really very bad with this. Nevertheless, I'm going to give it a try.
 
 