Data File Format & Schema File

Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
Just started on my assignment (Bodgitt & Scarper) and wanted to make sure I have understood the relationship between the "Data file Format" and the "Database schema". Below are the contents:

Data file Format
================
The format of data in the database file is as follows:
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
End of file
All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
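The text rule above (8 bit US ASCII, null terminated when shorter than the field) could be handled along these lines. This is only a minimal sketch: `FieldCodec`, `decode` and `encode` are hypothetical names, not part of the assignment, and it is worth comparing against a hex dump of your own file in case the padding actually used there differs.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for fixed-length text fields; not supplied with the assignment. */
final class FieldCodec {
    private FieldCodec() {}

    /** Decodes one fixed-length field: US-ASCII text, cut at the first null byte if there is one. */
    static String decode(byte[] raw) {
        int end = 0;
        while (end < raw.length && raw[end] != 0) {
            end++;
        }
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }

    /** Encodes a value into a fixed-length field, null padded when shorter than the field. */
    static byte[] encode(String value, int fieldLength) {
        byte[] text = value.getBytes(StandardCharsets.US_ASCII);
        if (text.length > fieldLength) {
            throw new IllegalArgumentException("value longer than field: " + value);
        }
        byte[] raw = new byte[fieldLength]; // new arrays are zero filled, which gives the null padding
        System.arraycopy(text, 0, raw, 0, text.length);
        return raw;
    }
}
```

A value that fills the whole field has no terminator, which is why `decode` stops at either the first null byte or the end of the array.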

Database schema
===============
The database that Bodgitt and Scarper uses contains the following fields:

Field name field name length description
----------------------------------------------------------------
Subcontractor name 32 name of subcontractor.
City location 64 locality
Types of work specialties 64 list of work type
Staff Number size 6 workers available at booking
Hourly charge rate 8 Charge per hour
Customer Id owner 8 Customer id

PS: I have summarised the schema.

Given the above, my understanding is that the schema defines the contents of each record, and each record will contain the following fields (size in bytes in brackets): name(32), location(64), specialties(64), size(6), rate(8) and owner(8). Therefore, each record will occupy 182 bytes.

My view of the database file contents is shown below. The first column is address offset, in decimal, relative to the beginning of the file.

0000 4 bytes - magic cookie
0004 4 bytes - offset to start of record zero
0008 2 bytes - number of fields in each record. This should be
six (name, location, specialties, size, rate, owner)

0010 2 bytes - name field length. This will be set to 32.
( This is start of record zero )
0012 32 bytes - number of bytes for name field

0044 2 bytes - location field length. This will be set to 64.
0046 64 bytes - number of bytes for location field

0110 2 bytes - specialties field length. This will be set to 64.
0112 64 bytes - number of bytes for specialties field

0176 2 bytes - size field length. This will be set to 6.
0178 6 bytes - number of bytes for size field

0184 2 bytes - rate field length. This will be set to 8.
0186 8 bytes - number of bytes for rate field

0194 2 bytes - owner field length. This will be set to 8.
0196 8 bytes - number of bytes for owner field

0204 2 bytes - name field length. This will be set to 32.
( This is start of record ONE )
0206 32 bytes - number of bytes for name field

0238 2 bytes - location field length. This will be set to 64.
0240 64 bytes - number of bytes for location field

0304 2 bytes - specialties field length. This will be set to 64.
0306 64 bytes - number of bytes for specialties field

0370 2 bytes - size field length. This will be set to 6.
0372 6 bytes - number of bytes for size field

0378 2 bytes - rate field length. This will be set to 8.
0380 8 bytes - number of bytes for rate field

0388 2 bytes - owner field length. This will be set to 8.
0390 8 bytes - number of bytes for owner field

0398 2 bytes - name field length. This will be set to 32.
( This is start of record TWO )

1) Please confirm or correct my understanding above.

2) Also, can you explain the following snippet from the Data File format :-

"Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record"

3) Finally, is the 4 byte cookie value fixed in the data file, and what is its purpose?

Thank you for your help.

Pete
Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
I noticed the schema file content is not very readable, so I have updated it with comma-separated fields.

Database schema
===============
The database that Bodgitt and Scarper uses contains the following fields:

Descriptive name, field name, length, description
----------------------------------------------------------------
Subcontractor, name, 32, name of subcontractor.
City, location, 64, locality
Types of work, specialties, 64, list of work type
Staff Number, size, 6, workers available at booking
Hourly charge, rate, 8, Charge per hour
Customer Id, owner, 8, Customer id

Apologies for the inconvenience.

Pete
Michael Grossenbacher
Greenhorn

Joined: Mar 28, 2008
Posts: 4
Hi Pete

1)
I think you misunderstood something. The "Start of file" section and the "Schema description section" together form the header of the file and are not repeated for each record. After the header come only the "naked" records, without size bytes.

2)
That means that every record is preceded by a 2 byte flag indicating whether the record is deleted or valid. Keep in mind that these 2 bytes also count towards the record length.
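To illustrate the point about the flag counting towards the record length, here is a small sketch of how record offsets could be computed. `RecordLayout` and its methods are hypothetical names, and the start offset must come from the header of your own file.

```java
/** Hypothetical helper showing how the 2 byte flag enters the record length. */
final class RecordLayout {
    static final int FLAG_BYTES = 2;
    static final int DELETED_FLAG = 0x8000;

    private final long dataStart;   // "offset to start of record zero", read from the header
    private final int recordLength; // flag bytes plus the sum of all field lengths

    RecordLayout(long dataStart, int[] fieldLengths) {
        int sum = FLAG_BYTES;
        for (int length : fieldLengths) {
            sum += length;
        }
        this.dataStart = dataStart;
        this.recordLength = sum;
    }

    int recordLength() {
        return recordLength;
    }

    /** File offset of the flag that starts record recNo (records are numbered from zero). */
    long offsetOf(int recNo) {
        return dataStart + (long) recNo * recordLength;
    }

    /** Masking keeps this correct whether the flag was read as a signed or unsigned short. */
    static boolean isDeleted(int flag) {
        return (flag & 0xFFFF) == DELETED_FLAG;
    }
}
```

With the six field lengths from the schema (32, 64, 64, 6, 8, 8), `recordLength()` is 184: the 182 bytes of field data plus the 2 flag bytes.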

3)
I understood the magic cookie to be a fixed value that I can check while reading the header. If the cookie is not what I expect, the database does not correspond to my database model, so I throw an exception when I start reading the database.
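A cookie check of this kind might look roughly like the sketch below. The class name, the choice of `IOException` and the `expectedCookie` parameter are all assumptions; the expected value is whatever your own data file contains.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical check of the 4 byte magic cookie at the start of the file. */
final class MagicCookieCheck {
    private MagicCookieCheck() {}

    /** Reads the first 4 bytes as an int and rejects the file if they differ from the expected cookie. */
    static void verify(DataInputStream in, int expectedCookie) throws IOException {
        int actual = in.readInt();
        if (actual != expectedCookie) {
            throw new IOException(String.format(
                    "Not a recognised data file: cookie 0x%08X, expected 0x%08X",
                    actual, expectedCookie));
        }
    }
}
```

Because the check consumes exactly 4 bytes, the same stream can then be used to read the rest of the header.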

greetZ Mike
Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
Mike,

Thank you for the prompt response. From what you say, the database file will be like :-

Start File Content (header)
0000 4 bytes - magic cookie
0004 4 bytes - offset to start of record zero
0008 2 bytes - number of fields in each record. This should be
six (name, location, specialties, size, rate, owner)

Schema description section (header)
0010 2 bytes - name field length. This will be set to 32.
0012 32 bytes - number of bytes for name field

0044 2 bytes - location field length. This will be set to 64.
0046 64 bytes - number of bytes for location field

0110 2 bytes - specialties field length. This will be set to 64.
0112 64 bytes - number of bytes for specialties field

0176 2 bytes - size field length. This will be set to 6.
0178 6 bytes - number of bytes for size field

0184 2 bytes - rate field length. This will be set to 8.
0186 8 bytes - number of bytes for rate field

0194 2 bytes - owner field length. This will be set to 8.
0196 8 bytes - number of bytes for owner field

Start of Records
0204 2 bytes - Flag to indicate valid or deleted record (record 0)
0206 32 bytes - Sub Contractor name field
0238 64 bytes - Location field
0302 64 bytes - type of work field
0366 6 bytes - staff number field
0372 8 bytes - rate field
0380 8 bytes - customer id field.

0388 2 bytes - Flag to indicate valid or deleted record (record 1)
0390 32 bytes - Sub Contractor name field
0422 64 bytes - Location field
0486 64 bytes - type of work field
0550 6 bytes - staff number field
0556 8 bytes - rate field
0564 8 bytes - customer id field.

Repeat for more records.

Is my understanding of the database file format correct?

Apart from the magic cookie value, what use is the rest of the header information -- all of 192 bytes?

I presume the magic cookie is read from the database file initially, to determine its value before it is used in the code?

Many thanks again.

Pete
Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
Hi

I just generated a hex dump of the database file and I can relate it to the description of the Start and Schema description sections.

From the above hex dump, "4 bytes - offset to start of record zero" seems to be the offset from the start of the first byte of the Schema description. Is this correct?

As mentioned before, apart from the magic cookie value, what use is the rest of the header information?

And I presume the magic cookie is read from the database file initially, to determine its value before it is used in the code?

Many thanks again.

Pete
Tom Doyle
Greenhorn

Joined: Apr 23, 2002
Posts: 9
Hi Pete,

I interpreted "Magic Cookie" as a file signature, i.e. the "magic number" that is embedded at the beginning of many data files identifying the file type or originating application. This conclusion would seem to be consistent with the comment in the instructions denoting, "magic cookie value identifies this as a data file".

The remaining fields in the header (1) indicate where the first contractor data record begins and (2) tell us how many times the three fields in the schema section repeat.

Best of luck,
Tom

[ November 15, 2008: Message edited by: Tom Doyle ]

Best regards
Tom

SCJP4
SCJD6 (B&S in progress)
Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
Thank you very much Mike & Tom for the clarification.

Pete
Zonglin Li
Greenhorn

Joined: Aug 29, 2008
Posts: 12
Hi Pete:
I am working on Bodgitt and Scarper too.

Your Schema description section is wrong. At the starting point (the schema), the first 2 bytes tell you how many bytes to expect for the field name (the length of the field's name, not the field's length). You extract the field name, and then the following 2 bytes give you the length of this field. You repeat this 6 times.

Schema description section:
0010 2 bytes - length in bytes of field name (call the result n)
0012 n bytes - field name (n differs for each field)
0012+n 2 bytes - field length in bytes (differs per field, as already given in the requirements; for the first field, name, the value is 32)
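The header walk described above could be sketched as follows, assuming the layout quoted at the top of the thread. `SchemaReader` and `readSchema` are hypothetical names; only the `DataInputStream` calls are standard library.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the file header; returns field name -> field length in file order. */
final class SchemaReader {
    private SchemaReader() {}

    static Map<String, Integer> readSchema(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        in.readInt();                            // 4 bytes: magic cookie (not checked in this sketch)
        in.readInt();                            // 4 bytes: offset to start of record zero
        int fieldCount = in.readUnsignedShort(); // 2 bytes: number of fields in each record

        Map<String, Integer> schema = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = in.readUnsignedShort();  // 2 bytes: length of the field NAME
            byte[] name = new byte[nameLength];
            in.readFully(name);                       // n bytes: the field name itself
            int fieldLength = in.readUnsignedShort(); // 2 bytes: length of the field's DATA
            schema.put(new String(name, StandardCharsets.US_ASCII), fieldLength);
        }
        return schema;
    }
}
```

A `LinkedHashMap` is used so that the fields stay in the order in which they appear in each record.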
Pete Palmer
Ranch Hand

Joined: Oct 21, 2008
Posts: 74
Hi Zonglin,

You are correct, I had completely missed the point with respect to the schema description. I only realised this when I generated a hex dump of the data file and then tried to confirm my understanding against the contents. Of course, this exercise highlighted my lack of understanding, but after more perseverance I finally (I think!) got a grasp of it.

Thanks for pointing out my error.

Pete.