Data File Format & Schema File

 
Ranch Hand
Posts: 106
Just started on my assignment (Bodgitt & Scarper) and wanted to make sure I have understood the relationship between the "Data file Format" and the "Database schema". Below are the contents:

Data file Format
================
The format of data in the database file is as follows:
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
End of file
All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
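
For readers following along, here is a minimal sketch of how the header described above could be read with DataInputStream. The class name HeaderSketch, its method names and the cookie value 0x1234 are made up for illustration, and it assumes the 2 byte numerics can be read as unsigned values. It parses a small two-field header built in memory rather than the real data file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; none of these names come from the assignment.
final class HeaderSketch {
    final int magicCookie;
    final int recordZeroOffset;
    final Map<String, Integer> fieldLengths; // field name -> field length, in schema order

    private HeaderSketch(int magicCookie, int recordZeroOffset, Map<String, Integer> fieldLengths) {
        this.magicCookie = magicCookie;
        this.recordZeroOffset = recordZeroOffset;
        this.fieldLengths = fieldLengths;
    }

    // Reads the "Start of file" values, then the schema description section.
    static HeaderSketch parse(byte[] fileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            int cookie = in.readInt();               // 4 byte numeric, magic cookie
            int offset = in.readInt();               // 4 byte numeric, offset to record zero
            int fieldCount = in.readUnsignedShort(); // 2 byte numeric (assumed unsigned)
            Map<String, Integer> lengths = new LinkedHashMap<>();
            for (int i = 0; i < fieldCount; i++) {
                byte[] name = new byte[in.readUnsignedShort()]; // length of the field NAME
                in.readFully(name);
                int fieldLength = in.readUnsignedShort();       // length of the field DATA
                lengths.put(new String(name, StandardCharsets.US_ASCII), fieldLength);
            }
            return new HeaderSketch(cookie, offset, lengths);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a two-field header in memory so the sketch runs without the real file.
    static byte[] sampleHeader() {
        try {
            ByteArrayOutputStream schema = new ByteArrayOutputStream();
            DataOutputStream s = new DataOutputStream(schema);
            String[] names = {"name", "location"};
            int[] lengths = {32, 64};
            for (int i = 0; i < names.length; i++) {
                byte[] n = names[i].getBytes(StandardCharsets.US_ASCII);
                s.writeShort(n.length);
                s.write(n);
                s.writeShort(lengths[i]);
            }
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            DataOutputStream f = new DataOutputStream(file);
            f.writeInt(0x1234);             // made-up cookie value
            f.writeInt(10 + schema.size()); // records start right after the header
            f.writeShort(names.length);
            f.write(schema.toByteArray());
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HeaderSketch h = parse(sampleHeader());
        System.out.println(h.recordZeroOffset + " " + h.fieldLengths);
    }
}
```

For the real file you would wrap a FileInputStream instead of the in-memory array; the read calls stay the same.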

Database schema
===============
The database that Bodgitt and Scarper uses contains the following fields:

Field name field name length description
----------------------------------------------------------------
Subcontractor name 32 name of subcontractor.
City location 64 locality
Types of work specialties 64 list of work type
Staff Number size 6 workers available at booking
Hourly charge rate 8 Charge per hour
Customer Id owner 8 Customer id

PS I have summarised the schema.

Given the above, my understanding is that the schema defines the contents of each record, and each record will contain the fields (size in bytes in brackets) name(32), location(64), specialties(64), size(6), rate(8) and owner(8). Therefore, each record will occupy 182 bytes.
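
A rough sketch of that arithmetic, and of cutting one 182-byte block of field data into its six fields, might look like the following. RecordFieldsSketch and its methods are hypothetical names. The field lengths are the ones listed in the schema, and the decoding follows the spec's statement that values are US ASCII and null terminated when shorter than the field. The 182 bytes do not include the 2 byte flag that the format description places in front of each record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative only.
final class RecordFieldsSketch {
    // Field lengths in schema order: name, location, specialties, size, rate, owner.
    static final int[] FIELD_LENGTHS = {32, 64, 64, 6, 8, 8};

    // Sum of all field lengths: 32 + 64 + 64 + 6 + 8 + 8 = 182.
    static int recordLength() {
        int total = 0;
        for (int length : FIELD_LENGTHS) {
            total += length;
        }
        return total;
    }

    // Decodes one fixed-length field, stopping at the first null byte if there is one.
    static String decodeField(byte[] record, int start, int length) {
        int end = start;
        while (end < start + length && record[end] != 0) {
            end++;
        }
        return new String(record, start, end - start, StandardCharsets.US_ASCII);
    }

    // Splits the field data of one record (without the 2 byte flag) into its fields.
    static List<String> split(byte[] record) {
        if (record.length != recordLength()) {
            throw new IllegalArgumentException("Expected " + recordLength() + " bytes, got " + record.length);
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int length : FIELD_LENGTHS) {
            fields.add(decodeField(record, start, length));
            start += length;
        }
        return fields;
    }
}
```

In a real implementation the lengths would more sensibly come from the schema section of the file than from a hard-coded array.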

My view of the database file contents is shown below. The first column is address offset, in decimal, relative to the beginning of the file.

0000 4 bytes - magic cookie
0004 4 bytes - offset to start of record zero
0008 2 bytes - number of fields in each record. This should be
six (name, location, specialties, size, rate, owner)

0010 2 bytes - name field length. This will be set to 32.
(This is start of record zero)
0012 32 bytes - number of bytes for name field

0044 2 bytes - location field length. This will be set to 64.
0046 64 bytes - number of bytes for location field

0110 2 bytes - specialties field length. This will be set to 64.
0112 64 bytes - number of bytes for specialties field

0176 2 bytes - size field length. This will be set to 6.
0178 6 bytes - number of bytes for size field

0184 2 bytes - rate field length. This will be set to 8.
0186 8 bytes - number of bytes for rate field

0194 2 bytes - owner field length. This will be set to 8.
0196 8 bytes - number of bytes for owner field

0204 2 bytes - name field length. This will be set to 32.
(This is start of record ONE)
0206 32 bytes - number of bytes for name field

0238 2 bytes - location field length. This will be set to 64.
0240 64 bytes - number of bytes for location field

0304 2 bytes - specialties field length. This will be set to 64.
0306 64 bytes - number of bytes for specialties field

0370 2 bytes - size field length. This will be set to 6.
0372 6 bytes - number of bytes for size field

0378 2 bytes - rate field length. This will be set to 8.
0380 8 bytes - number of bytes for rate field

0388 2 bytes - owner field length. This will be set to 8.
0390 8 bytes - number of bytes for owner field

0398 2 bytes - name field length. This will be set to 32.
(This is start of record TWO)

1) Please confirm or correct my understanding above.

2) Also, can you explain the following snippet from the Data File format:

"Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record"

3) Finally, is the 4 byte cookie value fixed in the data file, and what is its purpose?

Thank you for your help.

Pete
 
Pete Palmer
Ranch Hand
Posts: 106
I noticed the schema content is not very readable, so I have updated it with comma-separated fields.

Database schema
===============
The database that Bodgitt and Scarper uses contains the following fields:

Field name, field name, length, description
----------------------------------------------------------------
Subcontractor, name, 32, name of subcontractor.
City, location, 64, locality
Types of work, specialties, 64, list of work type
Staff Number, size, 6, workers available at booking
Hourly charge, rate, 8, Charge per hour
Customer Id, owner, 8, Customer id

Apologies for the inconvenience.

Pete
 
Greenhorn
Posts: 4
Hi Pete

1)
I think you misunderstood something. The "Start of file" section and the "Schema description section" together form the header of the file and are not repeated for each record. After the header come only the "naked" records, without size bytes.

2)
That means that every record is preceded by an indicator showing whether the record is deleted or valid. Keep in mind that these 2 bytes also count toward the record length.
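
To make that concrete, here is a small sketch of the offset arithmetic and the flag check. RecordOffsetSketch is a hypothetical name, and the header offset of 100 used in the usage note is a made-up value, not the one in the real file. It assumes the flag is read as an unsigned 2 byte value (as DataInputStream.readUnsignedShort() would return it); read with readShort(), 0x8000 comes back as -32768 instead.

```java
// Hypothetical helper; the names are illustrative only.
final class RecordOffsetSketch {
    static final int FLAG_BYTES = 2;       // flag in front of every record
    static final int VALID_FLAG = 0x0000;
    static final int DELETED_FLAG = 0x8000;

    // File offset of the flag of record number recordNumber (record zero is the first).
    static long recordOffset(int recordZeroOffset, int fieldDataLength, int recordNumber) {
        return recordZeroOffset + (long) recordNumber * (FLAG_BYTES + fieldDataLength);
    }

    // Combines two bytes in big-endian order, like DataInputStream.readUnsignedShort().
    static int readFlag(byte[] fileBytes, int offset) {
        return ((fileBytes[offset] & 0xFF) << 8) | (fileBytes[offset + 1] & 0xFF);
    }

    static boolean isDeleted(int unsignedFlag) {
        return unsignedFlag == DELETED_FLAG;
    }
}
```

With 182 bytes of field data each record takes 184 bytes on disk, so recordOffset(100, 182, 2) gives 100 + 2 * 184 = 468.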

3)
I understood the magic cookie to be a fixed value that I can check while reading the header. If the cookie is not what I expect, then the database does not correspond to my database model, and I throw an exception when I start reading the database.
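
A check along those lines could be sketched as below. CookieCheckSketch and EXPECTED_COOKIE are hypothetical, and 0x1234 is a placeholder: the real value would have to be taken from your own data file, for example from a hex dump. ByteBuffer reads big-endian by default, which matches the DataInputStream format the spec refers to.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; the names and the cookie value are illustrative only.
final class CookieCheckSketch {
    static final int EXPECTED_COOKIE = 0x1234; // placeholder, NOT the real cookie

    // The cookie is the first 4 bytes of the file, stored big-endian.
    static int readCookie(byte[] fileBytes) {
        return ByteBuffer.wrap(fileBytes).getInt();
    }

    // Throws if the file does not start with the cookie this code was written for.
    static void verify(byte[] fileBytes) {
        int cookie = readCookie(fileBytes);
        if (cookie != EXPECTED_COOKIE) {
            throw new IllegalStateException(
                    "Unexpected magic cookie: 0x" + Integer.toHexString(cookie));
        }
    }
}
```

Which exception type to throw is a design choice; an unchecked exception is used here only to keep the sketch short.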

greetZ Mike
 
Pete Palmer
Ranch Hand
Posts: 106
Mike,

Thank you for the prompt response. From what you say, the database file will look like this:

Start File Content (header)
0000 4 bytes - magic cookie
0004 4 bytes - offset to start of record zero
0008 2 bytes - number of fields in each record. This should be
six (name, location, specialties, size, rate, owner)

Schema description section (header)
0010 2 bytes - name field length. This will be set to 32.
0012 32 bytes - number of bytes for name field

0044 2 bytes - location field length. This will be set to 64.
0046 64 bytes - number of bytes for location field

0110 2 bytes - specialties field length. This will be set to 64.
0112 64 bytes - number of bytes for specialties field

0176 2 bytes - size field length. This will be set to 6.
0178 6 bytes - number of bytes for size field

0184 2 bytes - rate field length. This will be set to 8.
0186 8 bytes - number of bytes for rate field

0194 2 bytes - owner field length. This will be set to 8.
0196 8 bytes - number of bytes for owner field

Start of Records
0204 2 bytes - Flag to indicate valid or deleted record (record 0)
0206 32 bytes - Sub Contractor name field
0238 64 bytes - Location field
0302 64 bytes - type of work field
0366 6 bytes - staff number field
0372 8 bytes - rate field
0380 8 bytes - customer id field.

0388 2 bytes - Flag to indicate valid or deleted record (record 1)
0390 32 bytes - Sub Contractor name field
0422 64 bytes - Location field
0486 64 bytes - type of work field
0550 6 bytes - staff number field
0556 8 bytes - rate field
0564 8 bytes - customer id field.

Repeat for more records.

Is my understanding of the format of the database file correct?

Apart from the magic cookie value, what use is the rest of the header information -- all of 192 bytes?

I presume the magic cookie is read from the database file initially to determine its value before it is used in the code?

Many thanks again.

Pete
 
Pete Palmer
Ranch Hand
Posts: 106
Hi

I just generated a hex dump of the database file and I can relate it to the description of the Start of file and Schema description sections.

From the above hex dump, "4 bytes - offset to start of record zero" seems to be the offset from the start of the first byte of the Schema description. Is this correct?

As mentioned before, apart from the magic cookie value, what use is the rest of the header information?

And I presume the magic cookie is read from the database file initially to determine its value before it is used in the code?

Many thanks again.

Pete
 
Greenhorn
Posts: 9
Hi Pete,

I interpreted "Magic Cookie" as a file signature, i.e. the "magic number" that is embedded at the beginning of many data files identifying the file type or originating application. This conclusion would seem to be consistent with the comment in the instructions denoting, "magic cookie value identifies this as a data file".

The remaining fields in the header (1) indicate where the first contractor data record begins and (2) tell us how many times the three entries in the schema section repeat.

Best of luck,
Tom

[ November 15, 2008: Message edited by: Tom Doyle ]
 
Pete Palmer
Ranch Hand
Posts: 106
Thank you very much Mike & Tom for the clarification.

Pete
 
Greenhorn
Posts: 12
Hi Pete:
I am working on the Bodgitt and Scarper too.

Your Schema description section is wrong. At the starting point (the schema), the first 2 bytes tell you how many bytes to expect for the field name (the length of the field's name, not the field's length). You extract the field name, and then the following 2 bytes give you the length of this field. You repeat this 6 times.

Schema description section:
0010 2 bytes - length in bytes of field name (call the result n)
0012 n bytes - field name (n differs for each field)
0012+n 2 bytes - field length in bytes (differs for each field and is already given in the requirements; for the first field, name, the value is 32)
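
As a rough cross-check of that layout, the header size (and so the expected "offset to start of record zero") can be estimated from the field names alone. The sketch below assumes the names are stored exactly as listed in the schema (name, location, specialties, size, rate, owner), one byte per character, and that records start immediately after the schema section. SchemaSizeSketch is a hypothetical name, and the result should be compared against a hex dump of your own file rather than trusted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class SchemaSizeSketch {
    // 4 (cookie) + 4 (record zero offset) + 2 (field count),
    // then per field: 2 (name length) + n (name) + 2 (field length).
    static int headerSize(String... fieldNames) {
        int size = 4 + 4 + 2;
        for (String name : fieldNames) {
            size += 2 + name.getBytes(StandardCharsets.US_ASCII).length + 2;
        }
        return size;
    }

    public static void main(String[] args) {
        // 10 + (4 + 8 + 11 + 4 + 4 + 5) + 6 * 4 = 70
        System.out.println(headerSize("name", "location", "specialties", "size", "rate", "owner"));
    }
}
```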
 
Pete Palmer
Ranch Hand
Posts: 106
Hi Zonglin,

You are correct, I had completely missed the point with respect to the schema description. I only realised this when I generated a hex dump of the data file and then tried to confirm my understanding against the contents. Of course, this exercise highlighted my lack of understanding, but after more perseverance I finally (I think!) got a grasp of it.

Thanks for pointing out my error.

Pete.
 