Data File Format and reading header information

Dmitri Christo
Ranch Hand

Joined: Jan 19, 2007
Posts: 81
Hello,

I am just starting with URLyBird 1.2.1 and already have some questions! :-) So far I have answered quite a few of them by searching this forum. Great help!

I have doubts regarding the file format, its contents and how to manipulate them. Obviously all development and testing will be done on a copy of that file.

The format given is shown below. (I hope posting this part of the instructions does not violate JavaRanch rules. If it is removed I will rephrase my question.)
Data file Format
The format of data in the database file is as follows:

Start of file
4 byte numeric, magic cookie value. Identifies this as a data file
4 byte numeric, total overall length in bytes of each record
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section.
Repeat to end of file:
1 byte "deleted" flag. 0 implies valid record, 1 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information

End of file

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.


I can see the structure is clearly defined, but have some doubts about the byte numeric values. Initially I used a hex editor to open the .db file, but found only symbols or blank spaces for the numeric values - text is showing fine. Is a hex editor perhaps not the best way to look into the header information of a file?

I will use RandomAccessFile for writing on the file when the time comes, but to place the seek pointer to the right location (beginning of the record) I would need the correct offset (The record length is given in the instructions). That offset I will calculate from the file format information. The big issue is to determine those byte numeric values. Are they left blank on purpose (for me to define) or am I trying to find them the wrong way? How would you suggest I retrieve the header information?

I appreciate any advice on this.

Thanks!
Pablo Manrubia
Greenhorn

Joined: Dec 29, 2007
Posts: 6
Hi Dmitri!,

Try using DataInputStream this way...
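//...
InputStream is = FileDBMain.class.getResourceAsStream(FILE_NAME);
DataInputStream dis = new DataInputStream(new BufferedInputStream(is));
//...
int magicNumber = dis.readInt();
int nFields = dis.readUnsignedShort(); // number of fields in each record
//...
for (int i = 0; i < nFields; i++) {
    // one-byte lengths match my schema; use readUnsignedShort() for two-byte values
    int nLength = dis.readUnsignedByte(); // length of the field name
    // readFully() waits for all nLength bytes; a plain read() may return fewer
    dis.readFully(buffer, 0, nLength);
    String fieldName = new String(buffer, 0, nLength, ENCODING);
    int fLength = dis.readUnsignedByte(); // length of the field itself
    fields[i] = fLength;
}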

This way I can parse the file correctly.
Dmitri Christo
Ranch Hand

Joined: Jan 19, 2007
Posts: 81
Hi Pablo, thanks for your help, I'll try your advice.

Let me share what I am seeing in the file through the hex editor: (I hope it will appear in the column format properly). The following should represent the Start of file and the Schema description section of the file.


00000000:00 00 01 01 00 00 00 9f 00 07 00 04 6e 61 6d 65 ............name
00000010:00 40 00 08 6c 6f 63 61 74 69 6f 6e 00 40 00 04 .@..location.@..
00000020:73 69 7a 65 00 04 00 07 73 6d 6f 6b 69 6e 67 00 size....smoking.
00000030:01 00 04 72 61 74 65 00 08 00 04 64 61 74 65 00 ...rate....date.
00000040:0a 00 05 6f 77 6e 65 72 00 08 ...owner..


By the way, I saw the same thing using:


As you can see all the numeric values show up as dots (.) and symbols - Is it wrong to believe they are left blank/empty so I can choose/select the values myself?

So, calculating the bytes from the instructions given in the first post:
Start of file is 4+4+2 = 10 bytes long
This includes: magic cookie + length of each record + number of fields per record
Schema description section is 2 + n + 2 bytes for each field. Assuming one byte per text character, and going by the hex view, the Schema description section works out as:
2+4+2 = 8 bytes long (for 'name' field)
2+8+2 = 12 bytes long (for 'location' field)
2+4+2 = 8 bytes long (for 'size' field)
2+7+2 = 11 bytes long (for 'smoking' field)
2+4+2 = 8 bytes long (for 'rate' field)
2+4+2 = 8 bytes long (for 'date' field)
2+5+2 = 9 bytes long (for 'owner' field)
Summing it all up gives 64 bytes for the Schema description section
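
The arithmetic above can be checked in a few lines. This is only a sketch: the class and method names are made up, the record length of 159 is the 0x9f from the hex dump, and it assumes the 1 byte "deleted" flag sits in front of each record and is not counted in that length (the field lengths in the dump, 64+64+4+1+8+10+8, do add up to 159).

```java
class SchemaOffsets {
    // Fixed part of the header: magic cookie (4) + record length (4) + field count (2).
    static final int FIXED_HEADER_BYTES = 4 + 4 + 2;

    // Each schema entry: 2-byte name length + the name itself + 2-byte field length.
    static int schemaBytes(String[] fieldNames) {
        int total = 0;
        for (String name : fieldNames) {
            total += 2 + name.length() + 2; // one byte per US-ASCII character
        }
        return total;
    }

    // Offset of record recNo (0-based). Assumption: each record is preceded
    // by a 1-byte deleted flag that is not part of recordLength.
    static long recordOffset(int dataStart, int recordLength, int recNo) {
        return dataStart + (long) recNo * (1 + recordLength);
    }

    public static void main(String[] args) {
        String[] names = {"name", "location", "size", "smoking", "rate", "date", "owner"};
        int schema = schemaBytes(names);
        int dataStart = FIXED_HEADER_BYTES + schema;
        System.out.println("schema bytes   = " + schema);
        System.out.println("data starts at = " + dataStart);
        System.out.println("record 2 at    = " + recordOffset(dataStart, 159, 2));
    }
}
```

With the seven field names from the dump this gives 64 schema bytes and a data section starting at byte 74, which matches the 74 bytes shown in the hex view. Under the flag assumption, record 2 would start at byte 394.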

I still have doubts about why the numeric values aren't showing and whether my assumptions in calculating the length of the various sections of the file are correct.

So, basically you think displaying the contents as you describe above will show the byte numeric values properly?

Thanks again for any advice.
Pablo Manrubia
Greenhorn

Joined: Dec 29, 2007
Posts: 6
Yes, but you'd have to use dis.readUnsignedShort because it's a two-byte number.

I'm parsing my file with this code (my schema is a bit different):

InputStream is = FileDBMain.class.getResourceAsStream(FILE_NAME);
// check for a missing file before wrapping the stream
assert (is != null) : "File not found? " + FILE_NAME;
DataInputStream dis = new DataInputStream(new BufferedInputStream(is));

try {

    int magicNumber = dis.readInt();
    int nFields = dis.readUnsignedShort();
    int[] fSizes = new int[nFields];

    // schema (one-byte lengths in my file; use readUnsignedShort for two-byte values)
    for (int i = 0; i < nFields; i++) {
        int fieldNameLength = dis.readUnsignedByte();
        // readFully() waits for all the bytes; a plain read() may return fewer
        dis.readFully(buffer, 0, fieldNameLength);
        fSizes[i] = dis.readUnsignedByte();
    }

    // data: loop until a read throws EOFException at the end of the file
    int counter = 0;
    while (true) {
        int flag = dis.readUnsignedByte();
        if (flag == 0x00) {
            list.add(counter); // remember the numbers of the valid records
        }
        for (int j = 0; j < fSizes.length; j++) {
            dis.readFully(buffer, 0, fSizes[j]);
            String fieldValue = new String(buffer, 0, fSizes[j], ENCODING);
            System.out.println(j + "-" + fieldValue);
        }
        counter++;
    }

} catch (EOFException e) {
    // expected: the end of the file was reached
    System.out.println("End of reading");
} catch (IOException e) {
    e.printStackTrace();
}

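I think you can't see the numbers in the text column of your hex editor because it shows each byte as a character, and these bytes are not printable characters. The values are there in the hex column (for example, 00 07 is the number of fields); your application has to read them as two-byte or four-byte numbers.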
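
To make that concrete, here is a small sketch that feeds the first 18 bytes of the hex dump posted earlier in the thread to a DataInputStream. The HeaderDump class and its describe method are made-up names, and the widths follow the format quoted in the first post (4, 4 and 2 bytes, then two-byte lengths), not my own schema.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class HeaderDump {
    // First 18 bytes of the hex dump: fixed header plus the first schema entry.
    static final byte[] DUMP = {
        0x00, 0x00, 0x01, 0x01,             // magic cookie
        0x00, 0x00, 0x00, (byte) 0x9f,      // record length
        0x00, 0x07,                         // number of fields
        0x00, 0x04, 0x6e, 0x61, 0x6d, 0x65, // name length, then "name"
        0x00, 0x40                          // field length
    };

    static String describe(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int magic = in.readInt();                // 4 bytes, big-endian
            int recordLength = in.readInt();         // 4 bytes
            int fieldCount = in.readUnsignedShort(); // 2 bytes
            byte[] name = new byte[in.readUnsignedShort()];
            in.readFully(name);
            int fieldLength = in.readUnsignedShort();
            return "magic=" + magic + " recordLength=" + recordLength
                    + " fields=" + fieldCount + " first="
                    + new String(name, StandardCharsets.US_ASCII) + "/" + fieldLength;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(DUMP));
    }
}
```

This prints magic=257 recordLength=159 fields=7 first=name/64, so the values are in the file; the editor just has no printable character for bytes like 00, 01 or 9f.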

I hope it helps!
Pablo Manrubia
Greenhorn

Joined: Dec 29, 2007
Posts: 6
Hi Dmitri!,

Don't forget to change DataInputStream to RandomAccessFile.
The code above is only for debugging.
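
A rough sketch of what the RandomAccessFile version could look like, not a definitive design. It assumes the two-byte schema values from the format quoted in the first post and that the record length does not include the deleted flag. RecordReader, readRecord, writeSample and demo are made-up names, and the sample file is invented test data with a single 4-byte field.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RecordReader {
    // Reads record recNo (0-based); returns null when its deleted flag is set.
    static String readRecord(RandomAccessFile raf, long dataStart,
                             int recordLength, int recNo) throws IOException {
        raf.seek(dataStart + (long) recNo * (1 + recordLength));
        int flag = raf.readUnsignedByte();
        byte[] data = new byte[recordLength];
        raf.readFully(data);
        return flag == 0 ? new String(data, StandardCharsets.US_ASCII) : null;
    }

    // Writes an invented sample file: one field "name" of 4 bytes, three records.
    static File writeSample() throws IOException {
        File f = File.createTempFile("sample", ".db");
        f.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(257);        // magic cookie
            out.writeInt(4);          // record length
            out.writeShort(1);        // number of fields
            out.writeShort(4);        // length of field name
            out.writeBytes("name");   // field name
            out.writeShort(4);        // field length
            out.writeByte(0); out.writeBytes("AAAA");
            out.writeByte(1); out.writeBytes("BBBB"); // deleted record
            out.writeByte(0); out.writeBytes("CCCC");
        }
        return f;
    }

    static String demo() {
        try {
            File f = writeSample();
            try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
                raf.readInt();                        // magic cookie, ignored here
                int recordLength = raf.readInt();
                int nFields = raf.readUnsignedShort();
                for (int i = 0; i < nFields; i++) {
                    raf.skipBytes(raf.readUnsignedShort()); // skip the field name
                    raf.readUnsignedShort();                // field length, unused here
                }
                long dataStart = raf.getFilePointer(); // first byte after the schema
                return readRecord(raf, dataStart, recordLength, 0) + ","
                        + readRecord(raf, dataStart, recordLength, 1) + ","
                        + readRecord(raf, dataStart, recordLength, 2);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On the sample file this prints AAAA,null,CCCC. Taking the data start from getFilePointer() after the schema has been read avoids hard-coding the offset.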
Dmitri Christo
Ranch Hand

Joined: Jan 19, 2007
Posts: 81
Thanks Pablo, this is getting much clearer to me now. I will use DataInputStream and hopefully it will be fine.
 
 