B&S: Data File Format

Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Hi all, I need help understanding the data file and its format. I have never worked with files in real life, and this is the only part that I can't get my head around.

FORMAT DESCRIPTION IN INSTRUCTIONS
(I have inserted dashed lines with section names for ease of asking questions):
--------------------SECTION 1--------------------------------
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record
-------------------SECTION 1 ENDS--------------------------------

-----------------SECTION 2--------------------------------------
Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block
----------------SECTION 2 ENDS------------------------------------

----------------SECTION 3-----------------------------------------
Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
----------------SECTION 3 ENDS----------------------------------------

---------------
ACTUAL DATA FILE SUPPLIED WITH MY ASSIGNMENT (HEADER AND SOME OF THE DATA) AS VIEWED IN WORDPAD (IT LOOKS DIFFERENT IN A HEX EDITOR AND IN JEDIT). I hope my questions can be answered with this view.

Fname location@
specialties@sizerateownerDogs With Tools Smallville Roofing 7 $35.00 Hamner & Tong Smallville Drywall, Roofing 10 $85.00

--------------------------

QUESTIONS:
1. Can you please separate the different sections of the actual data file according to the file format (e.g. which part is the "magic cookie", which is the "offset", etc.)?
2. Where is the 2-byte flag which indicates a valid/deleted record?

Many thanks,
Alain
Jeffry Kristianto Yanuar
Ranch Hand

Joined: Oct 01, 2007
Posts: 759
Hi friend, welcome to JavaRanch, such a lovely place for Java programmers.

2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information


This is the 2 byte flag which indicates valid/deleted record.

First, you have to open the database file with a RandomAccessFile.


4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record


Let's start with the section above.

To read a 4-byte numeric, use the readInt() method of RandomAccessFile. Why readInt()? Because an int is 4 bytes (32 bits), so the method returns an int built from those 4 bytes.

The sample code is:

// create the RandomAccessFile object first
int magicCookie = randomAccessFile.readInt();
int offset = randomAccessFile.readInt();
short numberOfFields = randomAccessFile.readShort(); // a short is 2 bytes (16 bits)


The rest of the header is read in a similar way.
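
For illustration, here is a minimal sketch of reading the header and the schema section in the order given by the format description. The names `SchemaReader`, `FieldInfo`, `Header` and `readHeader` are made up for this example, and decoding the field names as ASCII is an assumption; the instructions quoted in this thread do not state the encoding.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the assignment): reads the header and the
// schema section in the order given by the format description.
final class SchemaReader {

    // One schema entry: the field name and its fixed length in bytes.
    static final class FieldInfo {
        final String name;
        final int length;

        FieldInfo(String name, int length) {
            this.name = name;
            this.length = length;
        }
    }

    // Everything that precedes record zero in the file.
    static final class Header {
        final int magicCookie;
        final int recordZeroOffset;
        final List<FieldInfo> fields;

        Header(int magicCookie, int recordZeroOffset, List<FieldInfo> fields) {
            this.magicCookie = magicCookie;
            this.recordZeroOffset = recordZeroOffset;
            this.fields = fields;
        }
    }

    // Accepts a RandomAccessFile or a DataInputStream; both implement DataInput.
    static Header readHeader(DataInput in) throws IOException {
        int magicCookie = in.readInt();               // 4 bytes
        int recordZeroOffset = in.readInt();          // 4 bytes
        int fieldCount = in.readUnsignedShort();      // 2 bytes

        List<FieldInfo> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = in.readUnsignedShort();  // 2 bytes
            byte[] nameBytes = new byte[nameLength];
            in.readFully(nameBytes);                  // n bytes, the field name
            int fieldLength = in.readUnsignedShort(); // 2 bytes
            // Assumption: field names are plain ASCII text.
            String name = new String(nameBytes, StandardCharsets.US_ASCII);
            fields.add(new FieldInfo(name, fieldLength));
        }
        return new Header(magicCookie, recordZeroOffset, fields);
    }
}
```

After this call, a RandomAccessFile's file pointer should sit at the position given by the "offset to start of record zero" value, which can serve as a quick sanity check on the parsing.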

Hope that helps.

Jeffry Kristianto Yanuar (Java Instructor)
SCJP 5.0, SCJA, SCJD (UrlyBird 1.3.2) --> Waiting for the result
Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Thanks for the insight Jeffry,
I printed the data file as a string to console using RandomAccessFile.

What I found is:
1. The first four bytes showed something which looked like the magic cookie, i.e. two faces. I guess that's fine: I can save those bytes and compare them whenever a data file is accessed.

2. But I could not find anything corresponding to the delete flag. Is it at the end of every record? I calculated the bytes for all the fields, and there are two extra bytes at the end of every record, but they don't display anything. Can I use those two bytes to write this flag? Are those two bytes meant for that, or will I be altering the format of the data file by writing over those two bytes (which I am not supposed to do)?

Please give your feedback.

Thanks,
Alain
Jeffry Kristianto Yanuar
Ranch Hand

Joined: Oct 01, 2007
Posts: 759
2. But I could not find anything corresponding to the delete flag. Is it at the end of every record? I calculated the bytes for all the fields, and there are two extra bytes at the end of every record, but they don't display anything. Can I use those two bytes to write this flag? Are those two bytes meant for that, or will I be altering the format of the data file by writing over those two bytes (which I am not supposed to do)?


The flag is at the beginning of each record, not at the end.


Please try again

Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Thanks once again, Jeffry. I seem to understand the data file now, with one doubt left.

You are absolutely right, the delete flag is at the beginning of each record.
Actually, I was checking records from the middle of the file, so the beginning of one record looked like the end of the previous one (I found two empty bytes between two records).

The doubt:
Does the valid record flag 00 mean empty bytes (nothing written there), or should I write 00 to it? And of course I will be writing 0x8000 if I have to mark a record as deleted.

Thanks,
Alain
Jeffry Kristianto Yanuar
Ranch Hand

Joined: Oct 01, 2007
Posts: 759
The doubt:
Does the valid record flag 00 mean empty bytes (nothing written there), or should I write 00 to it? And of course I will be writing 0x8000 if I have to mark a record as deleted.


Yes, if you create a new record, you write the 00 flag followed by all of the record's fields. When you delete, you write the 0x8000 flag. So deleting a record doesn't remove any bytes; it only changes the flag.

Finding out how to read the database file was the first thing I did in my assignment. If you already know how to read it, I'm sure you'll know how to write it. RandomAccessFile makes it easy to move backward and forward to a particular byte.
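
As a rough sketch of "just change the flag", the snippet below seeks to a record's flag and overwrites it with 0x8000. The class and method names (`RecordFlags`, `flagPosition`, `markDeleted`) are hypothetical, and it assumes records are numbered from zero and stored back to back, each as a 2-byte flag followed by the fixed-length fields.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: marks a record as deleted by rewriting only its flag.
final class RecordFlags {
    static final int VALID_FLAG = 0x0000;
    static final int DELETED_FLAG = 0x8000;
    static final int FLAG_SIZE = 2;

    // Byte position of the flag of record recNo.
    // recordLength is the sum of all field lengths from the schema section.
    static long flagPosition(long recordZeroOffset, int recordLength, int recNo) {
        return recordZeroOffset + (long) recNo * (FLAG_SIZE + recordLength);
    }

    // Overwrites the two flag bytes; the field bytes stay in the file untouched.
    static void markDeleted(RandomAccessFile file, long recordZeroOffset,
                            int recordLength, int recNo) throws IOException {
        file.seek(flagPosition(recordZeroOffset, recordLength, recNo));
        file.writeShort(DELETED_FLAG);
    }
}
```

The file length does not change when a record is marked this way, which matches the point that deleting does not remove any bytes.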

Good Luck and wish me luck too !!!


[ December 11, 2008: Message edited by: Jeffry Kristianto Yanuar ]
Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Thanks a lot Jeffry for your help.
I will start coding in a couple of days, and I guess I will need a lot of help from the Ranchers.

Wish you all the best for your result
I am sure you will make it.
Let us know once you get your result.

Thanks,
Alain
Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Does the valid record flag 00 mean empty bytes (nothing written there), or should I write 00 to it? And of course I will be writing 0x8000 if I have to mark a record as deleted.


1) While reading the unedited database file, we should treat a record as valid if the flag contains empty bytes.
2) While writing a valid record, we should prefix 00 to the record.
3) This means that, while reading an edited database file, we should treat both empty bytes and 00 as flags for valid records.
4) In that case, why should we prefix 00 for a valid record? Why shouldn't we just use empty bytes for valid records in all cases?

In other words, why use the concept of 00 at all, when the original database uses empty bytes as the valid record flag?

Thanks,
Rajesh.
K. Tsang
Bartender

Joined: Sep 13, 2007
Posts: 2584
    

When reading and writing files, it really depends on which I/O class you are using. If you use XXInputStream and XXOutputStream, reading and writing are split between those classes. If you use, say, RandomAccessFile, which supports both reading and writing, then in my opinion it will make life easier.

When you first open the file, read the header. After that, you should be able to jump straight to a particular record and read its delete flag and data. The same goes for writing.
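
A minimal sketch of that "jump to a record" step, assuming the header has already been parsed into a record-zero offset and an array of field lengths. The `RecordReader` class, the `readRecord` method, returning null for a record that is not valid, the ASCII decoding and the trimming of padding are all assumptions made for illustration, not something the assignment prescribes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one record by number using values from the header.
final class RecordReader {
    static final int FLAG_SIZE = 2;

    // Returns the field values of record recNo, or null if its flag is not 0.
    static String[] readRecord(RandomAccessFile file, long recordZeroOffset,
                               int[] fieldLengths, int recNo) throws IOException {
        int recordLength = 0;
        for (int length : fieldLengths) {
            recordLength += length;
        }
        // Every record slot is the 2-byte flag plus the fixed-length fields.
        file.seek(recordZeroOffset + (long) recNo * (FLAG_SIZE + recordLength));

        int flag = file.readUnsignedShort();
        if (flag != 0x0000) {
            return null; // deleted (0x8000) or an unexpected value
        }

        String[] values = new String[fieldLengths.length];
        for (int i = 0; i < fieldLengths.length; i++) {
            byte[] raw = new byte[fieldLengths[i]];
            file.readFully(raw);
            // Assumption: ASCII text; trim() drops leading and trailing
            // characters up to U+0020, which covers spaces and NUL bytes.
            values[i] = new String(raw, StandardCharsets.US_ASCII).trim();
        }
        return values;
    }
}
```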


K. Tsang JavaRanch SCJP5 SCJD/OCM-JD OCPJP7 OCPWCD5 OCPBCD5
Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Rajesh, try having a look at the data file in a hex editor. The empty bytes are shown as 00 00.

When you write to the file using RandomAccessFile's writeShort(00), it produces the same result in the data file. I learned this by writing to the file in different ways.
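
If no hex editor is at hand, the same check can be sketched in Java. The example below writes a flag with writeShort() into an in-memory buffer and prints the bytes as hex. It uses DataOutputStream instead of RandomAccessFile to avoid touching a real file; both implement the DataOutput interface, whose writeShort() writes the high byte first. The `HexView` class and its method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper: shows the raw bytes that writeShort() produces.
final class HexView {

    // Formats bytes the way a hex editor would show them, e.g. "80 00".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Writes one flag value with writeShort() and returns the two bytes.
    static byte[] flagBytes(int flag) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(flag);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toHex(flagBytes(0)));      // prints "00 00"
        System.out.println(toHex(flagBytes(0x8000))); // prints "80 00"
    }
}
```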

It is good to understand the whys and hows of things, but sometimes just making things work and moving forward is a good idea for this assignment.

I understand that when we are working on this assignment we tend to get into the details of everything, but trust me, don't get too emotional about it. Make things work right and let it go. If you search it carefully, this forum quickly tells you how to achieve the desired results.

I hope this helps..
Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
With RandomAccessFile.readShort(), the valid flag is read as 0. However, it does not seem possible to use this method for the delete flag because:
1) the range of short is -32768 to 32767
2) delete flag = 0x8000 = 32768, which is outside the range of short.

What are the other ways to handle this scenario?

Thanks,
Rajesh.
Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Hey dudes, could anyone please provide inputs for the above question?

Thanks,
Rajesh.
Roberto Perillo
Bartender

Joined: Dec 28, 2007
Posts: 2267
    

Hey, Rajesh!

What if you tried something like this (this is how I did it):



Instead of using readShort(), use read(), specifying the flag's size in bytes.
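
The code originally posted here did not survive in the thread. As a rough sketch of the idea only (not Roberto's actual code), reading the flag with read() could look like the following. The names `database`, `flagBuffer` and `FLAG_SIZE` follow the single line quoted later in the thread; the `FlagReader` class, the `readFlag` method and the choice to return -1 at end of file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: reads the 2-byte record flag as raw bytes.
final class FlagReader {
    static final int FLAG_SIZE = 2;

    // Returns the flag as an unsigned value (0x0000 or 0x8000 in a sound file),
    // or -1 if fewer than FLAG_SIZE bytes could be read (treated as end of file).
    static int readFlag(RandomAccessFile database) throws IOException {
        byte[] flagBuffer = new byte[FLAG_SIZE];
        // read() returns the number of bytes read, or -1 at end of file.
        int bytesRead = database.read(flagBuffer, 0, FLAG_SIZE);
        if (bytesRead < FLAG_SIZE) {
            return -1;
        }
        // Combine the two bytes, high byte first, masking off the sign extension.
        return ((flagBuffer[0] & 0xFF) << 8) | (flagBuffer[1] & 0xFF);
    }
}
```

Reading raw bytes this way sidesteps the signed range of short entirely, and the return value of read() doubles as an end-of-file check when looping over records.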


Cheers, Bob "John Lennon" Perillo
SCJP, SCWCD, SCJD, SCBCD - Daileon: A Tool for Enabling Domain Annotations
Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Hi Rajesh, Sorry for delayed response man! I was too busy.

1) range of short is between -32768 to 32767
2) delete flag = 0x8000 = 32768. This is outside the range of short.


Just write a small piece of code and try writeShort(0x8000). Since this method accepts a "short" argument, the compiler will not allow anything larger than a short without a cast.

I used writeShort(0x8000) for deleting a record and writeShort(00) for marking a record as valid.

It will work, and you will not have any problems because of this...

Alain Dickson,
SCJP 6, SCJD

Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Thanks for your responses.

Here's my analysis:

1) We have to read 2 bytes only. This is the requirement. Therefore, we can read only a "short" and not an "int".

Hence, the following code may not fetch the intended result:
final int eof = database.read(flagBuffer, 0, FLAG_SIZE);

Please correct me if I am wrong.

2) The writeShort() method in RandomAccessFile accepts an "int" argument and not a "short" argument. This is the reason why writeShort(0x8000) works, even though the value is greater than the range of "short".


1) range of short is between -32768 to 32767
2) delete flag = 0x8000 = 32768. This is outside the range of short.


When reading the value back, readShort() returns -0x8000 (-32768). To get 0x8000 (32768), we should use readUnsignedShort().
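
This difference can be checked with a small, self-contained sketch. It reads the same two bytes (80 00) once as a signed short and once as an unsigned short. DataInputStream stands in for RandomAccessFile here so that no file is needed; both implement the DataInput interface that defines readShort() and readUnsignedShort(). The `SignedVsUnsigned` class and its method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical demo: one bit pattern, two interpretations.
final class SignedVsUnsigned {
    // The two bytes a deleted record's flag occupies in the file.
    static final byte[] DELETED_FLAG_BYTES = {(byte) 0x80, 0x00};

    static short asSigned(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readShort();
        }
    }

    static int asUnsigned(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readUnsignedShort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(asSigned(DELETED_FLAG_BYTES));          // prints -32768
        System.out.println(asUnsigned(DELETED_FLAG_BYTES));        // prints 32768
        // Masking a signed short with 0xFFFF gives the unsigned value as well.
        System.out.println(asSigned(DELETED_FLAG_BYTES) & 0xFFFF); // prints 32768
    }
}
```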

----

Keeping the above points in mind, following 2 options can be chosen:

1)

if (readShort() == 0) {
    // valid record
} else {
    // deleted record
}

2)

int flag = readUnsignedShort(); // read the two flag bytes only once
if (flag == 0) {
    // valid record
} else if (flag == 0x8000) {
    // deleted record
} else {
    // no idea; can someone explain?
}

If we choose Option 1 above, we can read/write "0" for a valid record and any other value for a deleted record. Then what is the significance of the value 0x8000?

I am a little bit confused.

Thanks & regards,
Rajesh.
Bert Bates
author
Sheriff

Joined: Oct 14, 2002
Posts: 8883
    
let's not get too detailed you guys...


Spot false dilemmas now, ask me how!
(If you're not on the edge, you're taking up too much room.)
Alain Dickson
Ranch Hand

Joined: Dec 08, 2008
Posts: 53
Hi Bert, let me take one more shot at this...

Rajesh:

The delete flag 0x8000 == 32768, which is greater than the largest value of a signed two-byte short (32767).
BUT we have to write it in two bytes (the data file schema only gives us two bytes).
THEREFORE, when those two bytes are interpreted as a signed short, 0x8000 comes out as -32768. (The bit pattern is the same; only the interpretation differs.)
THAT'S what the writeShort(int x) method of RandomAccessFile does: it writes the two low-order bytes of its argument, so writeShort(0x8000) stores the bytes 80 00, which readShort() returns as -32768.
WHEN you read using readUnsignedShort(), it reads them as 32768 == 0x8000.

Rajesh Said:
int flag = readUnsignedShort(); // read the two flag bytes only once
if (flag == 0) {
    // valid record
} else if (flag == 0x8000) {
    // deleted record
} else {
    // no idea; can someone explain?
}


Answer to the "else" part: the database is corrupt, because the schema does not allow any other value. So whenever you perform an operation on the database, check that the record is valid (don't read deleted or corrupted records), EXCEPT when you are adding a new record: there you might want to reuse the space of a deleted record, so you have to look for the deleted flag (0x8000).
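
A minimal sketch of that three-way check, assuming the flag was read with readUnsignedShort(). The `FlagCheck` class, the `RecordState` enum and the use of IllegalStateException for a corrupt flag are illustrative choices only; an actual solution may prefer its own exception type.

```java
// Hypothetical helper: maps a flag value to a record state.
final class FlagCheck {
    enum RecordState { VALID, DELETED }

    // unsignedFlag is expected to come from readUnsignedShort().
    static RecordState classify(int unsignedFlag) {
        switch (unsignedFlag) {
            case 0x0000:
                return RecordState.VALID;
            case 0x8000:
                return RecordState.DELETED;
            default:
                // The format allows no other value, so treat it as corruption.
                throw new IllegalStateException(
                        "Unexpected record flag 0x" + Integer.toHexString(unsignedFlag));
        }
    }
}
```

When adding a new record, a loop over the records could stop at the first one classified as DELETED and reuse its slot.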
Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Hi Alain,

That is a very good explanation. Thank you very much.

Regards,
Rajesh.
 