
URLyBird: Read data file

Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Hello Folks,

I am working on URLyBird Version 1.2.2.

--------------------
The format of data in the database file is as follows:
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record

Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block

Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information

End of file

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
--------------------

I am trying to read the data file the following way.

package scjdtest.io;

import java.io.*;

public class ReadTest {

    private static RandomAccessFile database = null;
    static final int NAME_FIELD_LENGTH = 64;
    static final int LOCATION_FIELD_LENGTH = 64;
    static final int SIZE_FIELD_LENGTH = 4;
    static final int SMOKING_FIELD_LENGTH = 1;
    static final int RATE_FIELD_LENGTH = 8;
    static final int DATE_FIELD_LENGTH = 10;
    static final int OWNER_FIELD_LENGTH = 8;
    static final int RECORD_LENGTH = NAME_FIELD_LENGTH
            + LOCATION_FIELD_LENGTH
            + SIZE_FIELD_LENGTH
            + SMOKING_FIELD_LENGTH
            + RATE_FIELD_LENGTH
            + DATE_FIELD_LENGTH
            + OWNER_FIELD_LENGTH;

    public static void main (String [] args) throws Exception {
        String [] strRecord = null;
        database = new RandomAccessFile("db-1x2.db", "r");
        System.out.println("Length of the database is " + database.length());
        for (int i = 0; i < database.length(); i++) {
            strRecord = readRecord(i);
            for (int j = 0; j < strRecord.length; j++) {
                System.out.println("[" + i + "," + j + "] = " + strRecord[j]);
            }
        }
    }

    private static String [] readRecord(long recNo) throws Exception {
        final byte[] input = new byte[RECORD_LENGTH];
        synchronized (database) {
            database.seek(recNo);
            database.readFully(input);
        }

        // now convert those bytes into a String[]. The thread that is doing
        // this conversion can be running while other threads are doing
        // other work - they are no longer being blocked.

        /**
         * class to assist in converting from the one big byte[] into
         * multiple String[] - one String per field.
         */
        class RecordFieldReader {
            /** field to track the position within the byte array */
            private int offset = 0;

            /**
             * converts the required number of bytes into a String.
             *
             * @param length the length to be converted from current offset.
             * @return the converted String
             * @throws UnsupportedEncodingException if "UTF-8" not known.
             */
            String read(int length) throws UnsupportedEncodingException {
                String str = new String(input, offset, length, "ISO-8859-1");
                offset += length;
                return str.trim();
            }
        }

        RecordFieldReader readRecord = new RecordFieldReader();
        String returnValue[] = {
            readRecord.read(NAME_FIELD_LENGTH),
            readRecord.read(LOCATION_FIELD_LENGTH),
            readRecord.read(SIZE_FIELD_LENGTH),
            readRecord.read(SMOKING_FIELD_LENGTH),
            readRecord.read(RATE_FIELD_LENGTH),
            readRecord.read(DATE_FIELD_LENGTH),
            readRecord.read(OWNER_FIELD_LENGTH)
        };

        return returnValue;
    }
}

--------------------

Output:

Length of the database is 4743
[0,0] = J
[6,4] = Smallvi
[6,5] = lle
[6,6] =
[7,0] = J[8,2] =
[8,3] = S
[8,4] = mallvill
[8,5] = e
[8,6] =
[9,0] = name allv
[14,3] = i
[14,4] = lle
[14,5] =
[14,6] =
...
...
...
--------------------

As seen above, the output is not what I expected. Even the order of the printed lines is wrong: after [0,0], [6,4] is printed; after [9,0], [14,3] is printed, and so on.

Could anyone help me out with this?

Thanks & regards,
Rajesh Moorthy.
Kah Tang
Ranch Hand

Joined: Sep 10, 2007
Posts: 58
I can only give you one tip: read the instructions. They tell you there are 3 sections in the DB file that you need to read. Each section of the file contains information you need in order to read the records.
[ September 24, 2008: Message edited by: Kah Tang ]
Rajesh Moorthy
Ranch Hand

Joined: Sep 23, 2008
Posts: 30
Thanks for your response.

I also had the impression that it has something to do with the data file format. However, I cannot make head or tail of the format of data in the database file. Could you please help me out?
Tobias Lund-Melcher
Greenhorn

Joined: Sep 24, 2008
Posts: 4
Hi Rajesh

I'll try to give you some tips regarding your code and reading the file.
  • You're using the seek method incorrectly: seek takes the byte position in the file at which the next read starts, not a record number.
  • You should not hardcode field lengths; instead, read them from the schema description section.
  • To read a record you could do something like this (recordLength here must include the 2 byte flag in front of each record):

    seek(offset + (recordLength * recNo))
    readRecord

    You should read the file metadata like this:

    magic = readInt()
    offset = readInt()
    fields = readShort()

    for (i = 0; i < fields; i++)
        fieldBytes = new byte[readShort()]
        readFully(fieldBytes)
        fieldNames[i] = new String(fieldBytes)
        fieldLengths[i] = readShort()
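
The metadata steps above can be sketched in Java roughly as follows. This is a minimal sketch under the header layout quoted in the first post, not a required design: the class name SchemaSketch and its members are made up for illustration. It reads through the DataInput interface, so either a RandomAccessFile or a DataInputStream can be passed in.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrative holder for the header and schema values; all names are hypothetical. */
class SchemaSketch {
    final int magicCookie;
    final int recordZeroOffset;
    final String[] fieldNames;
    final int[] fieldLengths;

    private SchemaSketch(int magicCookie, int recordZeroOffset,
                         String[] fieldNames, int[] fieldLengths) {
        this.magicCookie = magicCookie;
        this.recordZeroOffset = recordZeroOffset;
        this.fieldNames = fieldNames;
        this.fieldLengths = fieldLengths;
    }

    /** Reads the header and the schema description, starting at the current position. */
    static SchemaSketch read(DataInput in) throws IOException {
        int magic = in.readInt();                 // 4 byte magic cookie
        int offset = in.readInt();                // 4 byte offset to record zero
        int fieldCount = in.readUnsignedShort();  // 2 byte number of fields

        String[] names = new String[fieldCount];
        int[] lengths = new int[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            byte[] nameBytes = new byte[in.readUnsignedShort()]; // 2 byte name length
            in.readFully(nameBytes);                             // the name itself
            names[i] = new String(nameBytes, StandardCharsets.US_ASCII);
            lengths[i] = in.readUnsignedShort();                 // 2 byte field length
        }
        return new SchemaSketch(magic, offset, names, lengths);
    }

    /** Bytes one record occupies on disk: the 2 byte flag plus every fixed-length field. */
    int recordLength() {
        int total = 2;
        for (int length : fieldLengths) {
            total += length;
        }
        return total;
    }
}
```

With such a schema in hand, record recNo would start at recordZeroOffset + recNo * recordLength().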
    [ September 26, 2008: Message edited by: Tobias Lund-Melcher ]

    SCJP 5
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    Thank you for the hints.

    Actually I do not understand the difference between the Data file format and the Database schema (which provides the lengths of the fields and the order in which they appear, for example name, location, etc.). Could you please explain what the Data file format is intended for and what it means?

    Thanks again.
    [ October 10, 2008: Message edited by: Rajesh Moorthy ]
    Alex Belisle Turcot
    Ranch Hand

    Joined: Apr 26, 2005
    Posts: 516
    The database is just a flat file of bytes...

    The instructions tell you that the first 4 bytes are this... that... this..
    After you've read the bytes specified, the next bytes you read will be records, records... keep reading bytes.. still records....

    Let me rephrase:

    The instructions tell you the length of each header item, so you can deduce the length of the entire header. After that, every byte you read will be part of a record.

    The instructions also tell you that each record and each field has a fixed length.

    With all this, you should be able to read each byte of the file and get the data into variables.
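
As a rough illustration of getting one record into variables, here is a minimal sketch. It assumes the offset to record zero and the field lengths have already been read from the header; the names RecordSketch, readRecord and decodeField are hypothetical, and returning null for a deleted record is just a choice made for this sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Illustrative only; none of these names come from the assignment. */
class RecordSketch {
    static final int FLAG_BYTES = 2;        // every record starts with a 2 byte flag
    static final int DELETED_FLAG = 0x8000; // flag value for a deleted record

    /** Decodes one fixed-width field: stops at the first null byte, then trims padding. */
    static String decodeField(byte[] data, int start, int length) {
        int end = start;
        while (end < start + length && data[end] != 0) {
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.US_ASCII).trim();
    }

    /** Returns the field values of record recNo, or null if it is flagged as deleted. */
    static String[] readRecord(RandomAccessFile file, long recordZeroOffset,
                               int[] fieldLengths, long recNo) throws IOException {
        int dataLength = 0;
        for (int length : fieldLengths) {
            dataLength += length;
        }

        // seek takes a byte position, so convert the record number into one
        file.seek(recordZeroOffset + recNo * (FLAG_BYTES + dataLength));
        int flag = file.readUnsignedShort();
        byte[] data = new byte[dataLength];
        file.readFully(data);
        if (flag == DELETED_FLAG) {
            return null;
        }

        // slice the record bytes into fields, in schema order
        String[] values = new String[fieldLengths.length];
        int position = 0;
        for (int i = 0; i < fieldLengths.length; i++) {
            values[i] = decodeField(data, position, fieldLengths[i]);
            position += fieldLengths[i];
        }
        return values;
    }
}
```

The trim() call is there in case a file pads short values with spaces rather than nulls; the instructions quoted earlier only mention null termination.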

    Regards,
    Alex
    [ October 10, 2008: Message edited by: Alex Belisle Turcot ]
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    I could read the data file now.

    Thank you Alex, for your very good explanation of the Data file format. Thank you Tobias, for exactly pointing out the mistakes in my code and suggesting the right approach.



    Best regards,
    Rajesh.
    [ October 11, 2008: Message edited by: Rajesh Moorthy ]
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    The Data file format starts with:
    4 byte numeric, magic cookie value identifies this as a data file

    Where does this value need to be used? Or can we simply ignore it?
    [ October 12, 2008: Message edited by: Rajesh Moorthy ]
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11460
        

    Hi Rajesh,

    Take a look at the FAQ

    Regards, Andrew


    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    Hi,

    The FAQ says:

    "If I write my Data class such that it checks the magic cookie for the first example, and expects the meta-data and schema for that first example, then it will be able to read any file that has the same magic cookie, meta-data and schema. But as you can easily see, without the check on the magic cookie, the Data class would quickly fail when trying to read the second file format - the field sizes are different, and some fields are different."

    Let us consider a case where the magic cookie value is absent and therefore the Data class does not care about this. Even then, it may be possible for the Data class to read any file that has the same meta-data and schema. Please correct me if I am wrong.
    Alex Belisle Turcot
    Ranch Hand

    Joined: Apr 26, 2005
    Posts: 516
    The magic cookie is used to identify the data file. If the magic cookie is not present, your application can treat the file as not being a valid database file.

    This might prevent your application from crashing if Sun uses the wrong database file to test your application.

    Regards,
    Alex
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    Thanks Alex, for your response again.

    Suppose the magic cookie value is 100. Do you mean I should code something like the following:

    int magic = database.readInt();
    if (magic != 100) return;
    Alex Belisle Turcot
    Ranch Hand

    Joined: Apr 26, 2005
    Posts: 516
    Yes, if the magic cookie is different, you could throw an "InvalidDatabaseException" for instance..
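
A minimal sketch of such a check is shown here. InvalidDatabaseException is a hypothetical class defined only for this example (it is not part of the JDK), and the value 100 is the placeholder from the question, not the real cookie of any data file.

```java
import java.io.DataInput;
import java.io.IOException;

/** Hypothetical exception type, defined here only for the sketch. */
class InvalidDatabaseException extends IOException {
    InvalidDatabaseException(String message) {
        super(message);
    }
}

class MagicCookieCheck {
    /** Placeholder taken from the "suppose it is 100" example above. */
    static final int EXPECTED_MAGIC_COOKIE = 100;

    /** Reads the first 4 bytes and rejects the file if they are not the expected value. */
    static void verify(DataInput in) throws IOException {
        int magic = in.readInt();
        if (magic != EXPECTED_MAGIC_COOKIE) {
            throw new InvalidDatabaseException(
                    "Unexpected magic cookie " + magic + ", expected " + EXPECTED_MAGIC_COOKIE);
        }
    }
}
```

Throwing instead of silently returning lets the caller tell the user why the file was rejected.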
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    The following is my understanding. The database file has to be read once to find out the magic cookie value. This value then has to be hardcoded in the program to check the validity of the database file on later reads. Is this intended? Is hardcoding not inherently a bad approach? One more point: could another database file not have the same magic cookie value but different metadata and schema?
    Alex Belisle Turcot
    Ranch Hand

    Joined: Apr 26, 2005
    Posts: 516
    In my opinion, the magic cookie must be hardcoded in your application, or else it doesn't serve its purpose.

    I agree with you that this is not 100% clear from the instructions; however, after roaming this forum for a long while, I can say this is the common understanding.

    The database file provided starts with a special value that your application can use to uniquely identify it. Any other "database file" without this value will be rejected. This allows your application to handle only a valid database.

    If you load the magic cookie by first reading it from the file, how would you know whether the first 4 bytes are valid or not?


    4 byte numeric, magic cookie value. Identifies this as a data file


    - Read the 4 bytes;
    - Compare it to the "magic cookie";
    - Identify if the file is a valid database file or not.

    Regards,
    Alex
    Rajesh Moorthy
    Ranch Hand

    Joined: Sep 23, 2008
    Posts: 30
    Thanks for all your explanations, Alex !

    I have understood the point.

    Just a thought that came to my mind: Would it not be possible for another database file to have the same magic cookie value, but a different metadata & schema? In this case, we will read the magic cookie value and it will match. So we continue reading the database file, but this will fail as it does not have the same metadata and schema that we expect.

    Is there any way to avoid such a scenario?

    Thanks & regards,
    Rajesh.
    Alex Belisle Turcot
    Ranch Hand

    Joined: Apr 26, 2005
    Posts: 516
    Hi,

    There isn't much more you can do anyway; most probably (statistically) the magic cookie will protect your application.

    Remember that this is the case where the user actually "selects" the wrong database file for your application, which should not even happen for this simple application!

    Assuming an invalid file contains your magic cookie... Once you've read that magic cookie, unless you reach the end of the file sooner than expected, there aren't many ways to realize that the file isn't a valid database file...

    In short, comparing the magic cookie is more than enough: it protects you from a (possible) mistake by the examiner and is *probably* a must (or at least a should, with points attached).

    As far as the instructions go:
    4 byte numeric, magic cookie value. Identifies this as a data file


    * The exact 4 expected bytes = this is the valid data file.
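
For anyone who still wants one inexpensive extra check, the sketch below tests whether the file length fits the layout in the instructions, where fixed-length records repeat from record zero to the end of the file. Nothing in this thread says such a check is required; the name LayoutCheck and its method are hypothetical, and a file with the right cookie but the wrong contents could still pass.

```java
/** Illustrative only; the instructions do not ask for this check. */
class LayoutCheck {
    /**
     * Returns true if the bytes after the record zero offset divide evenly
     * into records of (2 byte flag + sum of field lengths) bytes.
     */
    static boolean isConsistent(long fileLength, long recordZeroOffset, int[] fieldLengths) {
        int recordLength = 2; // the 2 byte valid/deleted flag
        for (int length : fieldLengths) {
            if (length <= 0) {
                return false; // a field cannot be empty or negative in size
            }
            recordLength += length;
        }
        long dataBytes = fileLength - recordZeroOffset;
        return recordZeroOffset > 0 && dataBytes >= 0 && dataBytes % recordLength == 0;
    }
}
```

For example, with field lengths {4, 3} each record takes 9 bytes, so a file with a 10 byte header and three records should be exactly 37 bytes long.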

    Regards,
    Alex
     