Binary Files

 
Stephen McDermott
Greenhorn
Posts: 4
Hello,

I know how to read/write to binary files, but only static record sizes.

Can someone help me develop an algorithm for writing/reading variable record sizes?
 
Nicholas Jordan
Ranch Hand
Posts: 1282
Read in an int or something at the beginning of the file. Use that value to control the length of the file reading.
 
Stephen McDermott
Greenhorn
Posts: 4
but if each individual record is variable...that won't quite work...
 
Nicholas Jordan
Ranch Hand
Posts: 1282
Well, if each record is of variable length, it needs some thought, but variable-length data is everywhere, so several approaches should be within reach.
class File {
    static class FileHeader {
        int numberOfRecords;
        int firstRecordLength;
    }
}

The problem already resembles a linked list, which is a well-known data structure. The tape archive (tar) format has been ported to Java, and it is by nature a variable-length record format.
 
Stephen McDermott
Greenhorn
Posts: 4
I'm not sure what you're getting at here...

I already have my files organized in a LinkedList (for sorting/searching)

What's with this File class?
 
Ulf Dittmer
Rancher
Posts: 42967
Using binary files with variable-length content is tricky. The RandomAccessFile class can read them fairly efficiently, but you can use it for writing only by overwriting bytes, not by inserting or deleting anything.

That means you can't use it (or any other class of the JRE) to replace a record of length N with a record of length M in the middle of a file.
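As a rough illustration of that limitation, here is a minimal sketch. The class name OverwriteDemo, the temp file and its contents are made up for the example; only RandomAccessFile and its seek/write methods come from the JRE. Writing at an offset replaces the bytes that are already there, so a longer replacement clobbers whatever follows it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OverwriteDemo {
    // Overwrites bytes starting at the given offset; nothing after them is shifted.
    static void overwriteAt(Path file, long offset, byte[] replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(replacement);
        }
    }

    static String contents(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Three hypothetical 4-byte "records".
        Path file = Files.createTempFile("records", ".bin");
        Files.write(file, "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII));

        // Same-length replacement of the middle record works as expected.
        overwriteAt(file, 4, "xxxx".getBytes(StandardCharsets.US_ASCII));
        System.out.println(contents(file)); // prints AAAAxxxxCCCC

        // A longer replacement spills into the next record instead of pushing it back.
        overwriteAt(file, 4, "yyyyyy".getBytes(StandardCharsets.US_ASCII));
        System.out.println(contents(file)); // prints AAAAyyyyyyCC

        Files.delete(file);
    }
}
```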
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Stephen, has the format of these files already been established, or do you get to choose the format?

And what does "static record sizes" mean? I thought maybe all the records were the same size, but according to your second post, that's not the case.

If you can choose the format, there are several options:
  • Put a number at the beginning of each record indicating the length of that record. This is probably simplest for a binary file. This might be what Nick was trying to get at, but you need a record length for each record, not just one at the beginning of the file. And you don't necessarily need to know the number of records in advance - the end of the file can signal the end of records.
  • Use a particular delimiter between records. This works well for text formats, where you can use a newline, or XML, or any other delimiter that seems convenient. You need to be able to escape the delimiter if it occurs naturally within a record, e.g. replacing a newline with "\n". For a binary format it may be too much trouble to guard against the possibility of delimiters occurring accidentally within a record.
  • Create an index that identifies where each record begins. This might be located at the beginning of the file, or in a separate file. This is more complex, but it allows the records to be accessed in random order, without needing to read every previous record in order to get a record near the end.
  • Use some other existing tool to write records so that you don't need to know the format yourself. The main example that comes to mind is Java's object serialization format, using writeObject() and readObject() from ObjectOutputStream and ObjectInputStream. Another possibility is XStream. You could even use a file-backed database such as HSQL.
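The first option (a length in front of each record) might look roughly like the sketch below. The class and method names are made up for illustration, and treating each record as a raw byte[] is an assumption; the stream classes and their methods are standard java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthPrefixedRecords {
    // Writes each record as a 4-byte big-endian length followed by the record bytes.
    static void writeRecords(OutputStream out, List<byte[]> records) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (byte[] record : records) {
            data.writeInt(record.length);
            data.write(record);
        }
        data.flush();
    }

    // Reads records until the stream ends; no record count is needed up front.
    static List<byte[]> readRecords(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = data.readInt();
            } catch (EOFException endOfStream) {
                // End of stream while reading a length: treated here as "no more records".
                return records;
            }
            byte[] record = new byte[length];
            data.readFully(record); // throws EOFException if the record itself is cut short
            records.add(record);
        }
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> records = Arrays.asList(
                "short".getBytes(StandardCharsets.UTF_8),
                "a much longer record".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecords(out, records);
        for (byte[] record : readRecords(new ByteArrayInputStream(out.toByteArray()))) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

The same two methods work on a file by passing a FileOutputStream or FileInputStream (ideally wrapped in a buffered stream) instead of the in-memory streams used in main. The sketch does not validate the length it reads, so a corrupt file could make it attempt a very large or negative allocation.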

Ulf's comment applies equally to any of these techniques (except perhaps the database option, where it's all handled for you). If you need to be able to change these records, you pretty much need to rewrite the entire file.

Exception #1: if you only need to add records at the end of the file, that's fine, you can just append them. (This gets more complicated if you have an index though.)

Exception #2: if you include a delete flag as part of the format for each record, you can delete a record by setting its delete flag, without changing its length. Then you can delete a record from the middle of the file, and write a new version of the record at the end of the file, without having to rewrite the entire file. This can be very fast initially, but eventually you may want to rewrite the entire file and remove those deleted records entirely. Note that if you start down this road, you're well on your way to writing your own database program, and you might well be better off using an existing database instead.
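A minimal sketch of the delete-flag idea, assuming a made-up record layout of one flag byte, a 4-byte length, then the payload. The class name, method names and flag values are hypothetical; the RandomAccessFile calls are standard.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class TombstoneFile {
    static final byte LIVE = 0;
    static final byte DELETED = 1;

    // Appends a record at the end of the file and returns the offset where it starts.
    static long append(RandomAccessFile raf, byte[] payload) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.writeByte(LIVE);
        raf.writeInt(payload.length);
        raf.write(payload);
        return offset;
    }

    // Flips the flag byte in place; the record's length on disk does not change.
    static void markDeleted(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        raf.writeByte(DELETED);
    }

    // "Updates" a record by deleting the old copy and appending the new version.
    static long update(RandomAccessFile raf, long oldOffset, byte[] newPayload) throws IOException {
        markDeleted(raf, oldOffset);
        return append(raf, newPayload);
    }

    // Scans the whole file and returns only the records not flagged as deleted.
    static List<byte[]> readLive(RandomAccessFile raf) throws IOException {
        List<byte[]> live = new ArrayList<>();
        raf.seek(0);
        while (raf.getFilePointer() < raf.length()) {
            byte flag = raf.readByte();
            int length = raf.readInt();
            if (flag == DELETED) {
                raf.seek(raf.getFilePointer() + length); // skip over the dead payload
            } else {
                byte[] payload = new byte[length];
                raf.readFully(payload);
                live.add(payload);
            }
        }
        return live;
    }
}
```

Compacting the file later would amount to writing the result of readLive to a fresh file and swapping it in, which is the full rewrite mentioned above.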
[ March 12, 2008: Message edited by: Jim Yingst ]
 
Stephen McDermott
Greenhorn
Posts: 4
Let's see...

Each record will be of variable length, because the user will be able to store data in them over time, so they progressively get larger.

I know that it'd be best to use a text file, but for my project I need a binary file (to get 3/15 credits).
     
Nicholas Jordan
Ranch Hand
Posts: 1282
String has a getBytes() method, e.g. byte[] lineBuffer = someString.getBytes(); which can be used directly, and several of the classes in java.io have methods that will write a String. I strongly advise against getting into char/byte translation. In general, review the four approaches Jim provides: he was even able to work out where I was trying to go with one of them, and he gives a complete overview of all four. Persisting data from one invocation of the program to the next can also involve cross-checking the data against multiple files, but this may not be part of the spec you were given. If you want the credits, you need a team approach, which comes down to coding what you were told to code.
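One example of a java.io method that writes a String is DataOutputStream.writeUTF(), which stores a 2-byte length followed by the string in modified UTF-8, so each string is already a variable-length record. The sketch below assumes a record is just a list of notes; the class and method names are made up. Note that writeUTF() rejects strings whose encoded form exceeds 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StringRecords {
    // Encodes a count followed by each string; writeUTF adds its own length prefix.
    static byte[] encode(List<String> notes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(notes.size());
            for (String note : notes) {
                out.writeUTF(note);
            }
        }
        return buffer.toByteArray();
    }

    // Reads the count, then that many strings, mirroring encode().
    static List<String> decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            List<String> notes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                notes.add(in.readUTF());
            }
            return notes;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(Arrays.asList("first note", "a second, longer note"));
        System.out.println(decode(bytes));
    }
}
```

If you do call getBytes() yourself, passing an explicit charset such as StandardCharsets.UTF_8 avoids depending on the platform default encoding.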

Start with the 'code you can understand' method for selecting which of the four approaches to use. Set your first goal as 'anything that works', then wrap around to the beginning and look for ways to make it actually work.
     