I have an application that lets us view binary log files from one of our other in-house applications. It works fine, except that with large log files it uses far too much RAM, because we read the entire log file into memory. I came up with a solution that converts the log file to fixed-length ASCII records for display in the JTable. The table model reads the fixed-length record information as needed. This reduced memory use dramatically, and the scroll speed seems to be fine, even though I expected a big performance hit from the file I/O. The only "gotcha" is that the user needs to be able to add, edit, and delete log file entries.
I am not sure how I am going to do this without re-writing the file on every update. Is it possible to insert records into a file without re-writing the entire file? What about deletes? Can you remove a section of bytes from a file without re-writing the entire file?
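For readers following along, the read-on-demand idea described above might look roughly like the sketch below. The class name FixedRecordReader, the 8-byte record length, and the temp-file demo are assumptions for illustration only; the original application's record layout is not given in this thread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Reads one fixed-length ASCII record at a time instead of loading the whole file. */
final class FixedRecordReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final int recordLength; // bytes per record, assumed constant

    FixedRecordReader(Path path, int recordLength) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
        this.recordLength = recordLength;
    }

    /** Number of complete records; a table model could return this from getRowCount(). */
    long recordCount() throws IOException {
        return file.length() / recordLength;
    }

    /** Seeks to the record's byte offset and reads exactly one record. */
    String readRecord(long index) throws IOException {
        byte[] buffer = new byte[recordLength];
        file.seek(index * recordLength);
        file.readFully(buffer);
        return new String(buffer, StandardCharsets.US_ASCII);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Demo on a temporary file holding three 8-byte records. */
    static String demo() {
        try {
            Path temp = Files.createTempFile("records", ".dat");
            try {
                Files.write(temp, "rec00000rec00001rec00002".getBytes(StandardCharsets.US_ASCII));
                try (FixedRecordReader reader = new FixedRecordReader(temp, 8)) {
                    return reader.recordCount() + ":" + reader.readRecord(1);
                }
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only one record's worth of bytes is held per call, so memory use stays flat regardless of file size. A real table model would probably cache a window of recently read rows to keep scrolling smooth.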
I considered databases. How much memory will it take to show a 300000 record table in a JTable? Will the query return a 3000000 recordset? If so, then this does me no good; I am viewing a file so that I don't have to load the entire contents into RAM.
Originally posted by Joseph Goodman: Will the query return a 3000000 recordset? ... I am viewing a file so that I don't have to load the entire contents into RAM.
You can use the same tricks as with your file solution. You can easily select subsets of the whole data set using a database and page just as you are doing with the file.
More common, however, is to allow the user to search on various fields to pull back just the data they need to see. Very rarely should you allow someone to view all of the data in a table. For example, you could have a date range pair of fields, a drop down for severity or module or whatever other fields you have in your log entries.
That being said, you could still make the file version work reasonably well. First, buffer all of the edits and save them all at once, so that you don't rewrite the file over and over. Second, with fixed-size records (and, I assume, a RandomAccessFile), updates are essentially free because an update just overwrites the record in place -- delete and insert are the trick.
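To illustrate why updates are cheap with fixed-size records, here is a minimal sketch that overwrites one record in place. RecordUpdater and updateRecord are hypothetical names, and the 4-byte records in the demo are an assumption chosen to keep it short.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RecordUpdater {
    /** Overwrites record number index in place; no other bytes in the file move. */
    static void updateRecord(Path path, long index, String record, int recordLength)
            throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length != recordLength) {
            throw new IllegalArgumentException("record must be exactly " + recordLength + " bytes");
        }
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            if (index < 0 || index >= file.length() / recordLength) {
                throw new IndexOutOfBoundsException("no such record: " + index);
            }
            file.seek(index * recordLength); // byte offset of the record
            file.write(bytes);
        }
    }

    /** Demo: three 4-byte records, the middle one is replaced. */
    static String demo() {
        try {
            Path temp = Files.createTempFile("update", ".dat");
            try {
                Files.write(temp, "aaaabbbbcccc".getBytes(StandardCharsets.US_ASCII));
                updateRecord(temp, 1, "BBBB", 4);
                return new String(Files.readAllBytes(temp), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost is one seek and one write of a single record, independent of how large the file is.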
If you collect all the deletes and inserts and order them as they will be applied in the file, you could probably do it quite well by using two file channels/streams. You open both at the first point of modification. Then, read ahead (buffered) and start writing along behind it with the changes. The more inserts you have, the larger a buffer you'll need. Once you get to the end, if you had more deletes than inserts, just truncate the file.
Post again if you'd like a further explanation of how that would work. The other option is to perform all updates while copying to a new file. Once the copy completes (and succeeds), swap the temp file in for the real one. This is also safer: the original file survives if the write fails.
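A rough sketch of the copy-and-swap option follows. The class CopyAndSwap, the three change collections (updates, deletes, insertsBefore), and the 1-byte records in the demo are assumptions made up for this example. It uses Files.move with REPLACE_EXISTING; whether that swap is atomic depends on the platform and file system.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CopyAndSwap {
    /**
     * Streams the file into a temp file while applying the changes, then swaps it in.
     * insertsBefore maps a record index to the new records placed in front of it;
     * a key equal to the record count appends at the end.
     */
    static void rewrite(Path path, int recordLength, Map<Long, byte[]> updates,
                        Set<Long> deletes, Map<Long, List<byte[]>> insertsBefore)
            throws IOException {
        Path temp = Files.createTempFile(path.toAbsolutePath().getParent(), "rewrite", ".tmp");
        try {
            try (DataInputStream in = new DataInputStream(
                         new BufferedInputStream(Files.newInputStream(path)));
                 OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
                long count = Files.size(path) / recordLength;
                byte[] record = new byte[recordLength];
                for (long i = 0; i <= count; i++) {
                    for (byte[] inserted : insertsBefore.getOrDefault(i, Collections.emptyList())) {
                        out.write(inserted);
                    }
                    if (i == count) break;      // past the last original record
                    in.readFully(record);       // always consume the original record
                    if (deletes.contains(i)) continue;
                    out.write(updates.getOrDefault(i, record));
                }
            }
            // The original is only replaced after the copy finished without error.
            Files.move(temp, path, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp); // no-op after a successful move
        }
    }

    /** Demo: "abcd" with a updated, b deleted, X inserted before d. */
    static String demo() {
        try {
            Path file = Files.createTempFile("log", ".dat");
            try {
                Files.write(file, "abcd".getBytes(StandardCharsets.US_ASCII));
                Map<Long, byte[]> updates = new HashMap<>();
                updates.put(0L, new byte[] {'A'});
                Map<Long, List<byte[]>> inserts = new HashMap<>();
                inserts.put(3L, Collections.singletonList(new byte[] {'X'}));
                rewrite(file, 1, updates, Collections.singleton(1L), inserts);
                return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the output goes to a separate file, no read-ahead buffering is needed, at the price of temporarily needing disk space for a second copy.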
You are right, the insert and delete are causing me a headache. I would like to hear more about your proposed solution if you have time.
The main assumption is that the file needs to remain ordered, so deleting a record must shift all further records back rather than simply leave a hole or move the last record to the deleted one's position. First some caveats:
I assume that you can open the same file with two RandomAccessFiles, but I haven't tried it.
If the process fails midway through, the file will be corrupted. You may want to modify the algorithm to copy the file instead of overwriting it.
This would likely be far more efficient using a database, as Stefan mentioned.
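On the first caveat, a quick check along the following lines could confirm whether two RandomAccessFile handles on the same file see each other's writes. TwoHandles is a made-up name, and this is only a sketch: it passes on common desktop platforms as far as I know, but behaviour on unusual or networked file systems is not something it can prove.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TwoHandles {
    /** Writes through one handle and reads the same bytes back through another. */
    static String demo() {
        try {
            Path temp = Files.createTempFile("shared", ".dat");
            try (RandomAccessFile reader = new RandomAccessFile(temp.toFile(), "r");
                 RandomAccessFile writer = new RandomAccessFile(temp.toFile(), "rw")) {
                writer.write("abcd".getBytes(StandardCharsets.US_ASCII));
                byte[] buffer = new byte[4];
                reader.seek(0);          // each handle keeps its own file pointer
                reader.readFully(buffer);
                return new String(buffer, StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(temp); // runs after both handles are closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```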
With that out of the way, the first step is to have your application track the state of all records the user has changed -- inserted, deleted and updated -- in memory. When the user is ready to save, sort these by file order.
Conceptually, it looks something like the following (each lowercase letter on the first line represents a record in the original file; an uppercase letter is an updated record; a - is a deleted record; numbers are inserted records):
This represents these modifications:
Record d updated
Record e deleted
Record f deleted
Record 1 inserted before record g
Record 2 inserted before record h
Record j updated
Record l deleted
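As a sketch of the bookkeeping, the following tracks the seven modifications listed above, sorts them into file order, and prints the original and modified lines using the notation just described. It assumes the file holds exactly records a through l (the example does not say what follows l), and ChangeTracker and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ChangeTracker {
    enum Kind { INSERTED, UPDATED, DELETED }

    /** One pending change. For INSERTED, index is the record the new one goes in front of. */
    static final class Change {
        final int index;
        final Kind kind;
        final String data;

        Change(int index, Kind kind, String data) {
            this.index = index;
            this.kind = kind;
            this.data = data;
        }
    }

    private final List<Change> changes = new ArrayList<>();

    void insertBefore(int index, String data) { changes.add(new Change(index, Kind.INSERTED, data)); }
    void update(int index, String data) { changes.add(new Change(index, Kind.UPDATED, data)); }
    void delete(int index) { changes.add(new Change(index, Kind.DELETED, null)); }

    /** Changes in file order; an insert sorts ahead of a change to the record it precedes. */
    List<Change> inFileOrder() {
        List<Change> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparingInt((Change c) -> c.index)
                .thenComparing(c -> c.kind != Kind.INSERTED));
        return sorted;
    }

    /** Renders the modified line: uppercase = updated, '-' = deleted, digits = inserted. */
    String render(String original) {
        StringBuilder out = new StringBuilder();
        List<Change> sorted = inFileOrder();
        int next = 0; // position in the sorted change list
        for (int i = 0; i < original.length(); i++) {
            String cell = String.valueOf(original.charAt(i));
            while (next < sorted.size() && sorted.get(next).index == i) {
                Change c = sorted.get(next++);
                if (c.kind == Kind.INSERTED) {
                    out.append(c.data).append(' ');
                } else {
                    cell = (c.kind == Kind.DELETED) ? "-" : c.data;
                }
            }
            out.append(cell).append(' ');
        }
        return out.toString().trim();
    }

    /** The example from the post, entered in arbitrary order (a = index 0 ... l = index 11). */
    static String demo() {
        ChangeTracker tracker = new ChangeTracker();
        tracker.update(9, "J");
        tracker.delete(4);
        tracker.insertBefore(6, "1");
        tracker.delete(11);
        tracker.update(3, "D");
        tracker.delete(5);
        tracker.insertBefore(7, "2");
        return tracker.render("abcdefghijkl");
    }

    public static void main(String[] args) {
        System.out.println("a b c d e f g h i j k l");
        System.out.println(demo());
    }
}
```

This version does not merge repeated edits to the same record and ignores inserts past the last record; a real tracker would need to handle both.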
To persist these, first open two RAFs (one for reading, one for writing) and seek them both to the first changed record. Now loop over all records from that point forward and write them into the write RAF based on the record's state:
unchanged : write record by reading from Read RAF
inserted : write record in memory
updated : write record in memory and skip record from Read RAF
deleted : skip record from Read RAF (no write)
While doing this, you need to buffer records to make space for insertions. Because you're overwriting the file in place, any inserted records will overwrite records that may still need to be copied. Take the simple case of inserting a record at the start of the file: writing the new record (1) overwrites the previous first record (a) unless it has already been read into a buffer. This will probably be the trickiest part before you attempt to optimize. Using the copy-on-write method removes the need for buffering altogether, since you're writing to a new file.
Finally, once complete you will need to truncate (setLength) the file if more records were deleted than inserted.
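Putting the steps together, one possible shape for the in-place version is sketched below. It is not a tested production implementation: InPlaceRewriter and the three change collections are invented for this example, the file length is assumed to be a whole number of records, and a failure partway through still leaves the file corrupted, as noted in the caveats. The demo applies the example modifications to a file of 1-byte records a through l.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InPlaceRewriter {
    /**
     * Applies pending changes to a fixed-length record file in place.
     * insertsBefore maps a record index to the new records placed in front of it;
     * a key equal to the record count appends at the end.
     */
    static void apply(Path path, int recordLength, Map<Long, byte[]> updates,
                      Set<Long> deletes, Map<Long, List<byte[]>> insertsBefore)
            throws IOException {
        try (RandomAccessFile reader = new RandomAccessFile(path.toFile(), "r");
             RandomAccessFile writer = new RandomAccessFile(path.toFile(), "rw")) {
            long count = reader.length() / recordLength;
            long first = min(min(min(count, updates.keySet()), deletes), insertsBefore.keySet());
            long written = first;                       // next record slot to write
            Deque<byte[]> pending = new ArrayDeque<>(); // records read ahead of the writer
            for (long r = first; r <= count; r++) {
                pending.addAll(insertsBefore.getOrDefault(r, Collections.emptyList()));
                if (r < count && !deletes.contains(r)) { // deleted: skip, write nothing
                    byte[] record = updates.get(r);      // updated: take the in-memory copy
                    if (record == null) {                // unchanged: read it from the file
                        record = new byte[recordLength];
                        reader.seek(r * recordLength);
                        reader.readFully(record);
                    }
                    pending.add(record);
                }
                // A slot may be overwritten only once the reader has moved past it.
                long consumed = Math.min(r + 1, count);
                while (!pending.isEmpty() && (written < consumed || consumed == count)) {
                    writer.seek(written * recordLength);
                    writer.write(pending.poll());
                    written++;
                }
            }
            writer.setLength(written * recordLength); // truncates if the file shrank
        }
    }

    private static long min(long current, Collection<Long> keys) {
        for (long key : keys) current = Math.min(current, key);
        return current;
    }

    /** Demo: the example modifications applied to records a through l. */
    static String demo() {
        try {
            Path temp = Files.createTempFile("inplace", ".dat");
            try {
                Files.write(temp, "abcdefghijkl".getBytes(StandardCharsets.US_ASCII));
                Map<Long, byte[]> updates = new HashMap<>();
                updates.put(3L, new byte[] {'D'});
                updates.put(9L, new byte[] {'J'});
                Set<Long> deletes = new HashSet<>(Arrays.asList(4L, 5L, 11L));
                Map<Long, List<byte[]>> inserts = new HashMap<>();
                inserts.put(6L, Collections.singletonList(new byte[] {'1'}));
                inserts.put(7L, Collections.singletonList(new byte[] {'2'}));
                apply(temp, 1, updates, deletes, inserts);
                return new String(Files.readAllBytes(temp), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The pending queue is the buffer mentioned above: it only grows while inserts outnumber deletes up to the current position, so in this example it never holds more than two records.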
Here's how that would look for the example above. Note that this only shows the operations -- not the buffering that would need to happen, though in this case no buffering is necessary thanks to the two up-front deletions.
Now I suspect with 300,000 records you'll see some performance problems. If so, the first optimization you can make is to detect runs of unchanged records and read/write them using buffering (multiple records at once). Next would be to detect runs of other types of records (three adjacent updated/deleted records can be skipped in the Read RAF in one operation).
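For the first optimization, the runs of unchanged records could be found with something like the sketch below. RunFinder and its parameters are assumptions for illustration: changed holds the indices of updated or deleted records, and insertPoints holds the indices that have something inserted in front of them, since a bulk copy has to stop there as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RunFinder {
    /** Returns [start, end) index pairs of consecutive unchanged records. */
    static List<long[]> unchangedRuns(long count, Set<Long> changed, Set<Long> insertPoints) {
        TreeSet<Long> boundaries = new TreeSet<>(changed);
        boundaries.addAll(insertPoints);
        boundaries.add(count); // closes the final run
        List<long[]> runs = new ArrayList<>();
        long start = 0;
        for (long boundary : boundaries) {
            if (boundary > start) {
                runs.add(new long[] {start, boundary});
            }
            // A changed record is not part of any run; an insert point starts a new one.
            start = changed.contains(boundary) ? boundary + 1 : boundary;
        }
        return runs;
    }

    static String describe(List<long[]> runs) {
        StringBuilder out = new StringBuilder();
        for (long[] run : runs) {
            out.append(run[0]).append('-').append(run[1]).append(' ');
        }
        return out.toString().trim();
    }

    /** The example from the post: 12 records, a = index 0 ... l = index 11. */
    static String demo() {
        Set<Long> changed = new HashSet<>(Arrays.asList(3L, 4L, 5L, 9L, 11L));
        Set<Long> insertPoints = new HashSet<>(Arrays.asList(6L, 7L));
        return describe(unchangedRuns(12, changed, insertPoints));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each run can then be moved with a single read and write of (end - start) * recordLength bytes, or in fixed-size chunks if a run is very long.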
Good luck, and let us know how it goes. I'm curious to hear if this works reasonably well for a file.