I have a file containing 2 million records. Each record is one line; the length varies, but the maximum is 32 characters.
I display the records on pages (1,000 records per page).
The user can add records and edit any of the 1,000 records shown on the page. The user can also delete a record by checking the checkbox that appears next to it.
Assume the user is on page 5 and is therefore viewing records 4001 to 5000. After adding and modifying records, he clicks Save, and the following events occur:
1. I create a temp file.
2. I read records 1 to 4000 sequentially from the original file using readLine() and copy them to the temp file.
3. I write the 1,000 records from the form displayed on screen, so that the user's modifications to the viewed records are saved.
4. I go to record 5001 in the original file and write all remaining records to the temp file. The temp file now contains the updated data.
5. I take a backup of the original and rename the temp file to the original name.
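For readers following along, the save steps above could be sketched roughly as below. This is only an illustration of the procedure as described, not the poster's actual code: the class name PageSaver, the method savePage, its parameters, the ".bak" backup name, and the UTF-8 encoding are all assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper illustrating the temp-file save procedure.
class PageSaver {
    /**
     * Rewrites the file, replacing one page of records with the edited ones.
     * firstRecord is the 0-based index of the first record on the page;
     * pageSize is how many original records the page displayed.
     */
    static void savePage(Path original, int firstRecord, int pageSize,
                         List<String> editedPage) throws IOException {
        // Step 1: temp file in the same directory, so the final rename stays on one file system.
        Path temp = Files.createTempFile(original.toAbsolutePath().getParent(), "records", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(original, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            int copied = 0;
            // Step 2: copy the records that come before the page.
            while (copied < firstRecord && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                copied++;
            }
            // Step 3: write the page as edited by the user (may be longer or shorter).
            for (String record : editedPage) {
                out.write(record);
                out.newLine();
            }
            // Skip the old versions of the records that were on the page.
            for (int i = 0; i < pageSize && in.readLine() != null; i++) { }
            // Step 4: copy everything after the page.
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
        // Step 5: back up the original, then move the temp file into its place.
        Path backup = original.resolveSibling(original.getFileName() + ".bak");
        Files.copy(original, backup, StandardCopyOption.REPLACE_EXISTING);
        Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For page 5 this would be called as savePage(file, 4000, 1000, editedRecords). Added and deleted records need no special handling here, because the edited list simply replaces the old page whatever its new length is.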
I also show page links, so when the user clicks on, say, page 100, I have to read the file sequentially (skipping 99,000 lines using readLine()) and then read the 1,000 records that will be shown to the user.
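The page lookup described here might look something like the following sketch. It still reads the file sequentially, exactly as the poster does with readLine(); it just expresses the skip with a stream. The names PageReader and readPage and the UTF-8 encoding are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: fetch one page by skipping all earlier lines.
class PageReader {
    static final int PAGE_SIZE = 1000;

    /** pageNumber is 1-based, so page 5 yields records 4001 to 5000. */
    static List<String> readPage(Path file, int pageNumber) throws IOException {
        long linesToSkip = (long) (pageNumber - 1) * PAGE_SIZE;
        // Files.lines reads lazily, but skipped lines are still read from disk one by one.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.skip(linesToSkip).limit(PAGE_SIZE).collect(Collectors.toList());
        }
    }
}
```

The cost grows with the page number, since every earlier line has to be read and discarded, which is the weakness the question is asking about.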
I don't find this solution very good, but I cannot think of a better one.
I have to make so many readLine() calls because the record size is not fixed. Also, since records can be added in between and deleted, I have to create a temp file and copy all the records.
If you can make all the records the same size, then you can use RandomAccessFile, which will let you skip directly to a record and modify it without rewriting the whole file.
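As a rough sketch of the fixed-size idea: pad every record to 32 bytes so that record i always starts at byte i * 32. The class FixedRecordFile and its methods are made up for illustration, and the sketch assumes a single-byte encoding (ISO-8859-1), space padding, and no line terminators in the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: fixed-width records addressed by index.
class FixedRecordFile {
    static final int RECORD_SIZE = 32; // maximum record length from the question

    /** Reads record number index (0-based) and strips the space padding. */
    static String read(RandomAccessFile file, long index) throws IOException {
        byte[] buffer = new byte[RECORD_SIZE];
        file.seek(index * RECORD_SIZE);   // jump straight to the record
        file.readFully(buffer);
        return new String(buffer, StandardCharsets.ISO_8859_1).replaceAll(" +$", "");
    }

    /** Overwrites record number index in place; neighbours are not touched. */
    static void write(RandomAccessFile file, long index, String record) throws IOException {
        byte[] data = record.getBytes(StandardCharsets.ISO_8859_1);
        if (data.length > RECORD_SIZE) {
            throw new IllegalArgumentException("record longer than " + RECORD_SIZE + " bytes");
        }
        byte[] buffer = new byte[RECORD_SIZE];
        Arrays.fill(buffer, (byte) ' ');                 // pad with spaces
        System.arraycopy(data, 0, buffer, 0, data.length);
        file.seek(index * RECORD_SIZE);
        file.write(buffer);
    }
}
```

With this layout, page 100 starts at byte 99,000 * 32, so no lines need to be skipped. One caveat of the padding choice: a record that genuinely ends in spaces would lose them when read back.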
If you can't, then you could do something a little more complex: make an "index file" which shows the starting offset of each record. Then when you modify a record, append the new one to the end of the file, and modify the index file to point to the new offset (you'd use RandomAccessFile on the index.) Occasionally (overnight?) you could rewrite the main file, omitting all the "dead" records.
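A minimal sketch of the index-file idea follows, under some assumptions not stated in the post: each index entry is one 8-byte offset written with writeLong, records in the data file end with "\n", and the text is plain ASCII. The class IndexedRecordFile and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: variable-length records plus an index of offsets.
class IndexedRecordFile {
    static final int ENTRY_SIZE = Long.BYTES; // one 8-byte offset per record

    private final RandomAccessFile data;
    private final RandomAccessFile index;

    IndexedRecordFile(RandomAccessFile data, RandomAccessFile index) {
        this.data = data;
        this.index = index;
    }

    long count() throws IOException {
        return index.length() / ENTRY_SIZE;
    }

    /** Looks up the offset of record i in the index, then reads that line. */
    String read(long i) throws IOException {
        index.seek(i * ENTRY_SIZE);
        data.seek(index.readLong());
        return data.readLine();
    }

    /** Appends the record text to the data file and returns where it starts. */
    private long appendToData(String record) throws IOException {
        long offset = data.length();
        data.seek(offset);
        data.writeBytes(record + "\n");
        return offset;
    }

    /** Adds a new record at the end of the logical record list. */
    void add(String record) throws IOException {
        long offset = appendToData(record);
        index.seek(index.length());
        index.writeLong(offset);
    }

    /** Replaces record i: the old text becomes "dead", the index points at the new text. */
    void update(long i, String record) throws IOException {
        long offset = appendToData(record);
        index.seek(i * ENTRY_SIZE);
        index.writeLong(offset);
    }
}
```

The new text can be any length, because nothing in the data file is overwritten. Inserting or deleting in the middle of the record order would still mean shifting index entries, which this sketch does not attempt, and neither does it do the periodic rewrite that drops dead records.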
1. I have to write a record between two records, not at the end.
2. Also, with RandomAccessFile I can only replace a record with one of the same length. If a record is ABCD, I can replace it with exactly 4 characters, no more and no less, or my next record gets corrupted.
3. For deletion I have to rewrite the file.
Because of these problems, it seems to me that I have to rewrite the file every time, i.e. create a temp file and rename it to the original.
author and iconoclast
If you have to keep the records in a file, in order, and they're variable length, then yes, you're pretty much hosed. In this kind of problem, you can improve performance only by changing the data structures you use. Since this is apparently not possible, you have two options:
1) Go to whoever is insisting on this storage format, explain why it's a bad choice, and offer alternatives.
2) Wait until that same person complains to you about the performance of the deployed system, tell them it's because of the storage format, and suffer the consequences at that time.
But there's no magic way to make rewriting the file go ten times faster.
Thanks a lot. I am more worried about corruption of data than about performance. But it seems this is the only way; I just wanted a senior's opinion. Thank you, Sir.
"I am more worried about corruption of data than about performance."
If you write a new file, while saving the old one, and only rename the new one once you know that the file writing went OK, as you've described, then this is generally a safe thing to do. Of course, you do have to worry about concurrent updates, something you haven't mentioned here. If there is more than one user of the system at a time, then rewriting a single file obviously becomes a difficult and dangerous thing to manage.
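A hedged sketch of that pattern: write the new contents to a temp file, move it over the original only after the write has succeeded, and serialize saves with a lock. The class SafeReplace and the method replaceContents are invented names, and the lock only covers threads inside one JVM. It does not protect against other processes, nor against a user saving a page that someone else changed in the meantime.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: replace a file's contents without exposing a half-written file.
class SafeReplace {
    // Only one save at a time within this JVM (an assumption: a single server process).
    private static final ReentrantLock SAVE_LOCK = new ReentrantLock();

    static void replaceContents(Path original, List<String> newLines) throws IOException {
        SAVE_LOCK.lock();
        try {
            // Write the complete new version next to the original first.
            Path temp = Files.createTempFile(original.toAbsolutePath().getParent(), "records", ".tmp");
            Files.write(temp, newLines, StandardCharsets.UTF_8);
            try {
                // Readers see either the old file or the new one, never a mix.
                Files.move(temp, original, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback for file systems without atomic rename.
                Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            SAVE_LOCK.unlock();
        }
    }
}
```

If the write fails partway, the exception propagates before the move, so the original file is left untouched. For a 2-million-record file the new contents would more likely be streamed into the temp file than held in a list; the list here just keeps the sketch short.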
Even though you can't have a database, the entities in your file should have a valid maximum size. So in practice you should be able to use fixed-width records, although that may also mean wasted space and a larger file.