
Best way to compare file records ..

shaju joseph
Ranch Hand

Joined: Jun 28, 2001
Posts: 30
Hi,
I need to compare three files to check whether certain fields are present in all three of them. Records whose fields are not present in all three need to be stored as error records. The number of records could be anywhere from 100,000 to 1,000,000. My question is: can I store these records in an ArrayList and do the comparisons? Can ArrayList handle this volume? What is the best way to do this?
Any help is appreciated.
Thx
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18652
That's going to take a lot of memory. I believe an ArrayList will use a minimum of four bytes per entry, so one million entries will take at least 4 MB. And that's just for the ArrayList, not the objects inside. For that, it depends on what your record structure is like. The simplest possibility I imagine is a record consisting of a single String - this will take at least 20 bytes per record. So now you're looking at 24 MB, times 3 because you're doing this for 3 files, right? That's 72 MB for the simplest, shortest possible records. Probably a lot more in practice. If you've got the memory available, it's possible, but I'd really try to find another way.
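The arithmetic above can be written out as a quick sketch. The class and method names here are made up for illustration, and the 4-byte reference and 20-byte record figures are simply the lower bounds quoted in this post; real JVM object sizes are likely to be larger.

```java
class MemoryEstimate {
    /** Lower-bound estimate in bytes: (list reference + record) per entry, per file. */
    static long estimateBytes(long records, int bytesPerReference, int bytesPerRecord, int files) {
        return records * (bytesPerReference + bytesPerRecord) * files;
    }

    public static void main(String[] args) {
        // One million records, 4 bytes per ArrayList slot, 20 bytes per record, 3 files.
        long bytes = estimateBytes(1_000_000L, 4, 20, 3);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "72 MB"
    }
}
```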
If the files are sorted somehow, you're in business - you can open three different readers, one per file, and read through all three files simultaneously, using the sorting to keep your readers in sync (so they're all looking at the same parts of each file). I describe something like this here. You may well find it's best to handle the files two at a time for simplicity. First compare file1 and file2, logging any differences - then compare file1 and file3 (or file2 and file3 if you prefer). Dealing with only two files at once will be much simpler to code and debug, I think - don't try to handle three files at once until you've got two working well.
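A minimal sketch of the two-at-a-time comparison might look like the following. It assumes each record has already been reduced to a single key String, that both inputs are sorted by String.compareTo, and that keys are unique within a file; the class and method names are hypothetical. Only one record from each side is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SortedCompare {
    /** Walks two sorted key sequences in step and reports keys found on only one side. */
    static List<String> mismatches(Iterator<String> first, Iterator<String> second) {
        List<String> errors = new ArrayList<>();
        String x = first.hasNext() ? first.next() : null;
        String y = second.hasNext() ? second.next() : null;
        while (x != null || y != null) {
            // An exhausted side counts as "greater" so the other side drains.
            int cmp = (x == null) ? 1 : (y == null) ? -1 : x.compareTo(y);
            if (cmp == 0) {            // present in both: advance both readers
                x = first.hasNext() ? first.next() : null;
                y = second.hasNext() ? second.next() : null;
            } else if (cmp < 0) {      // x is smaller, so it cannot appear later in second
                errors.add("only in first: " + x);
                x = first.hasNext() ? first.next() : null;
            } else {                   // y is smaller, so it cannot appear later in first
                errors.add("only in second: " + y);
                y = second.hasNext() ? second.next() : null;
            }
        }
        return errors;
    }
}
```

With real files, the iterators could come from something like `Files.newBufferedReader(path).lines().iterator()`, and for large inputs the mismatches would probably be written straight to an error log rather than collected in a list.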
If the files are not sorted in advance, I think it will really be in your interest to sort them by some attribute (choose whatever's convenient), and then use the method described above. Sorting may be problematic for memory reasons (as described above). I'd look for a sorting algorithm which allows you to make use of external memory (files) rather than keeping everything in RAM. A balanced k-way merge sort seems like a good candidate. This will probably take some time to do right; I'd think someone may well have already implemented this in Java somewhere, so I'd take some more time searching for existing implementations if you do need to sort the data. Good luck...
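For a sense of what an external sort involves, here is a simplified sketch. It is not a full balanced multi-pass merge: it writes sorted runs to temp files and then merges all of them in a single k-way pass, so it assumes the number of runs is small enough to keep every run file open at once. The class name, method names, and the `maxLinesInMemory` parameter are made up for illustration, and lines are compared as whole Strings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSort {
    /** One open run file plus its current line; runs are ordered by that line. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String line;
        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
        public int compareTo(Run other) { return line.compareTo(other.line); }
    }

    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        // Phase 1: read the input in chunks that fit in memory, sort each, write it out as a run.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }
        // Phase 2: k-way merge; the heap always yields the run whose current line is smallest.
        PriorityQueue<Run> heap = new PriorityQueue<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path p : runs) {
                Run r = new Run(Files.newBufferedReader(p, StandardCharsets.UTF_8));
                if (r.line != null) heap.add(r); else r.reader.close();
            }
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                out.write(r.line);
                out.newLine();
                r.line = r.reader.readLine();
                if (r.line != null) heap.add(r); else r.reader.close();
            }
        } finally {
            for (Run r : heap) r.reader.close();          // only non-empty after a failure
            for (Path p : runs) Files.deleteIfExists(p);  // clean up temp run files
        }
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }
}
```

Memory use is bounded by roughly `maxLinesInMemory` lines during the split and one line per run during the merge. With very many runs, merging in several passes (as a balanced k-way merge does) would keep the number of open files down.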


"I'm not back." - Bill Harding, Twister
shaju joseph
Ranch Hand

Joined: Jun 28, 2001
Posts: 30
Thank you so much for your insight.
 