Rapid IO(I/O) and Search

Rodney Woodruff
Ranch Hand

Joined: Dec 04, 2001
Posts: 80
I have a file (file 1) with 47 million lines in it. I also have a second file (file 2) that is the same as file 1 except for the following possible differences:
File 2 could have new lines added
File 2 could be missing lines that are in file 1
I have to do two things:
1. Efficiently insert all the lines of file 1 into a database. This is fairly straightforward, but any thoughts on doing it quickly are welcome.
2. Find all the lines in file 1 that are missing from file 2. Can you help me figure out the fastest way to perform this search without reading both files into a database and doing some sort of diff on the tables? I would prefer to do this before doing the insert into the database.
Thanks for all your help and I'm looking forward to your responses.
-- Rodney


Hope This Helps
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Are the lines sorted in any way? If not, you'll probably have to put everything into a database, since there's no way to tell whether a given line is really missing or just in a different location unless you read the whole file.
If the lines are sorted somehow, you can keep two readers open and read them line by line, switching from one reader to another to keep them roughly in sync. E.g. if file 1 has
A
B
C
D
E
G
and file 2 has
A
B
D
E
F
G
you can do something like this:
read 1: A
read 2: A
read 1: B
read 2: B
read 1: C
read 2: D - missing C in file 2 detected; read 1 up to D
read 1: D - caught up
read 1: E
read 2: E
read 1: G
read 2: F - missing F in file 1 detected; read 2 up to G
read 2: G - caught up
The logic may take some thought to code right, but it's certainly doable, and much faster than searching a DB for each line. But it only works if you have some way of knowing when you've read too far in one file.
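For what it's worth, here is a rough sketch of how the two-reader walk above might look in Java. It is only an illustration, not tested against real data: the class name SortedFileDiff, the Result holder, and the diff method are made-up names, and it assumes both files are sorted in the same order that String.compareTo uses. If your files are sorted some other way, you would need to swap in a matching Comparator. The main method uses StringReader with the sample data above; for real files you would wrap a FileReader instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SortedFileDiff {

    /** Hypothetical holder for the lines found on only one side. */
    public static final class Result {
        public final List<String> onlyInFirst = new ArrayList<>();
        public final List<String> onlyInSecond = new ArrayList<>();
    }

    /**
     * Walks two sorted inputs in step. Assumes both are sorted
     * in String.compareTo order; otherwise the results are meaningless.
     */
    public static Result diff(BufferedReader first, BufferedReader second)
            throws IOException {
        Result result = new Result();
        String a = first.readLine();
        String b = second.readLine();
        while (a != null && b != null) {
            int cmp = a.compareTo(b);
            if (cmp == 0) {
                // Same line in both files: advance both readers.
                a = first.readLine();
                b = second.readLine();
            } else if (cmp < 0) {
                // File 2 has already moved past this line, so it is missing there.
                result.onlyInFirst.add(a);
                a = first.readLine();
            } else {
                // File 1 has already moved past this line, so file 2 added it.
                result.onlyInSecond.add(b);
                b = second.readLine();
            }
        }
        // Whatever is left over in either file has no counterpart.
        while (a != null) {
            result.onlyInFirst.add(a);
            a = first.readLine();
        }
        while (b != null) {
            result.onlyInSecond.add(b);
            b = second.readLine();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader f1 = new BufferedReader(new StringReader("A\nB\nC\nD\nE\nG\n"));
        BufferedReader f2 = new BufferedReader(new StringReader("A\nB\nD\nE\nF\nG\n"));
        Result r = diff(f1, f2);
        System.out.println("Missing from file 2: " + r.onlyInFirst);  // [C]
        System.out.println("Added in file 2: " + r.onlyInSecond);     // [F]
    }
}
```

With 47 million lines you would probably not want to collect the differences in lists if there could be a lot of them; writing each one out as it is found would keep memory use flat.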


"I'm not back." - Bill Harding, Twister
 
 