I don't know what they're looking for, but in my opinion the excerpt you've shown wouldn't necessarily be different for a huge file. The difference would be in what you do with it. For example, you might not be able to read the entire file into memory and then process the results - you'd probably want to handle one record completely at a time.
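A minimal sketch of that one-record-at-a-time approach, assuming plain line-delimited text (the class name and the counting stand-in for real processing are mine):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamingReader {
    // Handle one record (line) at a time; only the current line is held in memory,
    // so the file size no longer limits you.
    static long countRecords(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Real per-record work (parse, validate, insert) would go here.
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in for the huge input file.
        Path file = Files.createTempFile("records", ".txt");
        Files.write(file, List.of("rec1", "rec2", "rec3"));
        System.out.println(countRecords(file)); // prints 3
    }
}
```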
The data format is important. If the records are ordered and that order must be preserved, you may not be able to use threads.
Depending on what data you have (e.g. keyed, sorted, has unneeded rows, has huge data on each row, has ONLY 1 byte per row, etc.), you may first need to pre-process it, if possible, to make the later stages faster.
Second, process the data in the best way possible, e.g. with threads.
Another thing about rows with huge data: in my experience, a process can take 3 months instead of 3 hours if you load huge rows into memory unnecessarily. Try to keep huge data out of memory until it is actually needed.
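The threading suggestion above could be sketched like this, assuming the records are independent (the class name and the counter stand-in for real work are mine, and this only applies when order does not matter, as noted earlier):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelProcessor {
    // Process independent records on a fixed-size thread pool.
    // Only safe when record order does not matter.
    static long processAll(List<String> records, int threads) throws InterruptedException {
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.submit(() -> {
                // Stand-in for real per-record work (parse, transform, load).
                processed.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all submitted tasks
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(List.of("a", "b", "c", "d"), 2)); // prints 4
    }
}
```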
But seriously, is there any better way of increasing performance, such as loading chunks of data instead of reading each line and inserting it into the database one at a time?
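One common pattern for this is to buffer records and flush them in batches instead of issuing one INSERT per line. A minimal sketch of the buffering logic, with the database call abstracted behind a callback (the class name and `Consumer` flusher are mine; with JDBC the flush step would call `PreparedStatement.addBatch()` per record and then one `executeBatch()`, ideally inside a single transaction):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingInserter {
    private final int batchSize;
    private final Consumer<List<String>> flusher; // e.g. a JDBC executeBatch() call
    private final List<String> buffer = new ArrayList<>();

    BatchingInserter(int batchSize, Consumer<List<String>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffer records; flush a full chunk instead of one INSERT per record.
    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Must also be called once at end-of-file for the final partial batch.
    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> flushSizes = new ArrayList<>();
        BatchingInserter inserter = new BatchingInserter(3, batch -> flushSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) inserter.add("rec" + i);
        inserter.flush(); // flush the final partial batch
        System.out.println(flushSizes); // prints [3, 3, 1]
    }
}
```

The batch size is a tuning knob: too small and you pay per-round-trip overhead again, too large and you hold too much in memory.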
For such requirements, you can research ETL techniques (extract, transform, load).
For example, if you need to process 10 million records per hour, you really need to optimize your ETL process. The data format is important; your optimization will be based on it - for example, by doing preprocessing in one of the stages: E, T, or L.
Of course, better hardware will help.
In less extreme cases, you can consider using other languages and software if your current process is too slow.
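The preprocessing idea from the E/T/L discussion can be sketched as a small transform stage that streams the raw file, drops unneeded rows, and writes a cleaned intermediate file for the load stage (the class name and the specific filter rules - blank lines and `#` comments - are mine, chosen just to illustrate):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PreprocessStage {
    // "T" stage: drop unneeded rows and trim the rest before the load stage,
    // so the expensive database step only sees clean records.
    static long preprocess(Path in, Path out) throws IOException {
        long kept = 0;
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Example filter rules: skip blank rows and comment rows.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
                writer.write(trimmed);
                writer.newLine();
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("raw", ".txt");
        Path out = Files.createTempFile("clean", ".txt");
        Files.write(in, List.of("  a  ", "", "# comment", "b"));
        System.out.println(preprocess(in, out)); // prints 2
    }
}
```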
I understand what you are trying to say. Informatica is an excellent ETL tool. Along similar lines, we can also use SQL Developer or TOAD to read the files and insert into the database. The data format is extremely important for both.
But my question remains the same: instead of using these tools, if I have to work with only Java, what is the best way to do so?
I have used only Java in my ETL systems. You can implement your own ETL, and sometimes that is the best way.