read large CSV file and update the database

Namrta Pandey
Ranch Hand

Joined: Jul 22, 2008
Posts: 36
Hi,

In my project I need to write a batch job (in Java) that will read a CSV file from an FTP location containing more than 5 lakh records and update a table in my database (Oracle 10g). My concerns here are:
- processing time
- memory and garbage collection (running out of memory, etc.)
- database handling

Can anybody tell me how I can achieve this, so that it takes as little time as possible without holding large chunks of data in memory?

Thanks & Regards,
Namrta.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

1) Use one PreparedStatement for all your database updates. Initialize it before you start reading the CSV file, reuse it for each line, and close it afterwards. This will speed up your database interactions somewhat.

2) Read and process each line separately. Most CSV reading libraries (see AccessingFileFormats; I recommend opencsv) allow this. By only storing one line at a time you will not use too much memory.

In pseudo code:
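A minimal sketch along those lines, assuming opencsv's CSVReader (the au.com.bytecode.opencsv package in older releases; newer versions use com.opencsv) and an illustrative UPDATE statement with hypothetical column positions:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

import au.com.bytecode.opencsv.CSVReader;

public class CsvStatusUpdater {

    public void process(Connection connection, String csvFile) throws Exception {
        CSVReader reader = new CSVReader(new FileReader(csvFile));
        // one PreparedStatement, created once and reused for every line
        PreparedStatement statement = connection.prepareStatement(
                "UPDATE my_table SET status = ? WHERE id = ?");
        try {
            String[] line;
            // readNext() returns one parsed line at a time, so only a single
            // record is held in memory
            while ((line = reader.readNext()) != null) {
                statement.setString(1, line[1]);               // status value from the CSV
                statement.setLong(2, Long.parseLong(line[0])); // record id
                statement.executeUpdate();
            }
        } finally {
            statement.close();
            reader.close();
        }
    }
}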


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14117
    

Rob gives some good ideas, and you can probably make it even faster if you use batch updates. With batch updates you can execute a bunch of JDBC statements at once, which the database might be able to handle more efficiently than processing them one statement at a time.

See Using PreparedStatement Objects in Batch Updates for more info.

I wouldn't try to batch insert thousands of rows at a time; do a commit every few hundred lines or so.
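A minimal sketch of that, reusing the illustrative UPDATE statement and hypothetical column positions from the earlier sketch; the batch size of 500 is an arbitrary choice:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

import au.com.bytecode.opencsv.CSVReader;

public class BatchCsvStatusUpdater {

    private static final int BATCH_SIZE = 500;

    public void process(Connection connection, String csvFile) throws Exception {
        connection.setAutoCommit(false);
        CSVReader reader = new CSVReader(new FileReader(csvFile));
        PreparedStatement statement = connection.prepareStatement(
                "UPDATE my_table SET status = ? WHERE id = ?");
        try {
            String[] line;
            int count = 0;
            while ((line = reader.readNext()) != null) {
                statement.setString(1, line[1]);
                statement.setLong(2, Long.parseLong(line[0]));
                statement.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    statement.executeBatch(); // send the accumulated updates
                    connection.commit();      // commit every few hundred rows
                }
            }
            statement.executeBatch(); // flush any remaining updates
            connection.commit();
        } finally {
            statement.close();
            reader.close();
        }
    }
}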


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 7 API documentation
Scala Notes - My blog about Scala
Srikanth Nutigattu
Ranch Hand

Joined: Oct 30, 2007
Posts: 114
5 lakh records at once? Hmm, look out for OutOfMemoryError if the max heap (-Xmx) is not high enough!
I would concur with Jesper's suggestion of doing intermediate commits, but only if there is no "all or none" requirement - that is, you need to decide what should happen to the previously processed records if one record fails. If there is no such dependency, then commit periodically, and maybe maintain a status of the last record committed or processed so that you can recover from a failure.

HTH
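One simple way to keep such a status is to write the number of the last committed line to a small marker file after each commit, and skip that many lines when the job is restarted. A rough sketch; the file name and format here are purely hypothetical:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

public class RestartMarker {

    private final File markerFile = new File("last-committed-line.txt");

    // returns how many lines were already committed in a previous run
    public int readLastCommittedLine() throws Exception {
        if (!markerFile.exists()) {
            return 0; // nothing processed yet
        }
        BufferedReader in = new BufferedReader(new FileReader(markerFile));
        try {
            return Integer.parseInt(in.readLine().trim());
        } finally {
            in.close();
        }
    }

    // called right after each commit, overwriting the previous value
    public void writeLastCommittedLine(int lineNumber) throws Exception {
        FileWriter out = new FileWriter(markerFile);
        try {
            out.write(String.valueOf(lineNumber));
        } finally {
            out.close();
        }
    }
}

The batch loop would then call writeLastCommittedLine(count) after every commit and skip readLastCommittedLine() lines when it starts up.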
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14117
    

Note that not everybody will know what a lakh is - it's 100,000, so by "5 lakh" you mean 500,000.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

Srikanth Nutigattu wrote:5 lakh records at once? Hmm, look out for OutOfMemoryError if the max heap (-Xmx) is not high enough!

That's why I suggested processing each line one by one. A small-enough batch would also not suffer from any OutOfMemoryErrors. You certainly don't need to store all records in memory.
Namrta Pandey
Ranch Hand

Joined: Jul 22, 2008
Posts: 36
Thanks to you all for your suggestions. Just to clarify: I am not inserting any data from the CSV file. I need to check some values line by line (i.e. per record) and then, depending on the value, update a status column in the database. I've tried this using BufferedReader and prepared statements; for 500 records it took 20 seconds, which is not feasible at all. How can I optimize it?

Thanks in Advance.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19684
    

It depends on where those 20 seconds go. Are they spent on the reading? On the database interactions? Something else?
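One rough way to find out is to time the two parts separately with System.currentTimeMillis(); a sketch using BufferedReader, as in the original attempt, with an illustrative comma split and UPDATE statement:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class CsvUpdateTimer {

    // prints a rough breakdown: time spent reading lines vs. executing updates
    public void profile(Connection connection, String csvFile) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader(csvFile));
        PreparedStatement statement = connection.prepareStatement(
                "UPDATE my_table SET status = ? WHERE id = ?");
        long readMillis = 0;
        long updateMillis = 0;
        try {
            long start = System.currentTimeMillis();
            String line;
            while ((line = reader.readLine()) != null) {
                readMillis += System.currentTimeMillis() - start;

                String[] fields = line.split(",");
                start = System.currentTimeMillis();
                statement.setString(1, fields[1]);
                statement.setLong(2, Long.parseLong(fields[0]));
                statement.executeUpdate();
                updateMillis += System.currentTimeMillis() - start;

                start = System.currentTimeMillis();
            }
        } finally {
            statement.close();
            reader.close();
        }
        System.out.println("reading: " + readMillis + " ms, updates: " + updateMillis + " ms");
    }
}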
Namrta Pandey
Ranch Hand

Joined: Jul 22, 2008
Posts: 36
It is taking time while updating the records in the database. Now I have another problem: apart from checking a condition from the file data, I also need to check a condition in the table itself for each record, then update it and insert a row into another table, which will definitely take a lot of time if I write it in Java. I am thinking of doing all this in a procedure (batch) in Oracle. Can anybody tell me how to read a CSV file from an FTP location in a procedure?
Namrta Pandey
Ranch Hand

Joined: Jul 22, 2008
Posts: 36
Thanks a lot, all of you. The problem was on the database side, so I used SQL*Loader and it resolved the issue.
 