read large CSV file and update the database

 
Namrta Pandey
Ranch Hand
Posts: 36
Hi,

In my project I need to write a batch job (in Java) which will read a CSV file from an FTP location consisting of more than 5 lakh records and update a table in my database (Oracle 10g). My concerns here are:
- processing time
- garbage collection (stack overflow problems, etc.)
- DB handling

Can anybody tell me how I can achieve this, so that it takes very little time without collecting large chunks of data in memory?

Thanks & Regards,
Namrta.
 
Rob Spoor
Sheriff
Posts: 22783
1) Use one PreparedStatement for all your database updates. Initialize it before you start reading the CSV file, reuse it for each line, and close it afterwards. This will speed up your database interactions somewhat.

2) Read and process each line separately. Most CSV reading libraries (see AccessingFileFormats; I recommend opencsv) allow this. By storing only one line at a time you will not use too much memory.

In pseudo code:
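Something along these lines - a minimal sketch, assuming opencsv's CSVReader and hypothetical connection details, file name, and table/column names:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import au.com.bytecode.opencsv.CSVReader;

public class CsvStatusUpdater {
    public static void main(String[] args) throws Exception {
        // hypothetical connection details
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:XE", "user", "password");
        CSVReader reader = new CSVReader(new FileReader("records.csv")); // hypothetical file name
        // one PreparedStatement, initialized once and reused for every line
        PreparedStatement ps = con.prepareStatement(
                "UPDATE my_table SET status = ? WHERE id = ?"); // hypothetical table/columns
        try {
            String[] line;
            while ((line = reader.readNext()) != null) { // only one line in memory at a time
                ps.setString(1, line[1]);
                ps.setLong(2, Long.parseLong(line[0]));
                ps.executeUpdate();
            }
        } finally {
            ps.close();
            reader.close();
            con.close();
        }
    }
}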
 
Jesper de Jong
Java Cowboy
Posts: 16084
Rob gives some good ideas, and you can probably make it even faster if you use batch updates. With batch updates, you can execute a bunch of JDBC statements at once, which the database might be able to handle more efficiently than when you do it statement by statement.

See Using PreparedStatement Objects in Batch Updates for more info.

I wouldn't try to batch insert thousands of rows at a time; do a commit every few hundred lines or so.
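Roughly like this - again a sketch with hypothetical names, combining the reading loop above with addBatch()/executeBatch() and a commit every few hundred rows:

import java.sql.Connection;
import java.sql.PreparedStatement;

import au.com.bytecode.opencsv.CSVReader;

public class BatchUpdateSketch {
    // con and reader are assumed to be set up as in the previous sketch
    static void updateInBatches(Connection con, CSVReader reader) throws Exception {
        con.setAutoCommit(false); // commit manually, in chunks
        PreparedStatement ps = con.prepareStatement(
                "UPDATE my_table SET status = ? WHERE id = ?"); // hypothetical table/columns
        try {
            int count = 0;
            String[] line;
            while ((line = reader.readNext()) != null) {
                ps.setString(1, line[1]);
                ps.setLong(2, Long.parseLong(line[0]));
                ps.addBatch();              // queue the update instead of executing it immediately
                if (++count % 500 == 0) {   // send the batch and commit every 500 rows
                    ps.executeBatch();
                    con.commit();
                }
            }
            ps.executeBatch();              // flush whatever is left
            con.commit();
        } finally {
            ps.close();
        }
    }
}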
 
Srikanth Nutigattu
Ranch Hand
Posts: 114
5 lakh records at once? Hmm, look out for OutOfMemory issues if the max heap (-Xmx) is not high enough!
I would concur with Jesper's suggestion of doing intermediate commits only if there is no "all or nothing" requirement, that is, if there is an error while inserting one of the records, what should be done with the previously processed records? If there is no such dependency, then you should commit periodically, and maybe maintain a record of the last line committed or processed so that you can recover from a failure (a rough sketch follows).
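One way to keep such a checkpoint - a rough sketch with a hypothetical file name (a small status table would work just as well):

import java.io.FileWriter;

public class CheckpointSketch {
    // called right after each successful commit
    static void saveCheckpoint(int lastCommittedLine) throws Exception {
        FileWriter out = new FileWriter("last_committed_line.txt"); // overwrite with the latest position
        try {
            out.write(String.valueOf(lastCommittedLine));
        } finally {
            out.close();
        }
    }
    // on restart, read this file back and skip that many lines of the CSV before resuming updates
}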

HTH
 
Jesper de Jong
Java Cowboy
Posts: 16084
Note that not everybody will know what a lakh is - it's 100,000, so by "5 lakh" you mean "500,000".
 
Rob Spoor
Sheriff
Posts: 22783

Srikanth Nutigattu wrote: 5 lakh records at once? Hmm, look out for OutOfMemory issues if the max heap (-Xmx) is not high enough!


That's why I suggested processing each line one by one. A small-enough batch would also not suffer from any OutOfMemoryErrors. You certainly don't need to store all records in memory.
 
Namrta Pandey
Ranch Hand
Posts: 36
Thanks to you all for your suggestions. I just want to tell you that I am not inserting any data from the CSV file; I need to check some values line by line (i.e. per record) and then, depending on the value, update the status column in the database. I've tried this using BufferedReader and prepared statements; for 500 records it took 20 seconds, which is not feasible at all. How can I optimize it?

Thanks in Advance.
 
Rob Spoor
Sheriff
Posts: 22783
It depends on where those 20 seconds go. Are they caused by the reading? By the database interactions? Something else?
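One quick way to find out is to time the two parts separately; a rough sketch, reusing the hypothetical names from the earlier examples:

import java.sql.PreparedStatement;

import au.com.bytecode.opencsv.CSVReader;

public class TimingSketch {
    static void processFile(CSVReader reader, PreparedStatement ps) throws Exception {
        long readTime = 0;
        long dbTime = 0;
        while (true) {
            long t0 = System.currentTimeMillis();
            String[] line = reader.readNext();           // time spent reading/parsing the CSV
            readTime += System.currentTimeMillis() - t0;
            if (line == null) {
                break;
            }
            long t1 = System.currentTimeMillis();
            ps.setString(1, line[1]);
            ps.setLong(2, Long.parseLong(line[0]));
            ps.executeUpdate();                          // time spent on the database update
            dbTime += System.currentTimeMillis() - t1;
        }
        System.out.println("reading: " + readTime + " ms, database: " + dbTime + " ms");
    }
}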
 
Namrta Pandey
Ranch Hand
Posts: 36
It is taking time while updating the records in the database. Now I have another problem: apart from checking conditions against the file data, I need to check some conditions against the table itself for the respective records, then update them and insert into another table, which will definitely take a lot of time if I write it in Java. I am thinking of doing all this in a procedure (batch) in Oracle. Can anybody tell me how to read a CSV file from an FTP location in a procedure?
 
Namrta Pandey
Ranch Hand
Posts: 36
Thanks a lot to all of you. The problem was on the database side, so I used SQL*Loader and it resolved the issue.
 