1 Lakh( Hundred Thousand ) lines of data into database.

Sathish Ramadas
Greenhorn

Joined: Feb 18, 2004
Posts: 27
Hi all,

Requirement: read data from a flat file and insert it into a database.

The data file contains around 1 lakh (100,000) rows, each line 700 characters long. We need to load all of that data into the database.

Method:
Read the data using a buffered input stream, store it in an ArrayList, and insert it into the database. Since we cannot hold all 1 lakh lines in the ArrayList at the same time, I am doing it block by block: store 25,000 lines at a time, insert them into the database, clear the ArrayList, and then start the process again.
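Roughly like this (a sketch only; the table and column names are made up, and the connection setup is omitted):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

public class BlockLoader {

    private static final int BLOCK_SIZE = 25000;

    // Reads the flat file in blocks of 25,000 lines, inserting each block
    // before reading the next one.
    public static void load(Connection con, String fileName) throws Exception {
        List<String> block = new ArrayList<String>(BLOCK_SIZE);
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = in.readLine()) != null) {
            block.add(line);
            if (block.size() == BLOCK_SIZE) {
                insertBlock(con, block);
                block.clear();
            }
        }
        if (!block.isEmpty()) {
            insertBlock(con, block); // the final, partial block
        }
        in.close();
    }

    // One INSERT per row; "my_table" and "line_data" are invented names.
    private static void insertBlock(Connection con, List<String> block) throws Exception {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO my_table (line_data) VALUES (?)");
        for (String row : block) {
            ps.setString(1, row);
            ps.executeUpdate();
        }
        ps.close();
        con.commit(); // assumes autocommit is off
    }
}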

This way it takes 12 minutes to insert 1 lakh records into the database.

I am not sure how long this should really take.

Please suggest.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Are you using the SQL Statement addBatch() to batch up a bunch of inserts? You should be able to read a line and add a SQL statement without ever putting the lines into a collection like an ArrayList. Add, execute and commit some number of lines at a time. Give yourself a command line argument or configuration option to set the number of lines you read between commits.
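
Something along these lines (just a sketch; the table and column names are invented, and the batch size would come from your command line or configuration):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchLoader {

    // Streams the file straight into batched inserts -- no intermediate collection.
    // Returns the number of rows inserted.
    public static int load(Connection con, String fileName, int batchSize) throws Exception {
        con.setAutoCommit(false); // we commit manually, once per batch
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO my_table (line_data) VALUES (?)");
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        String line;
        int count = 0;
        while ((line = in.readLine()) != null) {
            ps.setString(1, line);
            ps.addBatch();
            if (++count % batchSize == 0) {
                ps.executeBatch();
                con.commit();
            }
        }
        ps.executeBatch(); // flush whatever is left in the last batch
        con.commit();
        in.close();
        ps.close();
        return count;
    }
}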


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Jeff Albertson
Ranch Hand

Joined: Sep 16, 2005
Posts: 1780
What's a lakh?


There is no emoticon for what I am feeling!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Google is your friend, Jeff. The first link it returns is to the Wikipedia article.
Jeff Albertson
Ranch Hand

Joined: Sep 16, 2005
Posts: 1780
Yaar, I was just kidding. I always measure throughput in crore per kalpa.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

So, you didn't really have a doubt?
Jeff Albertson
Ranch Hand

Joined: Sep 16, 2005
Posts: 1780
...that he hasn't jumped into this thread is clear evidence that "fstream" doesn't read these forums...
Sathish Ramadas
Greenhorn

Joined: Dec 06, 2005
Posts: 4
Thanks for your suggestions...

Lakh Lines -> Hundred Thousand Lines.

When I looked at the execution, I don't see any delay in storing data into the collection classes and retrieving it.

Let me check it without using the collection classes, just inserting the data directly into the database.

Thanks.
Sathish Ramadas
Greenhorn

Joined: Dec 06, 2005
Posts: 4
I removed the collection classes and tried inserting the data directly into the database.
I also timed the database inserts alone, recording the time just before the database work starts and just after it ends. It took more time than when I used the collection classes: for a hundred thousand records it is taking 14 minutes.

When I test the database insertion alone, I am not reading the data from the file; even a plain insertion (with hard-coded values in the statement) takes more time.

Previously, when I read data from the flat file, I stored it in a collection class (an ArrayList) and inserted it into the database. Whenever the ArrayList size reached 15,000, I would clear it and start the iteration again, so the database connection also got refreshed at that point.

Is it because of the single database connection that inserting a hundred thousand records takes me more time (this test is a direct database insertion without using the collection classes)?

Please suggest.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Where you store the data in memory on the client side will have very little effect. From the client's point of view, reading the records and writing them directly to the database should be the most effective. Anything else (such as storing them in collections) is overhead. The database doesn't care that you did all that.

How often are you committing these changes to the database? Are you using batches? Those are the relevant questions.
Deepak Shankar
Greenhorn

Joined: Dec 14, 2005
Posts: 12
Hundred thousand records! It is interesting indeed...

For the database inserts, are you using a stored procedure? If not, try it with a stored procedure (using CallableStatement instead of Statement/PreparedStatement). It should make it much faster than the normal insert. However, you may want to check the buffer limit in the database (since a hundred thousand is huge).
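
For example (a sketch; the stored procedure insert_line is a made-up name and would have to exist in your database first):

import java.sql.CallableStatement;
import java.sql.Connection;

public class ProcedureLoader {

    // Calls a hypothetical stored procedure "insert_line" for one row.
    public static void insert(Connection con, String line) throws Exception {
        CallableStatement cs = con.prepareCall("{call insert_line(?)}");
        cs.setString(1, line);
        cs.execute();
        cs.close();
    }
}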
Mahadevan Gorti SS
Greenhorn

Joined: Jan 31, 2006
Posts: 18
Sathish,

The time taken depends on various parameters, such as:
-- What the database is (MySQL, Postgres, Oracle, HSQLDB, Cloudscape, etc.)
-- What type of JDBC driver you are using
-- What type of file system you have, and related hard-disk issues
-- What the OS/memory/CPU load is while you are testing
-- Whether you are using PreparedStatement and addBatch()
-- Whether you set autocommit off and do the commits yourself

regards

mahadevan

PS: By taking care of the above parameters I got:
4,500 records/sec in MySQL (record size of 200 bytes)
6,000 records/sec in PostgreSQL (record size of 200 bytes)
36,000 records/sec in HSQLDB (record size of 200 bytes)
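
To measure records/sec yourself, something like this works (a sketch; it reuses the BatchLoader posted earlier in this thread, and con stands for an already-open Connection):

// Rough throughput measurement around an insert run.
long start = System.currentTimeMillis();
int rows = BatchLoader.load(con, "data.txt", 1000);
long elapsed = System.currentTimeMillis() - start;
System.out.println((rows * 1000L) / Math.max(elapsed, 1) + " records/sec");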
Reid M. Pinchback
Ranch Hand

Joined: Jan 25, 2002
Posts: 775
Originally posted by Deepak Shankar:
For the database inserts, are you using a stored procedure? If not, try it with a stored procedure (using CallableStatement instead of Statement/PreparedStatement). It should make it much faster than the normal insert. However, you may want to check the buffer limit in the database (since a hundred thousand is huge).


This sounds interesting. Why would using a callable statement+stored procedure be faster than a prepared statement? If anything I would have expected there to be additional overhead for pulling in the interpreter for the database vendor's more-than-SQL language processor.

Different question for Sathish - is there a particular vendor's database you are using? You may have additional options depending which vendor it is.


Reid - SCJP2 (April 2002)
Adam Richards
Ranch Hand

Joined: Nov 03, 2005
Posts: 135
For maximum performance, consider your DBMS's bulk-loader utility (e.g., SQLLDR in Oracle). These utilities temporarily disable index updates and commits, which speeds things up dramatically; throughput of several hundred thousand rows per minute is typical.
Appu Chan
Greenhorn

Joined: Aug 29, 2002
Posts: 28
Is it a hard requirement that you must do it through a Java program? Otherwise it is always better to use the DB vendor's bulk-loading utilities. These can be invoked through shell scripts/batch files to make things easier. This also makes more sense if you are not doing any additional processing of the data before the upload.
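You can even kick the loader off from Java if you have to stay in one process (a sketch only; the sqlldr arguments, control file, and credentials are Oracle-specific and made up):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BulkLoad {

    // Invokes Oracle's SQL*Loader as an external process.
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "sqlldr", "userid=scott/tiger", "control=load.ctl", "log=load.log");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println(line); // echo the loader's output
        }
        System.out.println("sqlldr exit code: " + p.waitFor());
    }
}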

I think the delay you found in your second attempt was probably because commit operations were happening more frequently than in the earlier case. You need to look at the batch size in each case.

Howie
 