1 Lakh (Hundred Thousand) lines of data into database.

 
Sathish Ramadas
Greenhorn
Posts: 27
Hi all,

Requirement: read data from a flat file and insert it into a database.

The data file has around 1 lakh (100,000) rows, and each line contains 700 characters. We need to load all of that data into the database.

Method:
Read the data using a buffered input stream, store it in an ArrayList, and insert it into the database. Since we cannot hold all 1 lakh lines in the ArrayList at once, I am doing it block by block: store 25,000 lines, insert them into the database, clear the ArrayList, and repeat.

This way it takes 12 minutes to insert 1 lakh records into the database.

I am not sure how long this should really take. Please suggest a better approach.
[ January 04, 2006: Message edited by: Ajith Kallambella ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Are you using the JDBC Statement addBatch() method to batch up a bunch of inserts? You should be able to read a line and add a SQL statement without ever putting the lines into a collection like an ArrayList. Add, execute, and commit some number of lines at a time. Give yourself a command-line argument or configuration option to set the number of lines you read between commits.
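A minimal sketch of that approach, assuming a hypothetical table STAGING_DATA with a single VALUE column, a placeholder JDBC URL, and the whole 700-character line stored in one column for simplicity:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FlatFileLoader {

    public static void main(String[] args) throws Exception {
        // lines per executeBatch()/commit(), configurable from the command line
        int batchSize = args.length > 0 ? Integer.parseInt(args[0]) : 1000;

        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
             BufferedReader in = new BufferedReader(new FileReader("data.txt"));
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO STAGING_DATA (VALUE) VALUES (?)")) {

            con.setAutoCommit(false);      // commit manually, once per batch
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                ps.setString(1, line);     // real code would split the 700-char record into columns
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch();     // one round trip for the whole batch
                    con.commit();
                }
            }
            ps.executeBatch();             // flush the final partial batch
            con.commit();
        }
    }
}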
 
Jeff Albertson
Ranch Hand
Posts: 1780
What's a lakh?
 
Paul Clapham
Sheriff
Posts: 20166
Google is your friend, Jeff. The first link it returns is to the Wikipedia article.
 
Jeff Albertson
Ranch Hand
Posts: 1780
Yaar, I was just kidding. I always measure throughput in crore per kalpa.
 
Paul Clapham
Sheriff
Posts: 20166
So, you didn't really have a doubt?
 
Jeff Albertson
Ranch Hand
Posts: 1780
...that he hasn't jumped into this thread is clear evidence that "fstream" doesn't read these forums...
 
Sathish Ramadas
Greenhorn
Posts: 4
Thanks for your suggestions.

Lakh lines -> hundred thousand lines.

When I looked at the execution, I don't see any delay in storing the data into the collection classes and retrieving it.

Let me check it without using collection classes, just inserting the data directly into the database.

Thanks.
 
Sathish Ramadas
Greenhorn
Posts: 4
I removed the collection classes and tried inserting the data directly into the database.
I also measured the database insert timing alone, taking the start time just before the database work begins and the end time when it finishes. It took more time than when I used the collection classes: for hundred thousand records it is taking 14 minutes.

When I test database insertion alone, I am not reading the data from the file at all; just the database insertion by itself (hard-coded values in the statement) takes more time.

When I read the data from the flat file, I store it in collection classes (an ArrayList) and insert it into the database. In the meantime, when the ArrayList size reaches 15,000, I clear it and start the iteration again, so the database connection also gets refreshed at that point.

Is it because of the single database connection that inserting hundred thousand records takes more time? (This test is a direct database insertion without using collection classes.)

Please suggest.
 
Paul Clapham
Sheriff
Posts: 20166
Where you store the data in memory on the client side will have very little effect. From the client's point of view, reading the records and writing them directly to the database should be the most effective. Anything else (such as storing them in collections) is overhead. The database doesn't care that you did all that.

How often are you committing these changes to the database? Are you using batches? Those are the relevant questions.
 
Deepak Shankar
Greenhorn
Posts: 12
Hundred thousand records! It is interesting indeed...

For the database inserts, are you using a stored procedure? If not, try it with a stored procedure (making CallableStatements instead of Statement/PreparedStatement). It should make it much faster than the normal insert. However, you may want to check the buffer limit in the database (since a hundred thousand is huge).
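For reference, the JDBC side of that suggestion would look roughly like the sketch below; the procedure name INSERT_RECORD and its two parameters are hypothetical and would have to exist in your database already:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class StoredProcInsert {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "pass");
             // INSERT_RECORD(key, value) is a hypothetical procedure that performs the actual INSERT
             CallableStatement cs = con.prepareCall("{call INSERT_RECORD(?, ?)}")) {
            con.setAutoCommit(false);
            cs.setInt(1, 42);
            cs.setString(2, "one 700-character record goes here");
            cs.execute();
            con.commit();
        }
    }
}

CallableStatement also supports addBatch(), so the same commit-every-N-rows pattern from earlier in the thread applies here too.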
 
Mahadevan Gorti SS
Greenhorn
Posts: 18
Sathish,

The time taken depends on various parameters, such as:
-- Which database it is (MySQL, PostgreSQL, Oracle, HSQLDB, Cloudscape, etc.)
-- What type of JDBC driver you are using
-- What type of file system you have, and related disk (HDD) issues
-- What the OS/memory/CPU load is while you are testing
-- Whether you are using PreparedStatement and addBatch()
-- Whether you set autocommit off and commit only once per batch

regards

mahadevan

PS: By taking care of the above parameters I got:
4500 records/sec in MySQL (for a record size of 200 bytes)
6000 records/sec in PostgreSQL (for a record size of 200 bytes)
36000 records/sec in HSQLDB (for a record size of 200 bytes)
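(Rough arithmetic for comparison: at 4500 records/sec, a hundred thousand rows would load in about 22 seconds, while the 12 to 14 minutes reported above works out to roughly 120 to 140 rows/sec, which usually points at per-row commits or per-row round trips rather than raw insert speed.)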
 
Reid M. Pinchback
Ranch Hand
Posts: 775
Originally posted by Deepak Shankar:
For the database inserts, are you using a stored procedure? If not, try it with a stored procedure (making CallableStatements instead of Statement/PreparedStatement). It should make it much faster than the normal insert. However, you may want to check the buffer limit in the database (since a hundred thousand is huge).


This sounds interesting. Why would using a callable statement+stored procedure be faster than a prepared statement? If anything I would have expected there to be additional overhead for pulling in the interpreter for the database vendor's more-than-SQL language processor.

Different question for Sathish - is there a particular vendor's database you are using? You may have additional options depending which vendor it is.
 
Adam Richards
Ranch Hand
Posts: 135
For maximum performance, consider your DBMS's bulk loader utility (e.g., SQL*Loader, sqlldr, in Oracle). These utilities can temporarily disable index updates and per-row commits, which speeds things up dramatically; they typically achieve throughput of several hundred thousand rows per minute.
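If the loader route is acceptable, it can even be kicked off from the Java side. This is only a sketch, assuming Oracle's sqlldr is on the PATH and that a control file load.ctl describing the 700-character record layout already exists:

import java.io.IOException;

public class BulkLoad {

    public static void main(String[] args) throws IOException, InterruptedException {
        // userid, control, log and direct are standard sqlldr parameters; the values here are placeholders
        ProcessBuilder pb = new ProcessBuilder(
                "sqlldr",
                "userid=scott/tiger@orcl",
                "control=load.ctl",
                "log=load.log",
                "direct=true");      // direct-path load bypasses much of the normal SQL engine
        pb.inheritIO();              // show sqlldr's own progress output on this console
        int exitCode = pb.start().waitFor();
        System.out.println("sqlldr finished with exit code " + exitCode);
    }
}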
 
Appu Chan
Greenhorn
Posts: 28
Is it a hard requirement that you must do it through a Java program? Otherwise it is always better to use the DB vendor's bulk-loading utilities. They can be invoked through shell scripts/batch files to make things easier. This also makes more sense if you are not doing any additional processing of the data before the load.

I think the delay you saw in your second attempt was probably because commits were happening more frequently than in the earlier case. You need to look at the batch size in each case.

Howie
 