
1 Million Records Insertion using JDBC Batch in Oracle

mozammil muzza
Ranch Hand

Joined: Dec 22, 2011
Posts: 32

Hi,

I am trying to run an application that inserts 1 million records into a DB table that has 7 columns, with 1 PK, 1 FK, and 3 unique index constraints on it.

I am using a PreparedStatement with JDBC batching, and I call executeBatch() every 2,000 rows.

At the moment it takes about 3 minutes (182 seconds) to insert half a million records.

Please tell me: is there any faster way of inserting the data into the DB?
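
Here is roughly the pattern I'm using (a simplified sketch; the real table, column and variable names differ, and "connection" is an open, non-auto-commit Connection):

    String sql = "INSERT INTO my_table (col1, col2, col3, col4, col5, col6, col7) "
               + "VALUES (?, ?, ?, ?, ?, ?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        int count = 0;
        for (Record r : records) {
            ps.setLong(1, r.getId());
            ps.setString(2, r.getName());
            // ... bind the remaining five columns the same way ...
            ps.addBatch();
            if (++count % 2000 == 0) {
                ps.executeBatch();   // flush every 2,000 rows
            }
        }
        ps.executeBatch();           // flush the final partial batch
        connection.commit();
    }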

thanks


Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10451
    

Hardly a beginners' topic. Moving to a more appropriate forum.


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

mozammil muzza wrote:I am trying to run an application that inserts 1 million records into a DB table that has 7 columns, with 1 PK, 1 FK, and 3 unique index constraints on it.
I am using a PreparedStatement with JDBC batching, and I call executeBatch() every 2,000 rows.
At the moment it takes about 3 minutes (182 seconds) to insert half a million records.

Well,

First: Have you considered using a native DB load facility?

Second: Have you considered re-designing your table? 4 unique indexes (including, presumably, the PK) on a table with 7 columns suggest to me that it isn't properly rationalized.

Third: I suggest you look at this thread and, in particular, Jayesh's response; because it's pretty much what I'd advise.

Winston

[Edit] Updated. Thanks Jeff.


mozammil muzza
Ranch Hand

Joined: Dec 22, 2011
Posts: 32

Winston Gutkowski wrote:Second: Have you considered re-designing your table? 4 unique indexes (including, presumably, the PK) on a table with 7 columns suggest to me that it isn't properly rationalized.

I tried the same thing in another DB table with the same structure; there it takes hardly 37 seconds for 600K records. That's why I am a bit confused.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

mozammil muzza wrote:I tried the same thing in another DB table with the same structure; there it takes hardly 37 seconds for 600K records. That's why I am a bit confused.

Me too, because you haven't given us any information that could help. Notwithstanding, my point stands: I've been dealing with databases for more than twenty years, and I can count on the fingers of one hand the tables that rationally required 4 unique indexes; and none of them had only 7 columns.

But maybe you can prove me wrong.

Winston
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

Winston Gutkowski wrote:
Third: I suggest you look at this thread and, in particular, Junilu's response;


Jayesh's response perhaps?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

I'd say Winston has already raised the main points (that is, to consider using a data load tool, in this case SQL*Loader or perhaps external tables), and to review the table and its indexes (index maintenance is quite expensive, as compared to the insert operation itself).
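
For illustration, the external-table route might look roughly like this (a sketch only; the DIRECTORY object, file, table and column names are all hypothetical, and the DDL would normally be set up once rather than by the loading code):

    // Expose a flat file as an external table, then load it with one
    // direct-path INSERT ... SELECT. All names here are made up.
    try (Statement stmt = connection.createStatement()) {
        stmt.execute(
            "CREATE TABLE my_table_ext (" +
            "  col1 NUMBER, col2 VARCHAR2(100), col3 VARCHAR2(100)," +
            "  col4 NUMBER, col5 NUMBER, col6 VARCHAR2(100), col7 VARCHAR2(100))" +
            " ORGANIZATION EXTERNAL (" +
            "  TYPE ORACLE_LOADER" +
            "  DEFAULT DIRECTORY data_dir" +
            "  ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED BY ',')" +
            "  LOCATION ('records.csv'))");
        stmt.execute("INSERT /*+ APPEND */ INTO my_table SELECT * FROM my_table_ext");
        connection.commit();
    }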

So it's just the minor points left:

1) Do some experiments to find which batch size works best in your environment. 2000 could even be too much.

2) Make sure you're using the latest JDBC driver for Oracle. Oracle made some batch-processing performance enhancements in the JDBC driver that came with the 11g database version. (Looking just at the file name is not enough; various versions of the driver are named ojdbc5.jar or ojdbc6.jar.)
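
If in doubt, you can also ask the driver itself at runtime instead of relying on the jar name (a minimal sketch, assuming an open Connection called connection):

    // The driver reports its own name and version via the connection metadata.
    DatabaseMetaData md = connection.getMetaData();
    System.out.println(md.getDriverName() + " " + md.getDriverVersion());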
mozammil muzza
Ranch Hand

Joined: Dec 22, 2011
Posts: 32

Jeff Verdegan wrote:
Winston Gutkowski wrote:
Third: I suggest you look at this thread and, in particular, Junilu's response;


Jayesh's response perhaps?


I don't have any provision for using SQL*Loader in my application, so that thread is not really much help.


Martin Vajsar wrote:I'd say Winston has already raised the main points (that is, to consider using a data load tool, in this case SQL*Loader or perhaps external tables), and to review the table and its indexes (index maintenance is quite expensive, as compared to the insert operation itself).

So it's just the minor points left:

1) Do some experiments to find which batch size works best in your environment. 2000 could even be too much.


I tested several batch sizes, but did not achieve the level I want. I even tried the Oracle batching mechanism on top of it, but there was not much improvement.

2) Make sure you're using the latest JDBC driver for Oracle. Oracle made some batch-processing performance enhancements in the JDBC driver that came with the 11g database version. (Looking just at the file name is not enough; various versions of the driver are named ojdbc5.jar or ojdbc6.jar.)


I am using the latest JDBC driver, 11.x.x; I checked in its manifest file, BTW.


Is there anything else I can try? Thanks for your reply.
Carles Gasques
Ranch Hand

Joined: Apr 19, 2013
Posts: 199
    
Hi,

One path to explore is to drop the index(es), insert all your data, and then rebuild the index(es).
That should give you a boost.
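
A sketch of that flow (the index, table and column names are made up, and runBatchedInserts stands in for your existing batch-insert routine):

    try (Statement stmt = connection.createStatement()) {
        // Drop the unique indexes before the load...
        stmt.execute("DROP INDEX my_table_uq1");
        stmt.execute("DROP INDEX my_table_uq2");
        stmt.execute("DROP INDEX my_table_uq3");

        runBatchedInserts(connection);   // hypothetical: your existing JDBC batch load

        // ...and rebuild them afterwards. Note that recreating them as UNIQUE
        // will fail if any duplicates slipped in during the load.
        stmt.execute("CREATE UNIQUE INDEX my_table_uq1 ON my_table (col1)");
        stmt.execute("CREATE UNIQUE INDEX my_table_uq2 ON my_table (col2)");
        stmt.execute("CREATE UNIQUE INDEX my_table_uq3 ON my_table (col3)");
    }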

Cheers,
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

Carles Gasques wrote:One path to explore is to drop the index(es), insert all your data, and then rebuild the index(es)...

That occurred to me too; and it would definitely have been my advice if the indexes weren't unique - but one presumes they are unique for a reason (i.e., probably as an automatic validity check).

If you drop, load and rebuild, you have two basic issues:
1. Values aren't verified on insertion.
2. Rejected index values are not likely to be in any predictable order (particularly as there are 4 indexes to satisfy).

My advice is still to look at that table's definition: it sounds "smelly" to me.

Winston
Carles Gasques
Ranch Hand

Joined: Apr 19, 2013
Posts: 199
    
Hi,

I agree with your diagnosis, Winston.

But at least giving the indexes a rebuild before loading the data, and checking how the performance evolves, will not do any harm.


Cheers,
mozammil muzza
Ranch Hand

Joined: Dec 22, 2011
Posts: 32

Winston Gutkowski wrote:
Carles Gasques wrote:One path to explore is to drop the index(es), insert all your data, and then rebuild the index(es)...

That occurred to me too; and it would definitely have been my advice if the indexes weren't unique - but one presumes they are unique for a reason (i.e., probably as an automatic validity check).

If you drop, load and rebuild, you have two basic issues:
1. Values aren't verified on insertion.
2. Rejected index values are not likely to be in any predictable order (particularly as there are 4 indexes to satisfy).

My advice is still to look at that table's definition: it sounds "smelly" to me.

Winston


Well, the indexes are unique ones, and I can't drop them before inserting the records, as the application depends on them; doing so would make the data inconsistent.

Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

mozammil muzza wrote:Well, the indexes are unique ones, and I can't drop them before inserting the records, as the application depends on them; doing so would make the data inconsistent.

Unfortunately it looks like you've got conflicting requirements. If you do need to improve the insertion speeds, you may have to make compromises elsewhere.

Perhaps, if your database has enough spare resources, you could insert the rows into a temporary (unindexed) table and then merge/insert them into the target table in parallel.

Since you've ruled out the obvious measures, it would be good to find out what the bottleneck of your current processing is and target that bottleneck specifically. Do you have some experience with performance tuning in Oracle?
Carles Gasques
Ranch Hand

Joined: Apr 19, 2013
Posts: 199
    
I agree, of course,
but do give rebuilding the indexes a try and then run the load code again.
Perhaps the indexes are heavily fragmented and unbalanced; rebuilding them could make a difference.
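
For example (the index name is hypothetical):

    try (Statement stmt = connection.createStatement()) {
        // Rebuilding compacts and rebalances a fragmented index.
        stmt.execute("ALTER INDEX my_table_uq1 REBUILD");
    }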


Best regards,
mozammil muzza
Ranch Hand

Joined: Dec 22, 2011
Posts: 32

Martin Vajsar wrote:

Perhaps, if your database has enough spare resources, you could insert the rows into a temporary (unindexed) table and then merge/insert them into the target table in parallel.

I tried a bulk insert via a stage table as well; still not much improvement.
Can you please elaborate on the merge/insert and its processing in parallel? I'm not sure, but I guess it might be helpful.


Since you've ruled out the obvious measures, it would be good to find out what the bottleneck of your current processing is and target that bottleneck specifically. Do you have some experience with performance tuning in Oracle?

Not really.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    

mozammil muzza wrote:I tried a bulk insert via a stage table as well; still not much improvement.
Can you please elaborate on the merge/insert and its processing in parallel? I'm not sure, but I guess it might be helpful.

Employing parallel processing makes sense only if your database has plenty of available CPU and IO resources during your loads. If your parallel processing competes with other processes running in the database, it will only make life miserable for everyone (including you). Also, you might need some special privileges and/or quotas to be able to run parallel jobs; ask your DBA if in doubt. (Your DBA, assuming there is one, should know what you're going to do in the database; parallel processing is usually not expected from a JDBC application.)

What I'd try:

1) Create a temporary table (probably with the ON COMMIT PRESERVE ROWS option). Temporary tables generate slightly less undo, which is good, of course. Let me note - in case you haven't worked with Oracle's temporary tables yet - that the table should be created upfront and be part of your database schema; you don't create it as part of your data loading process. No indexes.

2) Before populating the table, truncate it to make sure it doesn't contain rows from previous runs (should be an instantaneous operation).

3) Populate the table using batching (no commits in between, of course), and using the APPEND_VALUES hint. Commit after the table is populated (this is necessary because of the APPEND_VALUES hint). Disclaimer: I'm not sure the APPEND_VALUES hint will have any effect when inserting into a temporary table, but personally I'd just try it with and without, and see which is faster.

Now, if the steps 2 and 3 don't take significantly less time than your current processing, it is not worth going further. Just halving the time might not be enough, in my opinion. We'll still have to copy all of the data into the target table, and that's gonna take some time even in parallel.

4) Insert the data into the target table (a single insert into ... select statement), employing parallel processing. See this StackOverflow article for details; I haven't ever used parallel insert myself.

If there is a unique index violation, the insert will fail, and you won't have any indication which rows/values are the cause. But at least you'll have the option to rollback the transaction, and/or identify the duplicate values by querying the target and temporary table.
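
Pulling the steps above together, a rough sketch (all table, column and variable names are hypothetical, and as noted the APPEND_VALUES hint may or may not help on a temporary table):

    // One-time DDL, part of the schema - NOT run by the loading code:
    //   CREATE GLOBAL TEMPORARY TABLE my_table_stage (
    //     col1 NUMBER, col2 VARCHAR2(100), ... same 7 columns as the target ...)
    //   ON COMMIT PRESERVE ROWS;

    try (Statement stmt = connection.createStatement()) {
        stmt.execute("TRUNCATE TABLE my_table_stage");               // step 2
    }

    String sql = "INSERT /*+ APPEND_VALUES */ INTO my_table_stage"
               + " (col1, col2, col3, col4, col5, col6, col7)"
               + " VALUES (?, ?, ?, ?, ?, ?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        // ... the same batching loop as before, no commits in between (step 3)
    }
    connection.commit();   // required after a direct-path (APPEND_VALUES) insert

    try (Statement stmt = connection.createStatement()) {
        stmt.execute("ALTER SESSION ENABLE PARALLEL DML");
        stmt.execute("INSERT /*+ APPEND PARALLEL(t, 4) */ INTO my_table t"
                   + " SELECT * FROM my_table_stage");               // step 4
    }
    connection.commit();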
jake lopez
Greenhorn

Joined: Jul 03, 2013
Posts: 13
Make your JDBC connection non-auto-commit, and commit after every batch.
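
A minimal sketch of that (assuming an open Connection called connection):

    connection.setAutoCommit(false);   // stop the driver committing after every statement
    // ... addBatch()/executeBatch() loop ...
    connection.commit();               // commit once per batch, or once at the end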
Arun Giridhar
Ranch Hand

Joined: Mar 10, 2012
Posts: 147

This post might be really helpful for my current open-source project. I will be using Postgres, and I'm dealing with 300,000 (3 lakh) records per 6 months.


 