• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

JDBC batch update in thread

 
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Currently I am updating 100000 records to DB using jdbcTemplate bacth update. Each batch consists of 10000 records.
So i have to execute 10 batches in total. Is it a good practice to start each batch as a seperate thread?. Single JDBCTemplate instance
will be shared by all threds.


 
Ranch Hand
Posts: 608
Firefox Browser Spring Java
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Couple of points

a. List<String> splittedRecords = splitRecords(fullRecords); - should be outside the loop. you don't need to run it 10 times.

b. JDBC template is thread safe once instantiated.

Instances of the JdbcTemplate class are threadsafe once configured. This is important because it means that you can configure a single instance of a JdbcTemplate and then safely inject this shared reference into multiple DAOs (or repositories). The JdbcTemplate is stateful, in that it maintains a reference to a DataSource, but this state is not conversational state.



However executing so many database updates might have other problems that you need to consider. What happens when one of those 10 sets fails to update? When do you want to commit the records?
Doing so many inserts in parallel could also cause performance degradation v/s a gain in case you DB has to checks before inserting those records (say auto generated primary key). I'm not an expert on the DB side - so this a wild guess - but you should do some extensive testing.

Would I recommend it? Probably not - unless you have a good reason to spawn multiple threads
 
Rancher
Posts: 2759
32
Eclipse IDE Spring Tomcat Server
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Note that the threads will not be able to share a transaction. Each thread will have tok open it's own connection to db and each connection will have its own transaction.
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

a. List<String> splittedRecords = splitRecords(fullRecords); - should be outside the loop. you don't need to run it 10 times.



Sorry for the wrong naming convention. By 'splitRecords', i just mean a method that returns the sub List required for that particular iteration. Not a method that splits the 'fullRecords' on each iteration.

b. JDBC template is thread safe once instantiated.



You are right. That is one reason i kept the JdbcTemplate creation outside of all threads. But seems like the point given by Jayesh contradicts with this.

Note that the threads will not be able to share a transaction. Each thread will have tok open it's own connection to db and each connection will have its own transaction



@Jayesh By opening so much connection, is it possible that it can consume most of the connections in the DB2 connection pool?

@Saifuddin What's your thought? if we use a single JdbcTemplate, i think it will take the same amount of time as that of executing single batch at a time.

Thanks a lot for your valuable inputs. Most of them are valid. Some of them were in my mind when we decided to experiment with this approach.

 
Saifuddin Merchant
Ranch Hand
Posts: 608
Firefox Browser Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

b. JDBC template is thread safe once instantiated.

You are right. That is one reason i kept the JdbcTemplate creation outside of all threads. But seems like the point given by Jayesh contradicts with this.


The JDBC template is thread safe. As to the underlying driver and how transactions get managed is an other problem. What Jayesh is saying that by opening multiple threads you cannot "commit" all 100,000 transactions. The "commit" will happen for each 10 pieces of 10,000 records. That means if say 9/10 "chunks" are successful - you could end up with 90,000 records in the database. Depending on the business scenario this may or may not be acceptable.

Note that the threads will not be able to share a transaction. Each thread will have tok open it's own connection to db and each connection will have its own transaction

@Jayesh By opening so much connection, is it possible that it can consume most of the connections in the DB2 connection pool?

@Saifuddin What's your thought? if we use a single JdbcTemplate, i think it will take the same amount of time as that of executing single batch at a time.



I would suggest that you try this out and see how it performs. 10 threads should not consume all connections on DB2 pool, but I don't think you will gain a lot in performance (depends on you split method too).
Are you looking at this design purely for a performance reason? Do you really need the gained performance (many batch systems can work with slightly delayed times) - 100,000 is not very big number of records.
 
Jayesh A Lalwani
Rancher
Posts: 2759
32
Eclipse IDE Spring Tomcat Server
  • Likes 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Each thread will open a connection to the database. So, if you have 10 threads, there will be 10 connections. If you have 100 threads, you will have 100 connections. I wouldn't worry about number of connections to database though. Your bottleneck is threads, not connections. Generally speaking, as you start more threads, performance doesn't scale linearly. So, if 1 thread can import 10k records in 2 minutes, 2 threads will not import them in 1 minute. They might take few seconds more. This is because there is overhead of managing the threads, and also there mig be additional IO to the disk and network. At some point you will reach a point where adding more threads doesn't increase performance, and if you go beyond that you will performance decreasing. Not to mention that if you have more threads, you should give more memory to the JVM so GC doesn't go crazy. Selection of optimum number of threads is a complicated topic, and you really need to do a lot of measurement on the hardware to select the maximum threads. Also, you have to consider the load on database. Data import is an IO intensive operation for the database. You need to measure how much the database can take. Also, if you have other things going on against the database at the same time, you will need to worry about how the other operations will be affected. Number of connections is the least of your problems. You should be worrying about IO bottle necking.


If this import job is running sporadically, like in a maintanence window or when there is low load on the servers, then just 10 connections won't kill your system. However, if this is going on throughout the day, you will have to figure out some sort of queuing mechanism, so you don't start 100-1000 threads at the same time.

Have you considered implementing this in spring batch? Spring batch does the threading and queuing for you, and all you do is worry about your business logic
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
100000, is just a sample digit of the POC i am working on.
Actually it can go up to 4 Million. Each row will be having 30 fields.
Apologies, i dint communicated it properly.
 
Saifuddin Merchant
Ranch Hand
Posts: 608
Firefox Browser Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
+1 on using a batch framework (spring batch) to do the plumbing task of multi threading while you focus on the business logic
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Jayesh & Saifuddin.
That's lot of inputs for me. I started reading spring batch and left it halfway seeking for alternate technology, as i am not much familiar with Spring batch.
Now let me go back and see how spring batch can help me.
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
One more question.

What about using Spring's 'ThreadPoolTaskExecutor', there we can handle the Queue size?
Feels like it can handle the number of threads and there by restricting the memory usage. But database commit failure should be handled manually.
What's your thought?
 
Saifuddin Merchant
Ranch Hand
Posts: 608
Firefox Browser Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Renjith Panikar wrote:One more question.

What about using Spring's 'ThreadPoolTaskExecutor', there we can handle the Queue size?



You could use any Thread Pool Executors - those provided by Spring or those provided by the java.util.concurrent framework. Spring Batch will buy you more than threading - batch re-startability, batch steps, execution tracking etc. Again depends on your use case. If you have no need for any of the "batch features" provided by a framework - plain vanilla java muli threading code (using ThreadPoolExecutor) should work fine.
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Our use case is simple. Split the records, update using multiple threads.
My thought was like, the commit will happen only after executing all the batch statements.
Since it is not so, some thing has to be done to rollback the update.
Another issue was the memory used by threads. If it can be controlled by ThreadPool, I am good.
Do you think it requires Spring Bacth implementation?
 
Greenhorn
Posts: 24
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If the time for a single thread to do this is t1 and by mutile threads is t1+t2+t3...then it is not necessary that t1+t2+t3... would be less.infact it will be more because it will also add overhead of creating threads and
managing those.create multiple threads are required only when the individual units of work are independent of one another and you want those to run them parallel.
 
Renjith Panikar
Ranch Hand
Posts: 31
Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
@Suraj Thanks for the reponse.Each bacth is independent of each other, as the execution order is not required. But it is denpendent in other way that all batches should execute successfully.
Did you mean to say that it is not advisable to execute bacthes in threads?

Or I have to go with Spring Batch?. If so what makes Spring Batch diffrent from ThreadPoolExecuter in Creating & Managing threads?
As mentioned in Saifuddin's post if 'batch features' are the only reason, will it be a good move if i go with 'ThreadPoolExecuter' ?
 
Jayesh A Lalwani
Rancher
Posts: 2759
32
Eclipse IDE Spring Tomcat Server
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Spring batch is built on top of ThreadPoolExecutor. It makes it was easier for you to do error handling, monitoring and restarting, although there is a learning curve with Spring Batch. If you don't need those things, you can go with TPE.
 
reply
    Bookmark Topic Watch Topic
  • New Topic