What's your commit-interval? Why don't you post your config?
It shouldn't take that much time; I've seen batches write close to a million records to the database in less than 30 minutes. Read operations should be a lot faster.
You can implement your own ItemReader and just use JdbcTemplate, calling one of its queryFor methods. If an int is returned, call queryForInt; if it's a String, call queryForObject with String.class.
If you do that, you don't need the RowMapper. While I doubt you will see a great increase in performance, it is still very quick to write.
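For example, something along these lines. This is only a rough sketch of that idea: the table and column names (ACCOUNT, NAME, ROW_NUM) and the row-numbering scheme are placeholders I made up, not anything from this thread.

```java
import org.springframework.batch.item.ItemReader;
import org.springframework.jdbc.core.JdbcTemplate;

public class SingleColumnJdbcReader implements ItemReader<String> {

    private final JdbcTemplate jdbcTemplate;
    private final int totalRows;
    private int currentRow = 0;

    public SingleColumnJdbcReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        // queryForObject with Integer.class does the same job as queryForInt
        this.totalRows = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM ACCOUNT", Integer.class);
    }

    @Override
    public String read() {
        if (currentRow >= totalRows) {
            return null; // null tells Spring Batch there is no more input
        }
        currentRow++;
        // single column, single value, so no RowMapper is needed
        return jdbcTemplate.queryForObject(
                "SELECT NAME FROM ACCOUNT WHERE ROW_NUM = ?",
                new Object[] { currentRow }, String.class);
    }
}
```

Note that this fires one query per item, which is why I doubt it would buy much speed; it's just short to write.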
Mustafa, as far as commitInterval goes: the way Spring Batch works is called chunk processing. In the read and process phases of a step, items are handled one by one; the write portion then gets a Collection holding as many items as the commit interval/chunk size. If I set that to 1000, it will read 1000 rows one by one, passing each row to the RowMapper; after each read/process, that item is placed into the Collection. When the Collection has 1000 items, it is sent to the Writer. The Writer writes the 1000 in a batch, then commits the transaction. Then the reading/processing starts again for the next 1000 items. The reader does some caching of data until the transaction is committed. So your Oracle batch read size of 10 would mean the reader contacts the database 100 times per chunk, since the commit interval here is set to 1000. I would make the chunk size and the batch read size in the database the same number. 1000 will be good; 10000 will be fine too.
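To make that concrete, here is a rough sketch (not the poster's actual config) of a chunk-oriented step where the commit interval (chunk size) and the reader's JDBC fetch size are set to the same number. The bean names, SQL, and writer are made up for illustration.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.SingleColumnRowMapper;

@Configuration
public class LoadJobConfig {

    @Bean
    public JdbcCursorItemReader<String> accountReader(DataSource dataSource) {
        JdbcCursorItemReader<String> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(dataSource);
        reader.setSql("SELECT NAME FROM ACCOUNT");
        reader.setRowMapper(new SingleColumnRowMapper<>(String.class));
        reader.setFetchSize(1000); // rows pulled from the database per round trip
        return reader;
    }

    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         JdbcCursorItemReader<String> accountReader,
                         ItemWriter<String> accountWriter) {
        return steps.get("loadStep")
                .<String, String>chunk(1000) // commit interval: 1000 items per transaction
                .reader(accountReader)
                .writer(accountWriter)
                .build();
    }
}
```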
The reader is buffered, and most likely the number of records read from the database in one go is independent of the commit count. It depends on the driver, the ResultSet settings, and other JDBC-related settings.
So why does a commit interval of one take longer? There are two reasons:
a) As Mark explained above, Spring follows a chunk-oriented approach. So at the end of every commit interval it saves the resource being written to (saves the file, commits to the database). This takes some time.
b) Spring also saves meta-data at the end of every commit interval, namely the STEP_EXECUTION_CONTEXT and the number of rows read and processed. It needs to do so for restartability of the batch job. This takes the major chunk of time in your case; the sketch below shows one way to read those counters back after a run.
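For what it's worth, you can see that book-keeping after a run by looking at the step execution counters Spring Batch stores in its meta-data tables. A minimal sketch, assuming a JobExplorer is already wired up against those tables (the class and method names here are mine, not from this thread):

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class StepStats {

    private final JobExplorer jobExplorer;

    public StepStats(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    public void print(long jobExecutionId) {
        JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
        for (StepExecution step : jobExecution.getStepExecutions()) {
            // commitCount is roughly readCount / commit-interval; each commit
            // also writes the step execution context back to the meta-data tables
            System.out.println(step.getStepName()
                    + " read=" + step.getReadCount()
                    + " written=" + step.getWriteCount()
                    + " commits=" + step.getCommitCount());
        }
    }
}
```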
Commit intervals up to 1,000 are safe. Above that, you hardly notice any performance gain. The Spring folks recommend a commit interval in the range of 100.
Maybe you can check whether the SQL query itself is optimal. Try executing it in Toad and see how long it takes.
Some SQL constructs like ORDER BY, and worse still, functions, can make your query crawl.
Thanks everyone. The posts have really helped me understand batch better.
I got commitInterval added to the code.
After looking a little more into our Tomcat's memory settings (1 GB max), and given that this process runs every night at 2:00 AM, which means less load on the server, I deduced that I didn't need to worry much about a high commitInterval overflowing memory, and set the value to 10000.
The process completed in 44 mins.
Should I lower this value? (I am sorry I didn't read Sam's latest post carefully about the ideal value of commitInterval).
Mustafa Garhi wrote: For one million records I found only a difference of a few seconds between 1000 and 10,000.
We decided to stick to 10,000.
CommitInterval has done the trick.
Thanks everybody !!
Cool. Also consider this, if it has any effect: if a step fails and you want to restart, the job will restart on the chunk that caused the failure and has to redo that entire chunk. So that would be 10K or 1000 records, depending on your chunk size. There is a trade-off either way, so just pick the happy medium between successful completion time and restarts.
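A small sketch of that restart scenario, assuming a JobOperator is available (the class and method names are mine, for illustration only): restarting a failed execution resumes from the last committed chunk, so only the chunk that was in flight at failure gets reprocessed in full.

```java
import org.springframework.batch.core.launch.JobOperator;

public class RestartExample {

    private final JobOperator jobOperator;

    public RestartExample(JobOperator jobOperator) {
        this.jobOperator = jobOperator;
    }

    public long restartFailedRun(long failedExecutionId) throws Exception {
        // Spring Batch uses the saved step execution context to skip chunks that
        // already committed; the larger the chunk, the more work is redone here.
        return jobOperator.restart(failedExecutionId);
    }
}
```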