How to improve Spring batch performance working on 8 million records.

Posts: 10
Hi All,

I am trying to generate reports in my project using Spring Batch, and I have more than 8 million records in my database. I originally set the commit-interval to 1; after reading some articles I raised both the commit-interval and the page-size to 10000, but it still takes more than 44 hours to generate the report. Each iteration takes 3 to 4 minutes to fetch the records, process them, and write them to my CSV file.

Please help me, friends; maybe I am doing something wrong.


<job id="reportJob" xmlns="http://www.springframework.org/schema/batch">

	<!-- master step, 10 threads (grid-size) -->
	<step id="masterStep">
		<partition step="slave" partitioner="rangePartitioner">
			<handler grid-size="10" task-executor="taskExecutor" />
		</partition>
		<next on="*" to="combineStep" />
	</step>

	<step id="combineStep">
		<tasklet>
			<chunk reader="multiResourceReader" writer="combineFlatFileItemWriter"
				commit-interval="1" />
		</tasklet>
		<next on="*" to="deleteFiles" />
	</step>

	<step id="deleteFiles">
		<tasklet ref="debitfileDeletingTasklet" />
	</step>

	<!-- each thread runs this step with different stepExecutionContext values -->
	<step id="slave">
		<tasklet>
			<chunk reader="pagingItemReader" writer="flatFileItemWriter"
				processor="itemProcessor" commit-interval="10000" />
		</tasklet>
	</step>
</job>

<bean id="rangePartitioner" class="com.test.RangePartitioner" />

<bean id="taskExecutor"
	class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
	<property name="corePoolSize" value="10" />
	<property name="maxPoolSize" value="10" />
	<property name="allowCoreThreadTimeOut" value="true" />
</bean>

<!-- item reader bean -->
<bean id="pagingItemReader"
	class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
	<property name="dataSource" ref="gemsDataSource" />
	<property name="queryProvider">
		<bean
			class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
			<property name="dataSource" ref="gemsDataSource" />
			<property name="selectClause" value="SELECT *" />
			<property name="fromClause" value="***QUERY***" />
			<property name="whereClause" value="where rn between :fromId and :toId" />
			<property name="sortKey" value="rn" />
		</bean>
	</property>
	<!-- injected via the ExecutionContext in rangePartitioner -->
	<property name="parameterValues">
		<map>
			<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
			<entry key="toId" value="#{stepExecutionContext[toId]}" />
		</map>
	</property>
	<property name="pageSize" value="10000" />
	<property name="rowMapper">
		<bean class="com.hello.ItemRowMapper" />
	</property>
</bean>
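One knob that often matters at this scale (a sketch, not a confirmed fix for this particular job): `JdbcPagingItemReader` exposes a `fetchSize` property that is passed to the JDBC driver, and if it is left at the driver default, each 10000-row page can cost many network round trips. Also, if `rn` is produced by a `ROW_NUMBER()` window inside the inlined query, be aware that every page re-executes that whole inner query, which alone could explain 3 to 4 minutes per fetch.

```xml
<bean id="pagingItemReader"
	class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
	<!-- hint the driver to stream rows in large batches; 10000 here is illustrative -->
	<property name="fetchSize" value="10000" />
	<!-- ... dataSource, queryProvider, pageSize, rowMapper as before ... -->
</bean>
```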

Please let me know if there is any issue with my code.
Posts: 1166
IBM DB2 Netbeans IDE Spring Java
I would suggest you analyze each phase separately: fetching the data, processing it, and writing the CSV file. First of all, you need to find the bottleneck. Moreover, you should provide more details (if you can) of what your code does; otherwise we can only guess in the abstract.
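To make that concrete, here is a minimal, framework-free sketch of timing the three phases of one chunk separately (plain Java, not Spring Batch API; `PhaseTimer` and `runChunk` are made-up names for illustration). Inside Spring Batch itself you could hang equivalent timers on a `ChunkListener` or `ItemProcessListener`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Framework-free sketch: accumulate time spent in each phase of a chunked
 * read/process/write loop, so you can see which one dominates.
 */
public class PhaseTimer {
    long readNanos, processNanos, writeNanos;

    <I, O> List<O> runChunk(Supplier<I> reader, Function<I, O> processor,
                            Consumer<List<O>> writer) {
        List<O> buffer = new ArrayList<>();
        while (true) {
            long t0 = System.nanoTime();
            I item = reader.get();                  // read phase
            readNanos += System.nanoTime() - t0;
            if (item == null) break;                // null signals end of input

            long t1 = System.nanoTime();
            buffer.add(processor.apply(item));      // process phase
            processNanos += System.nanoTime() - t1;
        }
        long t2 = System.nanoTime();
        writer.accept(buffer);                      // write phase (whole chunk)
        writeNanos += System.nanoTime() - t2;
        return buffer;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        int[] counter = {0};
        // Toy reader: emits 0..4, then null to end the chunk.
        Supplier<Integer> reader = () -> counter[0] < 5 ? counter[0]++ : null;
        List<String> sink = new ArrayList<>();
        List<String> rows = timer.runChunk(reader, i -> "row-" + i, sink::addAll);
        System.out.printf("read=%dns process=%dns write=%dns (%d rows)%n",
                timer.readNanos, timer.processNanos, timer.writeNanos, rows.size());
    }
}
```

If, say, `readNanos` dwarfs the other two, you know to look at the query and fetch size rather than the processor or the CSV writer.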