How to improve Spring batch performance working on 8 million records.

 
Greenhorn
Posts: 10
Hi All,

I am trying to generate reports in my project using Spring Batch. I have more than 8 million records in my database. Initially I set commit-interval to 1, but after reading some articles I changed commit-interval to 10000 and page-size to 10000. It is still taking more than 44 hours to generate the report. Each iteration takes 3 to 4 minutes to fetch the records, process them, and write them to my CSV file.

Please help me, friends. Maybe I am doing something wrong.

Jobs-Context.xml

<job id="reportJob" xmlns="http://www.springframework.org/schema/batch">

    <!-- master step, 10 threads (grid-size) -->
    <step id="masterStep">
        <partition step="slave" partitioner="rangePartitioner">
            <handler grid-size="10" task-executor="taskExecutor" />
        </partition>
        <next on="*" to="combineStep"/>
    </step>

    <step id="combineStep">
        <tasklet>
            <chunk reader="multiResourceReader" writer="combineFlatFileItemWriter"
                   commit-interval="1" />
        </tasklet>
        <next on="*" to="deleteFiles"/>
    </step>

    <step id="deleteFiles">
        <tasklet ref="debitfileDeletingTasklet" />
    </step>
</job>

<!-- each thread will run this job, with different stepExecutionContext
     values. -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
    <tasklet>
        <chunk reader="pagingItemReader" writer="flatFileItemWriter"
               processor="itemProcessor" commit-interval="10000" />
    </tasklet>
</step>

<bean id="rangePartitioner" class="com.test.RangePartitioner" />

<bean id="debittaskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
   <property name="corePoolSize" value="10" />
   <property name="maxPoolSize" value="10" />
   <property name="allowCoreThreadTimeOut" value="true" />
 </bean>
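
The source of com.test.RangePartitioner is not shown in the post. As a rough sketch of what such a partitioner usually computes, the plain-Java example below splits a row range into grid-size contiguous [fromId, toId] slices, which is what the reader's "rn between :fromId and :toId" clause expects. The class name, the method name, and the assumption that rn runs from 1 to the total row count are all hypothetical; a real implementation would put these values into each partition's ExecutionContext.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the range logic of a partitioner (not the poster's class).
class RangePartitionSketch {

    /** Splits ids 1..totalRows into gridSize contiguous, non-overlapping ranges. */
    static Map<String, long[]> partition(long totalRows, int gridSize) {
        Map<String, long[]> ranges = new LinkedHashMap<>();
        long base = totalRows / gridSize;       // minimum rows per partition
        long remainder = totalRows % gridSize;  // the first 'remainder' partitions get one extra row
        long fromId = 1;
        for (int i = 0; i < gridSize; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long toId = fromId + size - 1;
            ranges.put("partition" + i, new long[] {fromId, toId});
            fromId = toId + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed figures: 8 million rows and grid-size 10, as in the post.
        partition(8_000_000L, 10).forEach((name, r) ->
                System.out.println(name + ": fromId=" + r[0] + ", toId=" + r[1]));
    }
}
```

With 8 million rows and a grid size of 10, each slice covers 800,000 rows, so every worker thread still has 80 pages of 10000 rows to read.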


ItemReader bean


<bean id="pagingItemReader"
      class="org.springframework.batch.item.database.JdbcPagingItemReader"
      scope="step">
    <property name="dataSource" ref="gemsDataSource" />
    <property name="queryProvider">
        <bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
            <property name="dataSource" ref="gemsDataSource" />
            <property name="selectClause" value="SELECT * " />
            <property name="fromClause" value="***QUERY****" />
            <property name="whereClause" value="where rn between :fromId and :toId" />
            <property name="sortKey" value="rn" />
        </bean>
    </property>
    <!-- Inject via the ExecutionContext in rangePartitioner -->
    <property name="parameterValues">
        <map>
            <entry key="fromId" value="#{stepExecutionContext[fromId]}" />
            <entry key="toId" value="#{stepExecutionContext[toId]}" />
        </map>
    </property>
    <property name="pageSize" value="10000" />
    <property name="rowMapper">
        <bean class="com.hello.ItemRowMapper" />
    </property>
</bean>

Please let me know if there is any issue with my code.
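
A rough sanity check on the figures in the post can be sketched as plain arithmetic. It assumes that "3 to 4 minutes per iteration" refers to one 10000-record chunk; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Back-of-the-envelope estimate; all inputs are the figures quoted in the post.
class ChunkTimeEstimate {

    /** Number of chunks needed to cover totalRows (ceiling division). */
    static long chunkCount(long totalRows, int chunkSize) {
        return (totalRows + chunkSize - 1) / chunkSize;
    }

    /** Total wall-clock hours if 'parallelism' chunks are processed at the same time. */
    static double totalHours(long chunks, double minutesPerChunk, int parallelism) {
        return chunks * minutesPerChunk / parallelism / 60.0;
    }

    public static void main(String[] args) {
        long chunks = chunkCount(8_000_000L, 10_000);
        System.out.println("chunks: " + chunks);
        for (double minutes : new double[] {3.0, 4.0}) {
            System.out.println(String.format(Locale.ROOT,
                    "%.0f min/chunk: %.1f h sequential, %.1f h with 10 parallel partitions",
                    minutes, totalHours(chunks, minutes, 1), totalHours(chunks, minutes, 10)));
        }
    }
}
```

8 million records at 10000 per chunk is 800 chunks, which at 3 to 4 minutes each comes to roughly 40 to 53 hours when processed one after another. The reported 44 hours falls inside that range. If the ten partitions were running in parallel at the same per-chunk speed, the estimate would be closer to 4 to 5.3 hours, so it may be worth checking whether the partitions actually run concurrently or whether the per-chunk time was measured while all threads were competing for the database.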
 
Bartender
Posts: 1259
I would suggest analyzing each phase separately: fetching the data, processing the data, and writing out the CSV file. First of all, you need to find the bottleneck. You should also provide more details (if you can) about what your code does; otherwise we can only guess in the abstract.
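
One way to measure the phases separately is sketched below in plain Java, without any Spring dependency: a small timer that runs one chunk through a reader, a processor, and a writer, and accumulates the time spent in each. PhaseTimer and the stand-in reader, processor, and writer are hypothetical; in the real job they would be replaced by the paging query, the item processor, and the CSV writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper that accumulates time per phase across chunks.
class PhaseTimer {
    long readNanos;
    long processNanos;
    long writeNanos;

    /** Runs one chunk through read, process, and write, timing each phase. */
    <I, O> void runChunk(Supplier<List<I>> reader,
                         Function<I, O> processor,
                         Consumer<List<O>> writer) {
        long t0 = System.nanoTime();
        List<I> items = reader.get();
        long t1 = System.nanoTime();
        List<O> output = new ArrayList<>(items.size());
        for (I item : items) {
            output.add(processor.apply(item));
        }
        long t2 = System.nanoTime();
        writer.accept(output);
        long t3 = System.nanoTime();
        readNanos += t1 - t0;
        processNanos += t2 - t1;
        writeNanos += t3 - t2;
    }

    String report() {
        return "read=" + readNanos / 1_000_000 + " ms, process="
                + processNanos / 1_000_000 + " ms, write=" + writeNanos / 1_000_000 + " ms";
    }

    public static void main(String[] args) {
        // Stand-ins only: an in-memory "page" of 10000 rows and an in-memory sink.
        Supplier<List<Integer>> reader = () -> {
            List<Integer> page = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                page.add(i);
            }
            return page;
        };
        Function<Integer, String> processor = i -> "row-" + i;
        List<String> sink = new ArrayList<>();
        Consumer<List<String>> writer = sink::addAll;

        PhaseTimer timer = new PhaseTimer();
        timer.runChunk(reader, processor, writer);
        System.out.println(timer.report());
    }
}
```

Inside a Spring Batch job, similar measurements could probably be taken through the framework's listener interfaces (for example ItemReadListener, ItemProcessListener, and ItemWriteListener) rather than by wrapping the components by hand. If the read phase dominates, the paging query itself is the first thing to examine.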
 