
Batch fetching using JPA

 
Amol Katyare
Ranch Hand
Posts: 36
How can I do batch fetching using JPA? I need to fetch 0.5 million records from the DB.

Many thanks in advance,
Amol.
 
James Sutherland
Ranch Hand
Posts: 553
What do you mean by "batch fetching"?
Do you wish to page through the results? If so, this can be done in JPA through the Query setFirstResult() and setMaxResults() APIs.
If you are trying to fetch relationships, you can use the "join fetch" option in your JPQL query.
Some JPA implementations also support cursors, batch reading of relationships, or object batching.
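As a rough sketch of the setFirstResult()/setMaxResults() paging described above: the paging loop below is written against a generic fetch function so it stands alone, and the Order entity and JPQL string in the comment are assumptions, not anything from this thread.

```java
import java.util.List;
import java.util.function.IntFunction;

public class PagedFetch {
    // Generic paging driver: calls fetchPage(firstResult) repeatedly until a
    // short page signals the end of the result set; returns the record count.
    static <T> int processAllPages(int pageSize, IntFunction<List<T>> fetchPage) {
        int firstResult = 0, total = 0;
        while (true) {
            List<T> page = fetchPage.apply(firstResult);
            total += page.size();           // process/write the page here
            if (page.size() < pageSize) break;  // last (short or empty) page
            firstResult += pageSize;
        }
        return total;
    }

    /* With JPA, fetchPage would wrap a Query, e.g. (assuming an Order entity):
     *
     *   firstResult -> em.createQuery("select o from Order o", Order.class)
     *                    .setFirstResult(firstResult)
     *                    .setMaxResults(pageSize)
     *                    .getResultList();
     */
}
```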
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
0.5 million records. Hmmm, which user of yours has the time to look through all those records?

Mark
 
Amol Katyare
Ranch Hand
Posts: 36
Thanks for your comments.

1. Yes, I am using the query.setFirstResult and query.setMaxResults APIs, but I wanted to know whether it is advisable to use these pagination APIs when the same thing can be achieved with pure JDBC, such as a scrollable ResultSet. Does JPA have any performance issues when fetching 0.5 million records in batches of, say, 0.1 million records each? If possible, could you please propose a better alternative to JPA?

2. There are no users who will go through these records. I am going to write them to a file that will be used by a mainframe program as part of batch processing.

Thanks in anticipation,
Amol.
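For the file-extract use case described above, the plain-JDBC route can be sketched as follows. This is a sketch under assumptions: the delimiter, fetch size, and SQL are placeholders, and only the row-formatting helper runs without a database.

```java
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcExport {
    // Join one record's fields into a delimited line for the extract file.
    static String formatRow(String delimiter, Object... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(delimiter);
            sb.append(fields[i] == null ? "" : fields[i].toString());
        }
        return sb.toString();
    }

    // Stream a large result set straight to a file without holding it in memory.
    static void export(Connection con, String sql, Writer out) throws Exception {
        try (Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000);  // hint the driver to fetch rows in batches
            try (ResultSet rs = st.executeQuery(sql)) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    Object[] row = new Object[cols];
                    for (int i = 0; i < cols; i++) row[i] = rs.getObject(i + 1);
                    out.write(formatRow("|", row));
                    out.write('\n');
                }
            }
        }
    }
}
```

A forward-only, read-only statement with a fetch-size hint keeps memory flat no matter how many rows come back, which is the main concern at half a million records.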
 
James Sutherland
Ranch Hand
Posts: 553
You could try JPA and see if it will meet your performance needs. I am often surprised at how much data can be processed through JPA.

You may need to base your code on a specific JPA implementation to ensure it is optimized; which JPA implementation are you using? You should ensure the query is read-only and non-transactional. If using firstResult/maxResults does not give you the performance you are looking for, you could use your JPA implementation's specific query API to use a cursor (most support this: TopLink, EclipseLink, Hibernate). You could also logically partition the data by id ranges, etc.

If JPA isn't giving you good enough performance, you may try raw JDBC, but this should not make a huge difference. You may need to look into raw database tools, such as your database's export functions.
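The id-range partitioning idea mentioned above can be sketched like this; the entity and field names in the comment are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartition {
    // Split the key space [minId, maxId] into chunks of at most chunkSize ids;
    // each long[]{lo, hi} is an inclusive range a worker can query independently.
    static List<long[]> partition(long minId, long maxId, long chunkSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long lo = minId; lo <= maxId; lo += chunkSize) {
            ranges.add(new long[] { lo, Math.min(lo + chunkSize - 1, maxId) });
        }
        return ranges;
    }

    /* Each range then backs a bounded, read-only query, e.g. in JPQL
     * (Order entity and field names are assumptions):
     *
     *   select o from Order o where o.id between :lo and :hi
     *
     * Hibernate users can instead open a server-side cursor with
     * session.createQuery(...).scroll(ScrollMode.FORWARD_ONLY).
     */
}
```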
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Originally posted by Amol Katyare:
Thanks for your comments.

2. There are no users who will go through these records. I am going to write them to a file that will be used by a mainframe program as part of batch processing.

Thanks in anticipation,
Amol.


Which database are you using? You could take a completely different approach in which the extract is done through an sh or bat script, working directly with the database tools. If need be, you can fire the sh or bat script from your Java program and continue from there.

This is a case where, personally, I would use techniques other than Java or an ORM.

Mark
 
Amol Katyare
Ranch Hand
Posts: 36
I tried out two different approaches to extract the half a million records of data.

1. Using JPA: a view was created to avoid creating multiple entities, and the view was mapped to a single entity class.

2. Raw JDBC: a stored procedure (SP) was created to extract the data. The SP is called using JDBC callable statements.

Database: Oracle.

The conclusion: the 2nd approach is much faster than the 1st one. Maybe the O-R mapping is the bottleneck here. But now, I need to explore another approach called "Hibernate batch".
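A minimal sketch of the callable-statement approach, assuming an Oracle procedure (here hypothetically named EXTRACT_RECORDS) that returns its rows through a SYS_REFCURSOR OUT parameter:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Types;

public class SpExtract {
    // Build the JDBC escape syntax for a stored-procedure call,
    // e.g. callSyntax("EXTRACT_RECORDS", 2) -> "{call EXTRACT_RECORDS(?, ?)}"
    static String callSyntax(String procName, int argCount) {
        StringBuilder sb = new StringBuilder("{call ").append(procName).append('(');
        for (int i = 0; i < argCount; i++) sb.append(i > 0 ? ", ?" : "?");
        return sb.append(")}").toString();
    }

    // Invoke the (hypothetical) Oracle procedure and stream its ref cursor.
    static void extract(Connection con) throws Exception {
        try (CallableStatement cs = con.prepareCall(callSyntax("EXTRACT_RECORDS", 1))) {
            cs.registerOutParameter(1, Types.REF_CURSOR); // Oracle SYS_REFCURSOR
            cs.execute();
            try (ResultSet rs = (ResultSet) cs.getObject(1)) {
                while (rs.next()) {
                    // apply field-level rules, then write the line to the flat file
                }
            }
        }
    }
}
```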

It would be helpful if you could provide your suggestions or any more info on Hibernate batch.


PS: Using ETL tools to extract a huge volume of data is obviously the better solution, but once the data is extracted we need to process each field using rules which can be field-specific. After the processing, the data is dumped into a flat text file. Hence, this option was ruled out, as Java provides a little more luxury for doing this.

Thanks very much for posting reply.
Amol.
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Originally posted by Amol Katyare:

PS: The solution of using ETL tools ... we need to process each field using rules which can be field-specific.

Amol.


That is exactly what ETL tools are for: Extract, Transform, and Load. The Transform step is what can handle field-specific rules. In Oracle, .ctl files (SQL*Loader control files) can do that. You can also use Oracle stored procedures to do all the field-specific functionality. Other ETL tools out there also handle the transform part.

Java/Hibernate is not an ETL tool, and really shouldn't be used as one.

Mark
 
Cameron Wallace McKenzie
author and cow tipper
Saloon Keeper
Posts: 4968
Indeed.

I use IBM's DataStage to do exactly those types of things every day. It's easy and effective.

-Cameron McKenzie
 