JavaRanch » Java Forums » Databases » Object Relational Mapping

Batch fetching using JPA

Amol Katyare
Ranch Hand

Joined: Apr 02, 2007
Posts: 36
How can I do batch fetching using JPA? I need to fetch 0.5 million records from the DB.

Many thanks in advance,
Amol.


SCJP [1.4]
James Sutherland
Ranch Hand

Joined: Oct 01, 2007
Posts: 553
What do you mean by "batch fetching"?
Do you wish to page through the results? If so, this can be done in JPA through the Query setFirstResult() and setMaxResults() APIs.
If you are trying to fetch relationships you can use the "join fetch" option in your JPQL Query.
Some JPA implementations also support cursors, and batch reading of relationships, or object batching.
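A minimal sketch of that paging pattern, under some assumptions: the fetchPage function below stands in for a real JPA call such as `em.createQuery("SELECT r FROM Record r ORDER BY r.id", Record.class).setFirstResult(first).setMaxResults(pageSize).getResultList()` — the `Record` entity and the query string are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class PagedFetch {
    // Walks the whole result set one page at a time; fetchPage takes
    // (firstResult, maxResults) and returns one page of rows.
    public static <T> int processAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                                     int pageSize) {
        int first = 0;
        int processed = 0;
        while (true) {
            List<T> page = fetchPage.apply(first, pageSize);
            if (page.isEmpty()) break;
            for (T row : page) {
                processed++;                    // write the row to the output file here
            }
            if (page.size() < pageSize) break;  // short page means we hit the end
            first += pageSize;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Simulate a 25-row table with a page size of 10 (pages of 10, 10, 5).
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) table.add(i);
        int n = processAll((from, max) ->
                table.subList(Math.min(from, table.size()),
                              Math.min(from + max, table.size())), 10);
        System.out.println(n);  // 25
    }
}
```

Ordering the query (here by id) matters: without a stable ORDER BY, successive setFirstResult windows are not guaranteed to partition the rows consistently.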


TopLink : EclipseLink : Book:Java Persistence : Blog:Java Persistence Performance
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17260

.5 million records. Hmmm, which user do you have that has the time to look through all those records?

Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
Amol Katyare
Ranch Hand

Joined: Apr 02, 2007
Posts: 36
Thanks for your comments.

1. Yes, I am using the query.setFirstResult and query.setMaxResults APIs, but I wanted to know whether it is advisable to use these pagination APIs when the same thing can be achieved with pure JDBC, using a scrollable ResultSet. Does JPA have any performance issues when we fetch 0.5 million records in batches of, say, 0.1 million records each? If possible, could you please propose a better alternative to JPA?

2. There are no users who will go through these records. I am going to write them to a file, which will be used by a mainframe program as part of batch processing.
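For comparison, a sketch of the plain-JDBC alternative under some assumptions: a forward-only, read-only ResultSet with a fetch size, which streams rows from the server in chunks instead of re-running a paged query. The `records` table and its columns are hypothetical, and a live Connection from a real driver is assumed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcStream {
    static final int FETCH_SIZE = 10_000;

    // Streams the whole table through one cursor; returns the row count.
    public static long streamAll(Connection conn) throws Exception {
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, payload FROM records ORDER BY id",
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(FETCH_SIZE);   // rows the driver buffers per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    count++;               // write the row to the output file here
                }
            }
        }
        return count;
    }
}
```

Note this uses a plain forward-only cursor, not a scrollable one — for a one-pass export, TYPE_FORWARD_ONLY is the cheaper choice, and the fetch size is what controls the batching.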

Thanks in anticipation,
Amol.
James Sutherland
Ranch Hand

Joined: Oct 01, 2007
Posts: 553
You could try JPA and see if it will meet your performance needs. I am often surprised at how much data can be processed through JPA.

You may need to base your code on a specific JPA implementation to ensure it is optimized; which JPA implementation are you using? You should ensure the query is read-only and non-transactional. If using firstResult/maxResults does not give you the performance you are looking for, you could use your JPA implementation's specific query API to use a cursor (most support this: TopLink, EclipseLink, Hibernate). You could also logically partition the data by id ranges, etc.

If JPA isn't giving you good enough performance you may try raw JDBC, but this should not make a huge difference; you may need to look into raw database tools, such as your database's export functions.
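The id-range partitioning idea can be sketched as pure arithmetic: split [minId, maxId] into fixed-size chunks and run one query per chunk, e.g. the hypothetical JPQL `SELECT r FROM Record r WHERE r.id >= :lo AND r.id < :hi`. The entity name and assumption of dense numeric ids are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitioner {
    // Returns half-open [lo, hi) ranges that together cover [minId, maxId].
    public static List<long[]> partition(long minId, long maxId, long chunkSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long lo = minId; lo <= maxId; lo += chunkSize) {
            long hi = Math.min(lo + chunkSize, maxId + 1);
            ranges.add(new long[] { lo, hi });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 500,000 ids in chunks of 100,000 -> 5 ranges.
        List<long[]> ranges = partition(1, 500_000, 100_000);
        System.out.println(ranges.size());  // 5
    }
}
```

Each range can then be fetched (and its entities detached or the persistence context cleared) independently, which keeps memory flat and also makes the chunks easy to run in parallel.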
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17260

Originally posted by Amol Katyare:
Thanks for your comments.

2. There are no users who will go through these records. I am going to write them in a file which would be used by mainframe program as part of batch processing.

Thanks in anticipation,
Amol.


Which database are you using? You could take a completely different approach in which the extract is done through an sh or bat script, directly with the database tools. If need be, you can fire the sh or bat script from your Java program and continue from there.

This is a case where, personally, I would use techniques other than Java or ORM.

Mark
Amol Katyare
Ranch Hand

Joined: Apr 02, 2007
Posts: 36
I tried out 2 different approaches to extract the half-million records.

1. Using JPA: a view is created to avoid creating multiple entities, and the view is mapped to a single entity class.

2. Raw JDBC: a stored procedure (SP) is created to extract the data. The SP is called using JDBC callable statements.

Database: Oracle.
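A rough sketch of approach 2 under some assumptions: the procedure name `extract_records`, its single REF CURSOR out-parameter, and the output format are all hypothetical stand-ins, and a live Oracle Connection is assumed.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Types;

public class SpExtract {
    static final String CALL = "{call extract_records(?)}";

    // Calls the stored procedure, streams its REF CURSOR, and writes
    // one processed line per row to a flat file.
    public static void extract(Connection conn, String outFile) throws Exception {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            cs.registerOutParameter(1, Types.REF_CURSOR); // JDBC 4.2; older Oracle drivers use OracleTypes.CURSOR
            cs.execute();
            try (ResultSet rs = (ResultSet) cs.getObject(1);
                 BufferedWriter out = new BufferedWriter(new FileWriter(outFile))) {
                rs.setFetchSize(1000);           // stream in chunks, not row by row
                while (rs.next()) {
                    out.write(rs.getString(1));  // apply the field-specific rules here
                    out.newLine();
                }
            }
        }
    }
}
```

Keeping the row-to-line transformation inside this loop is what lets the field-specific Java rules run without ever holding the full result set in memory.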

The conclusion: the 2nd approach is much faster than the 1st one. Maybe the O-R mapping is a bottleneck here. But now I need to explore another approach called "Hibernate batch" fetching.

It would be helpful if you could provide your suggestions or any more info on Hibernate batch fetching.


PS: The solution of using ETL tools to extract a huge volume of data is obviously the better one, but once the data is extracted we need to process each field using rules which can be field-specific. After the processing, the data is dumped into a flat text file. Hence, this option is ruled out, as Java provides a little more luxury for doing this.

Thanks very much for posting reply.
Amol.
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17260

Originally posted by Amol Katyare:

PS: The solution of using ETL tools ... we need to process each field using some rules which could be field specific.

Amol.


That is exactly what ETL tools are for: to Extract, Transform, and Load. The Transform step is what can handle the field-specific rules. In Oracle, .ctl files can do that. You can also use an Oracle SP to do all the field-specific functionality. Other ETL tools out there also handle the transforming part.

Java/Hibernate is not an ETL tool, and really shouldn't be used as such.

Mark
Cameron Wallace McKenzie
author and cow tipper
Saloon Keeper

Joined: Aug 26, 2006
Posts: 4968

Indeed

I use IBM's DataStage to do exactly those types of things every day. It's easy and effective.

-Cameron McKenzie
 