I am working on a Spring Batch program that needs to read a large number of records and, for each record, either insert or update a row in a DB table. My problem is that if I work with the database directly (that is, insert or update for each input record), performance is a drag. I have a JPA layer (Hibernate implementation) on top of my DB. Is there any way I can put off persisting to the DB until I am done processing and work with in-memory JPA entities instead?
Specifically, I need to know when to do an update and when to do an insert: for each input record, I need to know whether I have already created its corresponding output row. Instead of querying the DB, can I check whether the JPA entity was already created in the persistence context? If so, how exactly do I do that? Do I just write a query?
Is there any way I can put off persisting to the DB until I am done processing and work with in-memory JPA entities?
Regarding the question above, you can choose not to commit the transaction until you wish.
But holding that much data in memory and in the session-level cache will slow the JVM down; you may see a lot of GC activity or even run out of memory.
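One common way to balance the two concerns above is to write in fixed-size chunks rather than holding everything until the end. The following is only a minimal sketch in plain Java, not JPA code: ChunkedWriter and its sink are hypothetical names, and the sink stands in for whatever you do per chunk (with JPA that would typically be calling flush() and clear() on the EntityManager, or committing a chunk).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers items and hands them to a sink in fixed-size
// chunks, so memory use stays bounded instead of growing with the whole input.
class ChunkedWriter<T> {
    private final int chunkSize;
    private final Consumer<List<T>> sink; // stands in for "flush and clear" per chunk
    private final List<T> buffer = new ArrayList<>();

    ChunkedWriter(int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.sink = sink;
    }

    // Adds one item and writes the buffer out once it reaches the chunk size.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= chunkSize) {
            flush();
        }
    }

    // Writes out whatever is buffered; call once more after the last item.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may keep it
        buffer.clear();
    }

    // Number of items currently held in memory.
    int pending() {
        return buffer.size();
    }
}
```

With a chunk size of 3 and seven input items, the sink is called twice during processing and once more on the final flush() for the leftover item. The right chunk size depends on your entity size and JDBC batch settings, so treat any number as something to measure.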
What you seek is definitely possible, but it's kind of difficult to explain (at least for me), because it depends on your exact setup and conditions. Who is your JPA vendor (Hibernate)? Do you need transactions? What are the isolation levels? Do you use any caching? What are the JPA batch sizes? What is your primary key generation policy? And so on.
One way I have done something like this is to use a database auto-generated primary key policy. With that, an entity that has not been persisted has no id, and as soon as it is persisted it has one. A call to entity.getId() "should" not generate a database query, so it is a quick way to tell whether an entity you are holding in your persistence context has been persisted to the database.
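The id check described above can be combined with an in-memory lookup by business key to decide between insert and update without querying. This is a rough sketch in plain Java with a simulated database: OutputRow, UpsertTracker, businessKey, and the id counter are all hypothetical, and in real JPA the id would be assigned by the provider when the entity is persisted (exactly when depends on the generation strategy).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity: id stays null until the (simulated) database assigns one.
class OutputRow {
    Long id;
    final String businessKey;
    int total;

    OutputRow(String businessKey) {
        this.businessKey = businessKey;
    }

    // Mirrors the "no id means not persisted yet" check; no query involved.
    boolean isPersisted() {
        return id != null;
    }
}

// Hypothetical tracker: remembers rows created during this run by business key.
class UpsertTracker {
    private final Map<String, OutputRow> byKey = new HashMap<>();
    private long nextId = 1; // simulates database-generated primary keys
    int inserts = 0;
    int updates = 0;

    // Applies one input record: creates the row on first sight, updates it afterwards.
    OutputRow apply(String key, int amount) {
        OutputRow row = byKey.computeIfAbsent(key, OutputRow::new);
        if (row.isPersisted()) {
            updates++;          // row already exists: this is an update
        } else {
            row.id = nextId++;  // simulated insert assigns the id
            inserts++;
        }
        row.total += amount;
        return row;
    }
}
```

Feeding the keys "a", "b", "a" through apply() counts two inserts and one update. Note that the map only knows about rows created in the current run; rows that already existed in the table before the job started would still need to be loaded or looked up some other way.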
Just in case anyone is interested, I was able to get around this issue by beginning and ending my transaction in my Spring Batch step rather than within the process method of my Processor...
subject: Working with Persistence Context instead of Database (Batch Processing)