aspose file tools*
The moose likes Object Relational Mapping and the fly likes Hibernate performance issues using huge databases Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of EJB 3 in Action this week in the EJB and other Java EE Technologies forum!
JavaRanch » Java Forums » Databases » Object Relational Mapping
Bookmark "Hibernate performance issues using huge databases" Watch "Hibernate performance issues using huge databases" New topic
Author

Hibernate performance issues using huge databases

Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Hibernate sucks. Its performance is so damn slow when accessing, migrating large databases. We just tested our application on the actual database only to find that it is taking anywhere between 20-30 minutes to just run a simple query such as a join on a couple of tables with each table having about 100,000 records. Its too late and we are still unable to decide on moving back to the same plain JDBC type of queries which return results and are processed also in less than a minute.

I am still wondering if there was a solution to our problem. I don't mind posting all of the example code if there is somebody out there who can help me in any ways to not to avoid hibernate.

A query such as follows takes so much time to run and processing the list is even worse...

String qry = "select new db.newObjConstructor(mainObj1.field1, mainObj1.field2,.... mainObj1.childObj1.childObjField5, mainObj.childObj2.subChildObj.Field6)
FROM ParentObj mainObj, ChildObj childObj1, ChildObj childObj2
WHERE mainObj.PKField= childObj1.FKField and mainObj.PKField= childObj1.FKField
ORDER by ...";

Query q = session.createQuery(qry);
return q.list();

Some other things that we did were:
-add indices to those fields which are used to search for in a column
-Used simple straight forward mapping techniques, mapping each column with each field of the object and used lazy="false" whenever child objects were also required (almost always required) in the query and included the child objects as parameters in the object files.
-Create new objects with those fields of interest and created using their their constructors so that we can stick to only one DAO rather than calling other child DAO's while iterating the list of parent objects.
-Optimised the queries to get the data by a particular order so that we can skip other elements (rather a huge list) rather quickly by a primary key and then by a particular column and so on...

Are we missing something here? Are there any other important hibernate properties or reasons why hibernate is so dead slow? Suggestions to optimize, correct or comment welcome...
[ February 18, 2008: Message edited by: Rama Krishna ]
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

OK, so there are lots of places that you could tweak that will definitely improve your performance.

If you run that query, how many records are returned? Remember Hibernate will create an object for each and everyone and store it in its persistence context. If you create too many objects, it uses up a lot of memory, which might be slowing you down.
Do you need all those records at once?
If you get a lot of records, are you flushing and evict some out so you save memory?
If these records are getting shown to a User, and it is more than a "page" of records, then you don't need all those records because no one user can really take in more than a page of data.
If you are doing bulk operations of any kind, including transferring 100s of thousand of records, then in those cases, evicting and flushing at the JDBC batch size is imperative, or for those type of operations consider just JDBC, or better yet some Database tool, for instance SQL*Loader in Oracle, etc.

There are lot of reasons why you could be seeing such horrible performance, and just thinking any ORM tool out of the box without setting it up correctly will change your results.

I highly recommend getting "Java Persistence With Hibernate" from Manning Press and reading it to find out how to assist you.

Good luck

Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

To add to what Mark has already written, what you've tried in the way of optimisations look a little random. You could be wasting your time, or in the case of creating new indices, making things worse.

After you've checked your Hibernate configuration, unless you see a marked improvement, I'd profile you app. No sense in guessing changes when there are tools that will tell you exactly where the hotspots are.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Paul,
I would not have posted my problem had I known a solution myself. I am looking for help and help only please. We are using hibernate for the first time and I personally donot want it to be the last time we use it. And, we never went randomly but only explored the the basic properties (read my first post again) and are now open to explore further. Only let me know if you have any suggestions to improve the performance or if we are wrong anywhere or in these lines.. What do you mean by " No sense in guessing changes when there are tools that will tell you exactly where the hotspots are?"

Mark,
Thanks for your quick reply. I thought exactly in your lines
""If you get a lot of records, are you flushing and evict some out so you save memory?
If these records are getting shown to a User, and it is more than a "page" of records, then you don't need all those""

We already decided to move to JDBC for migrating huge data (>100,000 records) inclusive of joins on many tables.

Regarding displaying large datasets, I added one more variable to get a track the records being displayed and by setting .setMaxResults(500) each time I get only a couple of records. Previously we got the complete 100,000 records at one stretch to be sent to the jsp display table. Not sure how much memory I am saving by not creating so all the objects, but yes, this was helpful in terms of memory and a little improvement in the performance. But, it was getting complicated as we are using display tag libraries and there is a chance the user changes the filter or sort by details in the table being displayed (couple of 100 records) and so this requires a complete change in the query again. There must be a solution to this problem as Oracle does not seem to support scrollable result sets or a simple way to get next 100, previous 100 as fast as possible.

As discussed in my post above, we created objects and wrote constructors to create list of these objects from the different fields of different tables.

String qry = "select new db.newObjConstructor(mainObj1.field1, mainObj1.field2,.... mainObj1.childObj1.childObjField5, mainObj.childObj2.subChildObj.Field6)
FROM ParentObj mainObj, ChildObj childObj1, ChildObj childObj2
WHERE mainObj.PKField= childObj1.FKField and mainObj.PKField= childObj1.FKField
ORDER by ...";

Query q = session.createQuery(qry);
return q.list();

Child objects were included along with other attributes or fields in each of the parent object class, so whenever we access the parentObj.childObject.childObjectAttribute properties to create objects of the new types that we need, it becomes slow although lazy="false" has eliminated the select queries to get child object properties lazily.

Any other suggestions or places to look for that I am missing?
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Paul, Mark I was looking for response from you.

Anyways, I would really appreciate if any body else who can suggest me something after reading my posts or else I have to post it somewhere else and get help.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115
    
    8

Originally posted by Rama Krishna:
I am looking for help and help only please.
This makes it kind of hard for me to post a suggestion, since I don't know for sure that it will help. So you should be more careful about how you say things.

Anyway, I'm going to assume you didn't mean to freeze the discussion. I used Hibernate to migrate a database and I wouldn't do it again. Like you say, it took far too long and a lower-level plain SQL approach would have been faster. But on the other hand it gave me the chance to experiment with Hibernate, because I hadn't used it before. And we do have Hibernate working with the result of the migration and I have no complaints about its speed.

Normally in web applications you aren't bringing in large numbers of records into your application. That's just the nature of web applications, and that's why Hibernate fits well there. If you're doing something else which requires handling large volumes of data, perhaps Hibernate might not be a good choice either. But I don't have experience there, and perhaps I'm wrong.
or else I have to post it somewhere else and get help

This is rather like blackmail, too. First you freeze the discussion and then you complain because you aren't getting any discussion. As I said, please be more careful how you say things, and I'll assume you didn't mean it that way. I would second everything that Mark and Paul already said.
[ February 21, 2008: Message edited by: Paul Clapham ]
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

If you are actually migrating a full database from one vendor to another, I would not use Hibernate or Java at all for the migration.

If you are migrating into Oracle, I would use SQL*Loader or SQL*Plus and scripts.

In any application that a user sees data in a UI, there is absolutely no reason to get hundreds of thousands of records because no person could read it in one page.

If I take a hammer and try to screw in a lightbulb with it, I would wonder why the hammer sucks.

Mark
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Rama, my suggestions were intended to hopefully help, I'm not just making glib comments to annoy. You are free to ignore them if you don't feel they are any use.


What do you mean by " No sense in guessing changes when there are tools that will tell you exactly where the hotspots are?"

I mean that database profiling tools will tell you if indices and other tuning are needed, and give you a good view of what your application is doing. If you application is just for data migration and you have added indices to aid that process then you will have very probably impared your database's performance when its being used for normal transactional operations. Indices are not a panacea to database performance problems.

Assuming changes to Hibernate configuration and how you use Hibernate (based on Mark's suggestions) don't solve the issue then there are JVM profiling tools available to see what your application is up to.

All that aside, I'd second ORM's are poor choices for data migration applications. Databases themselves provide much better tools.
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

"Databases themselves provide much better tools."

Exactly, and I meant to say, there is actually a term for migration products. They are ETL tools. Extractions, Transformation, and Load.

Mark
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
This is rather like blackmail, too. First you freeze the discussion and then you complain...


Paul, you misunderstood me, I never meant it that way but only said that thinking that there is not going to be any more help having not found any follow up for a while. I don't know what you meant by "freeze the discussion".

Mark,
Yes, we have already moved to plain jdbc to migrate. I am not sure if we can use third party tools (SQL*Loader or SQL*Plus and scripts) to migrate as you suggested, because we have lots of logic involved in migration including changed table names, order, and the columns. Each of the new database tables has columns which depend on columns from various other tables. Anyways, I will surely look into those if we these tools can also do what we are doing currently.

Coming to hibernate, we currently modified the logic to speed up processing a little bit and are timing the speed at which we can get only a couple of hundred records. We are displaying 1000 records at a time and asking the user to press next and previous making another database call to fetch the required data using
setMaxResults(1000) and setFirstResult(startAtIndex)
methods. A complex query fetching 1000 records from 5 tables makes the user wait for 20 seconds for each click which is not good. I believe caching is something we never explored and I will talk to my fellows about caching.

Paul,
I suspect problems as some of us did request to add indices to few tables columns to speed up lookup on a particular column as is required in our logic than just the primary key.

database profiling tools will tell you if indices and other tuning are needed...
..there are JVM profiling tools available to see what your application is up to...

I will google these tools. Do let me know any free or open source tools that I can look into in my case?

Will soon update you on the performance timing.

-Cheers
Rama
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

Yes, we have already moved to plain jdbc to migrate. I am not sure if we can use third party tools (SQL*Loader or SQL*Plus and scripts) to migrate as you suggested, because we have lots of logic involved in migration including changed table names, order, and the columns. Each of the new database tables has columns which depend on columns from various other tables. Anyways, I will surely look into those if we these tools can also do what we are doing currently.



Um, that is specifically what ETL tools are for. I think you are making a huge mistake using JDBC or even Hibernate for migrating databases. And why does a user have to interact at all, if all you are doing is migrating a database. Use the right tool for the job, use ETL products.

Mark
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Guys, Hibernate's performance began improving. I have more questions as a follow up and am in need for some suggestions and comments.

Firstly, I modified all of the queries to get all the data I wanted in one single query just like any other JDBC query. It was harder to write the LEFT JOINS on many tables though. Could any body suggest a better way to modify the object and mapping file to LEFT JOIN many tables please...

Lets say TableA->TableB->TableC<-TableD and TableA->TableE->TableF where -> indicates the parent to child relationships and I have to get data from all of these tables in one shot. I created a new object with fields and constructors to populate only those fields which are required. I see that the LEFT JOIN query is a little complicated as, I had to include TableB objects as a set in Table A and TableC objects as a set in TableB with a
and change the TableAObj mapping file to include TableB set as



However, things change when relating TableC and TableD this way. So I had to write them differently as

TableCObj has TableDObj field
and the TableCObj mapping file had to be written as:


And finally get all the populated objects in a single query as:



Performance Using Caching:
I began researching on second level caching as we have a couple of huge documents to be created containing about 200,000 records. The thing is I setup EHCACHE and was able to store the actual queries into the cache and reduce the time by 1/10th of the actual time it took as follows.

Enable ehcache in cfg.xml and introduced ehcache.xml to setup the default TimeToIdle, TimeToLive, and other properties for each of my cached queries.

In each of the methods that call the query I modified them as:



So the questions are:
1) Everytime I call this method (can take upto half hour sometimes), does Hibernate optimize itself and know that it has to get the data from the cache and not run the query again or do I have to mention anywhere else to get data from cache?? Any other places that I have to look into?
2) What if someone adds a new record to the database, does ehcache update its cache and as well as the database also??

Migration Issues:
We could not use ETL tools as our DBA suggests that we have to migrate every other day as in an Online Transactional processing � OLTP. He suggests that we cannot use ETL for some reason and I did not know how to answer back!
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

OLTP Systems do not migrate every other day hundreds of thousands of records.

Now ETL in a Data Warehouse are used nightly by companies to migrate hundreds of thousands of records.

When you enable a second level cache with Hibernate, then Hibernate will first check the first level cache, then the second level cache, then the database.

How you set up your second level cache regions determines how often and when they are updated.

Hope that helps, and would love to hear the DBA's excuse, but not a big deal to me. I was an Oracle DBA for a few years, and well, I won't say anything negative about DBAs.

Mark
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Mark, looks like they might currently end up using jdbc queries running every night for the migratio .

Anyways, any suggestion how I can retrieve the child objects defined as sets in the mapping file using an HQL query as in:

select newObjConstructor(someParentObj.childObjaSet, someParentObj.childObjbSet..., some fields,... )
from parent someParentObj,..
where..


I want to do this because I have to get many sets of child objects in my code and iterate over every single entry of my parent table. I do not want to fall in the trap of asking for the child objects at run time which as discussed earlier wastes time.

-Regards
Rama
Keith Farmer
Greenhorn

Joined: Mar 16, 2008
Posts: 3
If you want everything to be returned as a single query you should use "join fetch" instead of simply "join". But if you need to handle big volumes of data - take a look at odal project at source forge. It is much better than using plain JDBC.
Rama Krishna
Ranch Hand

Joined: Oct 16, 2007
Posts: 110
Thank you for your suggestion.

The problem really occurred because when I left join a parent table with more than one child tables with a one-to-many relationships and look at the result, it has for every parent table entry which repeats many times and for the child tables, for each of the child tableA entry you will have all the child tableB entries and so on. The same resultset becomes humongous for bigger databases. Reading, processing and getting the required data by coding in JAVA from such a data is a huge pain. I am not sure if this is how you get the result set when you want to read by joining the parent table with many child tables.

What I actually was looking for was if there were was a way to get the list (or sets) of child objects directly for each parent entry somehow and then I can iterate each of the child sets individually or by any simple logic separately and the processing logic becomes very simple. So is there really a way in Hibernate or should there be a better way to write the SQL/HQL query itself.

-Cheers
Rama
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

"Coffe Rancher"

Please check you private messages for a message from me.
Keith Farmer
Greenhorn

Joined: Mar 16, 2008
Posts: 3
Originally posted by Paul Sturrock:
"Coffe Rancher"

Please check you private messages for a message from me.


Paul, I am sorry for not paying attention to the rules. The name is corrected.

Former Coffee Rancher ("some people call me Forrest Gump"... )
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

No worries. Thanks Keith
Keith Farmer
Greenhorn

Joined: Mar 16, 2008
Posts: 3
Originally posted by Rama Krishna:
Thank you for your suggestion.

The problem really occurred because when I left join a parent table with more than one child tables with a one-to-many relationships ...

-Cheers
Rama


I see your point. What about using HSQL or QBE to retrieve each set of children separately? Once again - no JDBC is involved.
Vic Lowtney
Greenhorn

Joined: Jun 18, 2003
Posts: 2
Hi all, this thread seems to be the most related to something I discovered while attempting to use hibernate 3.6 for an ETL task processing millions of rows. Since it bubbles up fairly high on Google's results for my search terms, I figured I'd post an update here in hopes that it might benefit others. First, like everyone has discovered, Hibernate is useless and dog slow for repetitive ETL tasks involving many database rows. For my task, we wer getting 1 record per second. We tried many of the suggestions here and elsewhere, e.g., disabling caching, playing around with lazy versus eager fetching, db indexing, etc., all to no avail. Hibernate remained obstinately slow. Here's what we discovered: It was very fast for the first 200-300 records it processed and then steadily declined from there. We tuned and tweaked every hibernate parameter we could find and nothing changed this behaviour. We then tried flush(), evictAll(), and clear() for every 200 rows or so. No workie. The next thing was to destroy and recreate the EntityManager every 200 rows. Despite the setup and teardown overhead of doing this, we went from 1 record per second to about 80 rows per second - much better. We then encapsulated each of batch into its own thread with its own EntityManager and multithreaded the entire process. We're now processing at about 250 rows per second and are only limited by Oracle performance now.

Conclusion: I'm not an expert on the architecture of EntityManager, but it clearly wasn't designed for ETL. It seems to be fine for runtime appserver use, but even in that scenario I will be eyeing EntityManager suspiciously from now on.
N Khoury
Greenhorn

Joined: Jul 15, 2011
Posts: 1
Vic Lowtney wrote:Hi all, this thread seems to be the most related to something I discovered while attempting to use hibernate 3.6 for an ETL task processing millions of rows. Since it bubbles up fairly high on Google's results for my search terms, I figured I'd post an update here in hopes that it might benefit others. First, like everyone has discovered, Hibernate is useless and dog slow for repetitive ETL tasks involving many database rows. For my task, we wer getting 1 record per second. We tried many of the suggestions here and elsewhere, e.g., disabling caching, playing around with lazy versus eager fetching, db indexing, etc., all to no avail. Hibernate remained obstinately slow. Here's what we discovered: It was very fast for the first 200-300 records it processed and then steadily declined from there. We tuned and tweaked every hibernate parameter we could find and nothing changed this behaviour. We then tried flush(), evictAll(), and clear() for every 200 rows or so. No workie. The next thing was to destroy and recreate the EntityManager every 200 rows. Despite the setup and teardown overhead of doing this, we went from 1 record per second to about 80 rows per second - much better. We then encapsulated each of batch into its own thread with its own EntityManager and multithreaded the entire process. We're now processing at about 250 rows per second and are only limited by Oracle performance now.

Conclusion: I'm not an expert on the architecture of EntityManager, but it clearly wasn't designed for ETL. It seems to be fine for runtime appserver use, but even in that scenario I will be eyeing EntityManager suspiciously from now on.


Thank you for sharing your approach. I set up my framework and will be starting a HUGE data migration in the next week or so. My previous experience with hibernate showed it was a bit slow with big queries and I was a bit worried about that.... I'll use this technique and I'll try to come back later with any statistical data about the performance.

Thanks,
N.
Graham Willis
Greenhorn

Joined: Mar 05, 2012
Posts: 1
"We're now processing at about 250 rows per second and are only limited by Oracle performance now"

Given that an Oracle direct load can typically load 500,000 rows/sec I would say you are not. Unless you have a ton of triggers on the tables, there you are no way hitting the limit of Oracle performance there. Even if I fire indivdual insert statements at an Oracle database running on my laptop database I can get way better than that.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Hibernate performance issues using huge databases
 
Similar Threads
So much on the Left Joins in Hibernate
List that contains null values
polymormsim related demos
Hibernate getting started
XML Databases instead of good old RDBMS?