N Khoury

Greenhorn
since Jul 15, 2011

Recent posts by N Khoury

Vic Lowtney wrote: Hi all, this thread seems to be the most related to something I discovered while attempting to use Hibernate 3.6 for an ETL task processing millions of rows. Since it bubbles up fairly high in Google's results for my search terms, I figured I'd post an update here in hopes that it might benefit others. First, like everyone has discovered, Hibernate is useless and dog slow for repetitive ETL tasks involving many database rows. For my task, we were getting 1 record per second. We tried many of the suggestions here and elsewhere, e.g., disabling caching, playing around with lazy versus eager fetching, db indexing, etc., all to no avail. Hibernate remained obstinately slow. Here's what we discovered: it was very fast for the first 200-300 records it processed and then steadily declined from there. We tuned and tweaked every Hibernate parameter we could find and nothing changed this behaviour. We then tried flush(), evictAll(), and clear() every 200 rows or so. No workie. The next thing was to destroy and recreate the EntityManager every 200 rows. Despite the setup and teardown overhead of doing this, we went from 1 record per second to about 80 rows per second, which is much better. We then encapsulated each batch in its own thread with its own EntityManager and multithreaded the entire process. We're now processing at about 250 rows per second and are limited only by Oracle performance.

Conclusion: I'm not an expert on the architecture of EntityManager, but it clearly wasn't designed for ETL. It seems to be fine for runtime appserver use, but even in that scenario I will be eyeing EntityManager suspiciously from now on.
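
For anyone who wants to try the approach described in the quote, here is a rough sketch of its shape: split the rows into batches of about 200, give each batch a fresh context that is thrown away afterwards, and run the batches on a thread pool. This is not the original poster's code. RowContext is a hypothetical stand-in for EntityManager so that the sketch compiles without JPA on the classpath; in a real setup the factory would presumably call EntityManagerFactory.createEntityManager(), and commit() and close() would wrap the transaction commit and EntityManager.close(). The class name, method names, batch size and thread count are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class BatchedEtl {

    /** Hypothetical stand-in for a JPA EntityManager: one short-lived unit of work. */
    public interface RowContext<T> extends AutoCloseable {
        void persist(T row);
        void commit();
        @Override
        void close(); // narrowed so that closing throws no checked exception
    }

    /** Splits rows into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /**
     * Processes every batch on a worker thread. Each batch gets its own
     * freshly created context, which is closed as soon as the batch is done.
     * Returns the total number of rows handed to persist().
     */
    public static <T> int run(List<T> rows, int batchSize, int threads,
                              Supplier<RowContext<T>> contextFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> results = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                results.add(CompletableFuture.supplyAsync(() -> {
                    // New context per batch, never shared between threads.
                    try (RowContext<T> ctx = contextFactory.get()) {
                        for (T row : batch) {
                            ctx.persist(row);
                        }
                        ctx.commit();
                        return batch.size();
                    }
                }, pool));
            }
            int total = 0;
            for (CompletableFuture<Integer> f : results) {
                total += f.join(); // rethrows any worker failure unchecked
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the structure is that no context ever sees more than one batch, which matches the quote's observation that throughput dropped after the first few hundred records handled by a single EntityManager. Whether 200 rows and a handful of threads are the right numbers will depend on the database and the entities involved.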



Thank you for sharing your approach. I have set up my framework and will be starting a HUGE data migration in the next week or so. My previous experience with Hibernate was that it is a bit slow with big queries, and I was a bit worried about that... I'll use this technique and try to come back later with some performance statistics.

Thanks,
N.