Vic Lowtney wrote: Hi all, this thread seems to be the most closely related to something I discovered while attempting to use Hibernate 3.6 for an ETL task processing millions of rows. Since it bubbles up fairly high in Google's results for my search terms, I figured I'd post an update here in hopes that it might benefit others.
First, like everyone has discovered, Hibernate is useless and dog slow for repetitive ETL tasks involving many database rows. For my task, we were getting 1 row per second. We tried many of the suggestions here and elsewhere, e.g., disabling caching, playing around with lazy versus eager fetching, db indexing, etc., all to no avail. Hibernate remained obstinately slow.
Here's what we discovered: it was very fast for the first 200-300 rows it processed and then steadily declined from there. We tuned and tweaked every Hibernate parameter we could find and nothing changed this behaviour. We then tried calling flush(), evictAll(), and clear() every 200 rows or so. No luck.
The next thing we tried was destroying and recreating the EntityManager every 200 rows. Despite the setup and teardown overhead of doing this, we went from 1 row per second to about 80 rows per second - much better. We then put each batch into its own thread with its own EntityManager and multithreaded the entire process. We're now processing about 250 rows per second and are limited only by Oracle performance.
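For anyone who wants to try the same pattern, here is a rough sketch of the idea (not the original code from the post): split the rows into batches of about 200, give each batch its own short-lived context, and run the batches on a thread pool. The names BatchEtl, WorkContext, partition, and run are made up for illustration. WorkContext is a stand-in for an EntityManager; in a real JPA setup the factory would presumably call EntityManagerFactory.createEntityManager() and wrap the transaction begin/commit, which is left out here so the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class BatchEtl {

    /**
     * Hypothetical stand-in for a JPA EntityManager. One instance is created
     * per batch and closed when the batch is done, so no state accumulates
     * across batches.
     */
    public interface WorkContext<T> extends AutoCloseable {
        void persist(T row);
        void commit();
        @Override
        void close();
    }

    /** Splits rows into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    /**
     * Processes every batch on a fixed-size thread pool, opening a fresh
     * context per batch. Returns the total number of rows persisted.
     */
    public static <T> int run(List<T> rows, int batchSize, int threads,
                              Supplier<WorkContext<T>> contextFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                Callable<Integer> task = () -> {
                    // New context per batch; try-with-resources closes it
                    // even if persist or commit throws.
                    try (WorkContext<T> ctx = contextFactory.get()) {
                        for (T row : batch) {
                            ctx.persist(row);
                        }
                        ctx.commit();
                        return batch.size();
                    }
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that batch finishes
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The batch size of 200 and the thread count are tuning knobs, not magic numbers; the right values will depend on the database and the row size. Each context must only be used by the thread that created it, which is why the factory is called inside the task rather than up front.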
Conclusion: I'm not an expert on the architecture of EntityManager, but it clearly wasn't designed for ETL. It seems to be fine for runtime appserver use, but even in that scenario I will be eyeing EntityManager suspiciously from now on.