Hibernate and millions of rows..

nimo frey
Ranch Hand

Joined: Jun 28, 2008
Posts: 580
I have read this thread http://www.coderanch.com/t/216713/ORM/java/Do-we-over-hibernate and this one http://www.coderanch.com/t/214767/ORM/java/ORM-suitable-big-apps, but I still have questions:

For example,
imagine a database table with 3,000,000 rows.

I know Hibernate offers some optimizations
(such as "hibernate.jdbc.batch_size", disabling the second-level cache, or stateless sessions) to handle operations on a large amount of data.

Most of all, there is a lack of knowledge about Hibernate's optimization options when dealing with millions of rows.
(This thread should fill the gap:-)

What are your practical experiences working with so much data?

What are best practices when working with millions of rows?

Should we avoid pure Hibernate and make JDBC API calls directly (via Hibernate?),
or use stored procedures, when working with so much data?
How can we make it faster?

Where do the definitive disadvantages of ORM lie
when working with millions of rows?
Are there any?

What should we do or know when handling such scenarios?
[ December 15, 2008: Message edited by: nimo frey ]
nimo frey
Ranch Hand

Joined: Jun 28, 2008
Posts: 580
Okay,
I have tested things and for now made these configurations to deal with so much data:

In my properties-file:
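Something along these lines (only a sketch; the batch size of 50 is just an assumed starting point, not a measured value):

# batch JDBC statements together (50 is an assumed starting point)
hibernate.jdbc.batch_size=50
hibernate.order_inserts=true
hibernate.order_updates=true

# turn off the second-level cache and the query cache for bulk work
hibernate.cache.use_second_level_cache=false
hibernate.cache.use_query_cache=false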



The "second level cache" should be disabled programmatically
as you do not need to disable the cache in all scenarios:
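For example, per session it can be done roughly like this (a sketch that assumes Hibernate is the JPA provider, so the underlying Session can be obtained via getDelegate()):

// assumption: Hibernate is the JPA provider, so getDelegate() returns a Hibernate Session
org.hibernate.Session session = (org.hibernate.Session) entityManager.getDelegate();

// ignore the second-level cache for this session only (neither read from nor written to)
session.setCacheMode(org.hibernate.CacheMode.IGNORE);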



Now, let's look at all CRUD operations:

For SAVE operations:

- after flush, you should call clear,
to clear the cache (the first-level cache? does it clear all references?)

public void save() {
    for (int i = 1; i <= 2000000; i++) {

        Item item = new Item();
        entityManager.persist(item);
        entityManager.flush();

        // but when I clear within the loop, does hibernate.jdbc.batch_size work?
        // I flush and clear after saving each single record, so nothing can be batched, am I right?

        entityManager.clear();
    }
}
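For comparison, flushing and clearing only once per batch (a sketch that assumes hibernate.jdbc.batch_size is set to 50) would look roughly like this:

int batchSize = 50; // assumed to match hibernate.jdbc.batch_size

for (int i = 1; i <= 2000000; i++) {
    entityManager.persist(new Item());

    // flush and clear once per batch, so the inserts can actually be grouped into JDBC batches
    if (i % batchSize == 0) {
        entityManager.flush();
        entityManager.clear();
    }
}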

For READ operations:

- the only thing I know is to set the boundaries of your selection.
If you set "first result" to 0 and "max result" to 2000,
then the result list returns the first 2000 records. (When I call this method again from the same session (?), does it return a result list containing the records from 2001-4000, and so on? Am I right??)
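Roughly like this (a sketch; Item, the ORDER BY column, the page size of 2000 and the firstResult parameter are just my example values):

// read one page of 2000 records, starting at firstResult (e.g. 0, 2000, 4000, ...)
List<Item> page = entityManager
        .createQuery("SELECT i FROM Item i ORDER BY i.id")
        .setFirstResult(firstResult)
        .setMaxResults(2000)
        .getResultList();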



- the other thing I know is that when you call getReference, you have NO database hit:

public void read() {

    // but getReference works only if this item is already in my cache. Am I right??
    Item i = entityManager.getReference(Item.class, 1);
}


For DELETE operations:

I guess delete is fast enough (am I right? Are there any optimizations there, too?)
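One thing that looks useful here is a JPQL bulk delete, which runs as a single statement in the database without loading the entities (a sketch; the status property is just an assumed example):

// bulk delete: executed as one SQL statement, entities are not loaded into memory
int deleted = entityManager
        .createQuery("DELETE FROM Item i WHERE i.status = :status")
        .setParameter("status", "ARCHIVED")
        .executeUpdate();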

For UPDATE operations:

What optimizations exist for update statements?
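The same idea exists for updates: a JPQL bulk update runs as one statement in the database (a sketch; the price and category properties are just assumptions):

// bulk update: one SQL UPDATE, no entities loaded, first-level cache is bypassed
int updated = entityManager
        .createQuery("UPDATE Item i SET i.price = i.price * 1.1 WHERE i.category = :cat")
        .setParameter("cat", "BOOKS")
        .executeUpdate();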


This thread should summarize all optimization strategies which can (or should) be applied when handling millions of records.

Any suggestions, practical experiences or best practices are very welcome :-)

We can categorize it as follows:
Category 1: Optimizations in: properties.xml
Category 2: Optimizations in: CRUD operations
Category 3: Optimizations in: ?)
[ December 15, 2008: Message edited by: nimo frey ]
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


What are your practical experiences working with so much data?

What are best practices when working with millions of rows?

Not using an ORM for such large bulk operations is what I'd consider best practice. Databases come with bulk data manipulation tools that are far better suited.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
nimo frey
Ranch Hand

Joined: Jun 28, 2008
Posts: 580
Not using an ORM for such large bulk operations is what I'd consider best practice.


I can use conventional JDBC from within Hibernate or use a StatelessSession.
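A StatelessSession insert loop would look roughly like this (a sketch that assumes access to the Hibernate SessionFactory; there is no first-level cache, no dirty checking and no cascading here):

// org.hibernate.StatelessSession: no first-level cache, no dirty checking, no cascades
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

for (int i = 1; i <= 2000000; i++) {
    session.insert(new Item()); // issues one INSERT per call, keeps nothing in memory
}

tx.commit();
session.close();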

Databases come with bulk data manipulation tools that are far better suited.


So JDBC, stored procedures and PreparedStatements are not common for such cases?

Hmm.. I have never heard of "bulk data manipulation tools".

Do you know such tools for DB2 or MySQL?

I cannot find anything. Are these tools (APIs?) integrated with Java? I had thought that JDBC or Hibernate was well suited for such amounts of data.
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


I can use conventional JDBC within Hibernate or do a Stateless Session.

Yes. But I still wouldn't recommend processing millions of rows via an ORM.

Do you know such tools for DB2 or MySQL?

I don't know about DB2, but I'd be surprised if it didn't have them. Bulk loading/unloading and data transformation tasks are as old as the hills.
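For MySQL, for example, LOAD DATA INFILE can even be driven from plain JDBC (a sketch; the file name, table and connection details are made up):

// MySQL bulk load driven from plain JDBC (file and table names are made-up examples)
Connection con = DriverManager.getConnection(url, user, password);
Statement stmt = con.createStatement();
int rows = stmt.executeUpdate(
        "LOAD DATA LOCAL INFILE '/tmp/items.csv' " +
        "INTO TABLE item FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
stmt.close();
con.close();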
nimo frey
Ranch Hand

Joined: Jun 28, 2008
Posts: 580
Okay,
I will look for such tools and give you feedback on whether they can be integrated with Java.

bye
 