Is ORM suitable for *big* apps ?

Chris Dillon
Greenhorn

Joined: Feb 13, 2004
Posts: 6
Hi,
I am working on a big application.
By big, I especially mean over 250 database tables (and quite a couple of them can easily grow over 100 000 records).
We are not heading for web apps (at least for the next 2 years) so we would have under 300 concurrent users...
Currently, the persistence scheme is developed using mainly stored procedures and a couple of "built on the fly" SQL statements.
Our app is going to be deployed on several different brands of database (Sybase, Oracle and DB2), yet only one brand per client (thank god ;D).
And obviously it wouldn't be fun if the database schema was to be the same with all clients, i.e. most tables are the same but there will be a couple of significant differences...
We're on the edge of going from Sybase to Oracle (with a slightly different schema), and as most actual SQL is a stored procedure it is going to be hell...
We are not using EJB, but may be going onto an app server, still that is unclear at the moment...
My question is: are JDO or Hibernate (or others...) a viable solution (development and performance-wise) for us?
I often read Hibernate is great but all the cases I've seen were handling less than 50 tables...
Any opinions ?
Thanks a lot in advance
Cheers
Chris
Erik Bengtson
Ranch Hand

Joined: Dec 06, 2003
Posts: 90
I would go with pure JDBC and DAO calling stored procedures. The advantages are (you probably know):
- one DAO implementation for each database, but a very simple DAO just calling the stored procedures
- no configuration files
- no bugs on the O/R mapper
- to add a new database, you just have to know the SQL for the database. Using an O/R mapper, each database will show its limitations with JDBC, and you may need to change your Java code.
I'm a JDO implementation developer, and I don't recommend you to use any O/R mapper. It's possible, but I wouldn't.
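
As a rough illustration of the "one simple DAO implementation per database" idea above, here is a minimal sketch. The procedure names (get_customer_name, customer_pkg.get_name), the OUT-parameter convention and all class names are assumptions made up for the example, not anything from the original application.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Locale;

// Hypothetical DAO contract: application code only ever sees this interface.
interface CustomerDao {
    String findName(Connection con, int customerId) throws SQLException;
}

// Shared JDBC plumbing; each database-specific subclass supplies only the call text.
abstract class StoredProcCustomerDao implements CustomerDao {
    // JDBC escape syntax for the stored procedure call.
    abstract String callSyntax();

    @Override
    public String findName(Connection con, int customerId) throws SQLException {
        try (CallableStatement cs = con.prepareCall(callSyntax())) {
            cs.setInt(1, customerId);                   // IN: customer id
            cs.registerOutParameter(2, Types.VARCHAR);  // OUT: customer name
            cs.execute();
            return cs.getString(2);
        }
    }
}

// Assumed Sybase procedure name.
class SybaseCustomerDao extends StoredProcCustomerDao {
    @Override
    String callSyntax() { return "{call get_customer_name(?, ?)}"; }
}

// Assumed Oracle procedure, living in a package.
class OracleCustomerDao extends StoredProcCustomerDao {
    @Override
    String callSyntax() { return "{call customer_pkg.get_name(?, ?)}"; }
}

// Picks the implementation once per deployment (one database brand per client).
class CustomerDaos {
    static CustomerDao forDatabase(String brand) {
        switch (brand.toLowerCase(Locale.ROOT)) {
            case "sybase": return new SybaseCustomerDao();
            case "oracle": return new OracleCustomerDao();
            default: throw new IllegalArgumentException("No DAO for " + brand);
        }
    }
}
```

The point of the design is that the SQL dialect differences stay inside the stored procedures, and the Java side only needs to know which procedure to call for which brand.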
Matthew Phillips
Ranch Hand

Joined: Mar 09, 2001
Posts: 2676
I haven't done any research into performance issues for ORMs, but from a code base perspective this sounds like a project where an ORM could be ideal. An ORM will allow you to make table and db platform changes in a properties file instead of changing/compiling your source code.
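
To make the "change a properties file instead of recompiling" point concrete, here is a small sketch of the general idea. It is not how any particular ORM reads its configuration; the key scheme (table.<entity>) and the class name TableNames are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Resolves logical entity names to physical table names from configuration,
// so a per-client schema difference is a config change, not a code change.
class TableNames {
    private final Properties props;

    TableNames(Properties props) {
        this.props = props;
    }

    // Parses settings from text; in practice this would be one file per client.
    static TableNames parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TableNames(p);
    }

    // Hypothetical key scheme: "table.<entity>" maps to the physical table name.
    String tableFor(String entity) {
        String name = props.getProperty("table." + entity);
        if (name == null) {
            throw new IllegalArgumentException("Unmapped entity: " + entity);
        }
        return name;
    }

    // Builds a trivial query against whatever table this client uses.
    String selectAll(String entity) {
        return "SELECT * FROM " + tableFor(entity);
    }
}
```

With "table.customer=CUST_ORA" in one client's file and "table.customer=customer" in another's, the same compiled code produces different SQL for each.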


Matthew Phillips
Scott Ambler
author
Ranch Hand

Joined: Dec 12, 2003
Posts: 608
The number of tables in the database isn't an issue. The true issue is the performance of the individual transactions against the DB, and that will be determined by:
1. The number of joins, if any, required. This will be determined by the level of normalization within your DB, the size of the tables being joined, and the actual mapping between your object and data schema.
2. The amount of data to be transmitted across the network. You may find that you want to invoke stored procs sometime to do the processing on the DB server, depends on the nature of your app.
3. Other normal db performance issues.
The only way you're going to be able to fairly determine if an ORM tool will work for you is to work with one for a bit and see for yourself.
You might find www.agiledata.org/essays/mappingObjects.html to be of interest.
- Scott


Scott W. Ambler (http://www-306.ibm.com/software/rational/bios/ambler.html)
Practice Leader Agile Development, IBM Rational
Now available: Refactoring Databases: Evolutionary Database Design (http://www.ambysoft.com/books/refactoringDatabases.html)
eammon bannon
Ranch Hand

Joined: Mar 16, 2004
Posts: 140

I often read Hibernate is great but all the cases I've seen were handling less than 50 tables...

Just to put your mind at rest, we are using it with nearly 400 tables (it's one ugly legacy DataModel - and it is deployed multi-platform like yours) and performance is OK. But Scott is right - this isn't really the issue.
Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

eammon,
Do you use composite keys (Hibernate) to cater for your legacy DataModel ?
Regards,
Pho


eammon bannon
Ranch Hand

Joined: Mar 16, 2004
Posts: 140
Yes.
Pj Murray
Ranch Hand

Joined: Sep 24, 2004
Posts: 194
Hello Chris,

You might find it useful to read this:

"Choosing a Java Persistence Strategy"


http://www.codefutures.com/weblog/andygrove/archives/2005/01/index.html


To answer one question: 50 tables is not an issue.

In fact, the return on investment in a good tool increases when the number of tables increases.

Regards
PJ Murray


PJ Murray -
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 15961
    

I've spent the last year working on a project that's probably got about 30 tables (counting constant tables) in it. I just got done with a profiling run that processed about 500,000 records.

The original version was JDBC-based. I switched it to JDO in hopes of gaining benefits from caching. Unfortunately, I found out the hard way (over a 30-day period of extreme debugging) that JDBC or JDO didn't matter - either way the Oracle cache had no discard mechanism and eventually the app would run out of memory and crash. So I had to disable it.

The particular JDO product I'm using allows me to create custom cache classes on a per-table basis, but I'm not yet that desperate. The stats from their monitoring program have actually made me consider simply preloading the 3 most commonly-searched tables into memory and accessing them as simple collections (small tables, but MILLIONS of hits).
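
For what it's worth, the "preload a small table and treat it as a collection" idea might look something like the sketch below. The class name PreloadedLookup and the two-column (id, label) shape are assumptions for illustration; the real tables are not described in this thread.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Read-only, fully preloaded copy of a small lookup table.
class PreloadedLookup {
    private final Map<Integer, String> rowsById;

    // Takes a defensive copy so later changes to 'rows' do not leak in.
    PreloadedLookup(Map<Integer, String> rows) {
        this.rowsById = Collections.unmodifiableMap(new HashMap<>(rows));
    }

    // Loads the whole table once, e.g. at startup. The query is assumed to
    // return an integer key in column 1 and a label in column 2.
    static PreloadedLookup load(Connection con, String sql) throws SQLException {
        Map<Integer, String> rows = new HashMap<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows.put(rs.getInt(1), rs.getString(2));
            }
        }
        return new PreloadedLookup(rows);
    }

    // Every lookup after loading is a plain in-memory map access.
    String label(int id) {
        return rowsById.get(id);
    }

    int size() {
        return rowsById.size();
    }
}
```

This only makes sense for tables that are small and rarely change, since nothing here refreshes the data after loading.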

However, in my experience, the difference in untuned JDO vs. naive JDBC was minimal. I do think that JDO kept the code cleaner, though.


Customer surveys are for companies who didn't pay proper attention to begin with.
Dave Clark
Ranch Hand

Joined: Feb 16, 2005
Posts: 52
I'd say that the number of tables you have and the number of databases that you need to cater for are 2 requirements that make an object-relational mapping solution like JDO a *far* better solution than JDBC (EJB 2.x is far too heavyweight for that number of tables though).

If you choose to go with JDBC and a DAO approach, you'll need a separate DAO for each table for each database = 250 tables * 3 databases = 750 DAOs!

If you use simple DAOs calling stored procedures, you still have to write the stored procedures by hand (and maintain them by hand), but just as bad, your application logic is now spread across both the appserver and database tiers, rather than living entirely inside the appserver tier where it belongs (and is more maintainable).

Rather than the JDBC approach, a JDO approach lets you simply manage a set of mappings for each of the tables, and if you've got some good visual mapping tools (<plug>like those in Versant Open Access</plug>), you can simply manage 3 JDO workbench projects - 1 for each database. Making changes to your mapping files is certainly a lot easier and more robust (less prone to breakage) than maintaining hand-crafted SQL.

A JDO approach scales very well both in regards to number of tables, as well as number of records in those tables.

cheers,

Dave.


Dave Clark
Senior WebSphere Architect
Versant Open Access - JDO2 & EJB3 (http://www.versant.com)
Tim Wilson
Greenhorn

Joined: Feb 25, 2005
Posts: 2
Originally posted by Dave Clark:

If you choose to go with JDBC and a DAO approach, you'll need a separate DAO for each table for each database = 250 tables * 3 databases = 750 DAOs!

<snip>

Rather than the JDBC approach, a JDO approach lets you simply manage a set of mappings for each of the tables, and if you've got some good visual mapping tools (<plug>like those in Versant Open Access</plug>), you can simply manage 3 JDO workbench projects - 1 for each database.



I think I'd rather maintain 750 Java classes than 750 proprietary mapping files, visual tool or not!

JDO is in serious danger... the JCP has already voted against it and Sun has published an open letter saying that they want to put the spec into maintenance.

ORM is interesting technology but I'd recommend sticking to a simple DAO approach personally.

Cheers,

Tim.
Pj Murray
Ranch Hand

Joined: Sep 24, 2004
Posts: 194
In reply to some comments above:

JDO is only in danger due to the internal politics of the JCP. The JDO specification never had the support of major vendors anyway - only independent software vendors (CodeFutures included).

The problem with the statements from Sun and the vote against JDO 2.0 is that it damages the confidence in the future of the specification.

The only way around that is for vendors to commit to support JDO 2.0 - and CodeFutures has promised its customers that it will support JDO 2.0, regardless of whether the specification is approved.

Even with CodeFutures' and other vendors' promises, if you're starting out from scratch, you may be wise to wait at least until the next JCP vote.

_________________

With regard to hand writing JDBC DAOs - I would never suggest doing that. You need to generate them, then regenerate if you need to make changes. No hand coding or manual changes.

CodeFutures provides a tool to reverse engineer database schemas and generate JDBC DAOs in a few seconds. You can even choose to generate the code in either ORM or DAO style code.

If you need to do it for 3 databases, that means it will take you an extra 2 minutes ...

If you change a table or view in the database, you just regenerate the DAO for that table/view.
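
To show what "regenerate instead of hand-edit" means in practice, here is a deliberately tiny toy generator. It is only a sketch of the technique and has nothing to do with how any commercial generator works; the class name DaoGenerator and the shape of the emitted code are invented for the example.

```java
import java.util.List;

// Toy code generator: turns a table description into DAO source text.
class DaoGenerator {
    // Emits the source of a minimal DAO class for the given table and columns.
    static String generate(String table, List<String> columns) {
        String className = capitalize(table) + "Dao";
        String cols = String.join(", ", columns);
        StringBuilder src = new StringBuilder();
        src.append("// GENERATED - do not edit; regenerate from the schema instead.\n");
        src.append("class ").append(className).append(" {\n");
        src.append("    static final String SELECT_ALL = \"SELECT ")
           .append(cols).append(" FROM ").append(table).append("\";\n");
        src.append("}\n");
        return src.toString();
    }

    // "customer" becomes "Customer" for use in the class name.
    private static String capitalize(String s) {
        return s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
    }
}
```

If a column is added to the customer table, the schema description changes and generate is run again; nobody opens CustomerDao by hand. A real generator would read the table and column lists from the database metadata rather than take them as arguments.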


Final note:

CodeFutures is neutral on the JDBC DAOs versus JDO versus EJB CMP versus Hibernate and generates persistence code for all of them.

JDBC DAOs - extremely efficient and straightforward approach, very popular
JDO - the best data persistence specification available, but politics is hurting it badly
EJB CMP - complex, which makes code generation very useful
Hibernate - gaining massive popularity, clearly where the market momentum is


Regards
PJ Murray

[ February 25, 2005: Message edited by: PJ Murray ]

Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


With regard to writing JDBC DAOs - I would never suggest doing that.

Ditto. The only applications I've been involved in where this was the route taken predate any useful ORM solution (including Entity Beans) - and then we took the time to write an application which could generate all this JDBC code from a Rose generated model. Otherwise some poor developer would probably find their entire job was tweaking DAO code.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Andy Grove
Greenhorn

Joined: Nov 11, 2003
Posts: 18
I'm curious .. why was a code generator built in-house?

If a commercial generator (<plug>such as FireStorm/DAO</plug>) was used there wouldn't be the need to tweak the generated code (although we do allow customers to tweak the code generator itself if they want to change the way that code is generated).

Regards,

Andy Grove
CodeFutures
[ February 25, 2005: Message edited by: Andy Grove ]
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    

Originally posted by Tim Wilson:



I think I'd rather maintain 750 Java classes than 750 proprietary mapping files, visual tool or not!

Cheers,

Tim.


Actually either case is easy with auto-generation using XDoclet.

Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Andy Grove:
I'm curious .. why was a code generator built in-house?
I've done this on more than one occasion. The reason: there were no other options at the time.
If a commercial generator . . . was used there wouldn't be the need to tweak the generated code.
I believe Paul said that the tweaking would be required only had a code generator not been used.
Dave Clark
Ranch Hand

Joined: Feb 16, 2005
Posts: 52

Originally posted by Tim Wilson:

I think I'd rather maintain 750 Java classes than 750 proprietary mapping files, visual tool or not!


With JDO 2, the XML files which specify the mappings between objects and relational database tables become standardized - so with around 2 dozen open source and commercial JDO implementations moving quickly to implement JDO 2, this is no longer an issue.

And you *won't* end up with 750 mapping files - JDO gives you a great deal of flexibility here - usually the best option is to have a single mapping file per java package, so you'd probably end up with maybe 20 mapping files (though you could have just 1 if you wanted).

More to the point, JDO defaults most of the mapping information from the class itself if you're generating the DB schema, so most mapping files have just a single XML entry, like the following:

<?xml version="1.0" encoding="UTF-8"?>
<jdo>
  <package name="com.mycompany.myapp.mysubsystem.model">
    <class name="ADomainObject" />
    <class name="AnotherDomainObject" />
    <class name="YetAnotherDomainObject" persistence-capable-superclass="ADomainObject" />
    ...
  </package>
</jdo>

And if you're using a visual mapping tool like the one provided with Versant's Open Access, you don't even need to deal with the XML. The VOA workbench also lets you manage multiple projects, send DDL directly to your database, do visual mapping of your RDBMS schema with E-R diagrams, monitor and tune performance... etc

Rolling your own just doesn't beat using a quality enterprise-class off-the-shelf product if it meets your needs - especially once you get into applications the size of 250 tables * 3 database flavours.

I'd suggest trying Versant and a couple of other commercial and open source JDO implementations before deciding to hand-code everything.

cheers,

Dave.
Dave Clark
Ranch Hand

Joined: Feb 16, 2005
Posts: 52
oh - almost forgot.

The JCP Executive Committee is actually voting right now on whether JDO 2 will become a standard. The ballot results are due on Monday 2/28.

So hopefully on Monday JDO 2.0 will be an officially JCP-blessed Java standard!

cheers,

Dave
Pj Murray
Ranch Hand

Joined: Sep 24, 2004
Posts: 194
Originally posted by Tim Wilson:



I think I'd rather maintain 750 Java classes than 750 proprietary mapping files, visual tool or not!

JDO is in serious danger... the JCP has already voted against it and Sun has published an open letter saying that they want to put the spec into maintenance.

ORM is interesting technology but I'd recommend sticking to a simple DAO approach personally.

Cheers,

Tim.


OK - you have a point about the mapping files.

But if you don't like mapping files - then just generate the code - JDBC DAOs, JDO, EJB CMP, Hibernate, ....

We'll know the future of JDO pretty soon - the JCP has voted by now - and many vendors - including CodeFutures - have promised to continue supporting the specification.
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

Originally posted by Andy Grove:
I'm curious .. why was a code generator built in-house?

If a commercial generator (<plug>such as FireStorm/DAO</plug>) was used there wouldn't be the need to tweak the generated code (although we do allow customers to tweak the code generator itself if they want to change the way that code is generated).

Regards,

Andy Grove
CodeFutures

[ February 25, 2005: Message edited by: Andy Grove ]


Basically, because at the time we were doing this there were very few alternatives (this was about 5 years ago).
Ken Krebs
Ranch Hand

Joined: Nov 27, 2002
Posts: 451
If you have 750 tables and 750 mapping files then ORM is the wrong tool for the job or you're not using the ORM framework properly. The point of an ORM framework like Hibernate is to support a rich domain model, that is, one with complex associations between the domain objects.

Suppose you have a complex object model that supports the needs of your application well and you have a database model that is structured in such a way that it supports efficient data storage/retrieval. Quite often the 2 structures will not match up well. It is that problem which ORM seeks to solve.

If there is no mismatch between the needs of your application and data storage, don't use an ORM because it isn't the right tool for the job. In that case, you're better off using some sort of JDBC framework like that available in Spring or some sort of middle ground framework like IBatis. Programming JDBC directly is not advisable because it is too low level to be productive with.


kktec
SCJP, SCWCD, SCJD
"What we observe is not nature itself, but nature exposed to our method of questioning." - Werner Heisenberg
Pj Murray
Ranch Hand

Joined: Sep 24, 2004
Posts: 194
Originally posted by Ken Krebs:
Programming JDBC directly is not advisable because it is too low level to be productive with.


I agree with Ken.

You are much better off generating the JDBC code for many good reasons.

PJ Murray
CodeFutures Software
Thomas Whitmore
Ranch Hand

Joined: Aug 05, 2004
Posts: 33
Hi Chris, PJ, Tim, people,

Interesting discussion. I'm not going to pretend to solve any performance problems, but this application should be eminently do-able.

PJ's solution is nice because you can see it. And touch it, in the form of monolithic blocks of code. And modify it manually. All 750 classes worth of it...

But I suspect that when you start doing reporting, or want to tune which fields the app fetches, that those monolithic blocks don't have flexibility in this area. Or if they do, it's starting to get as complicated as a real mapping layer anyway.

What does it end up with, DAO count = databases * fetch plan count * table count = 3 * 250 * 4... ? The tangible nature of the solution probably turns out to be its own limitation; this is why we wanted to get away from expressing this stuff as code in the first place.

[Admittedly PJ expresses his stuff with an intermediate tool. If the intermediate tool is so good, why do we even need to see the Java though ?]

Mapping a field value or a primary key is trivial; it's the structure and control to achieve good efficiency that are the challenge.

Anyway, a few thoughts on performance.

Chris's app has ~100,000 rows in some tables. Answer: if they're indexed, table size is essentially free well into the millions of rows. Most index pages will live in cache since a whole index should only take a few Mb.

App has over 250 database tables. Question: are a lot of these Unioned data, or split off by date or period? Number of tables is also free as far as performance goes, but mapping may be a little outside the norm.

We would have under 300 concurrent users. Answer: many standard databases & hardware should be able to execute 30 - 120 structured reads per second, and perhaps 15 - 30 writes. By structured, I mean an Order, 5 OrderLines and their Product references.

I'm not sure how fast your 300 users type exactly :-) but I suspect they take more than 10 seconds per transaction. Allow the application a duty ceiling of 30 - 40%, so it still seems responsive.
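
The back-of-the-envelope numbers above can be worked through as follows. This is only a sketch of the arithmetic using the figures quoted in this post; the class and method names are made up, and integer division is used deliberately to keep the estimate coarse.

```java
// Rough load estimate: demand from users versus usable database throughput.
class LoadEstimate {
    // Transactions per second generated by 'users' who each submit one
    // transaction every 'secondsPerTransaction' seconds (integer division).
    static int demandPerSecond(int users, int secondsPerTransaction) {
        return users / secondsPerTransaction;
    }

    // Throughput left over if the database is only allowed to run at
    // 'dutyPercent' percent of its raw capacity.
    static int usablePerSecond(int rawPerSecond, int dutyPercent) {
        return rawPerSecond * dutyPercent / 100;
    }
}
```

With 300 users at one transaction every 10 seconds, demandPerSecond(300, 10) gives 30 per second; at one every 30 seconds it gives 10. On the supply side, usablePerSecond(30, 30) gives 9 and usablePerSecond(120, 40) gives 48. So whether the load fits depends on where the hardware sits in that range and on how much longer than 10 seconds the users really take per transaction.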

App is going to be deployed on several different brands of database. Answer: here's your customer requirement right here, mapping is probably going to be the ongoing issue rather than performance.

Most actual SQL is a stored procedure, so it is going to be hell. Answer: yes. And if the customer's DBA requires SPROCs you're going to have to work with these anyway, sorry.

Scalability and caching. Answer: yes. Most O/R tools have a shared instance cache which provides basic performance boosts. Much further gains are typically available, if required, from application-aware caching.

Basically Caching is where you should deploy your Java code, not a Data Access Service but an Application Cache Service.

This is written to understand the application's data locality and be able to cache groups/sets which the application commonly requests. A naive instance cache can't help much here, but your ProductCache can remember Products by Category, TransactionCache allows Transactions to be held for hot (TimePeriod, CostCentre) sectors, or whatever your app works by.

Even large and high-volume tables, far too large to cache in their entirety, have 'hot spots' which are 20% of the size but 80% of the activity. A smart cache can be much more effective with these. Try and code the cache to understand locality and grouping, rather than dictating the actual access pattern. This way you can adapt well to reporting or different access patterns.
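
A minimal sketch of such a group-aware cache, assuming products are represented as plain strings and that a loader function stands in for the real database query. The name ProductCache comes from the text above; the constructor arguments, the loader and the least-recently-used eviction policy are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Caches whole groups (all products of a category) rather than single rows,
// and keeps only the most recently used categories in memory.
class ProductCache {
    private final Function<String, List<String>> loader; // stands in for the DB query
    private final Map<String, List<String>> byCategory;
    private int loads = 0;

    ProductCache(final int maxCategories, Function<String, List<String>> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.byCategory = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                // Evict the least recently used category once over capacity.
                return size() > maxCategories;
            }
        };
    }

    // Returns the cached group, loading it on a miss.
    synchronized List<String> productsIn(String category) {
        List<String> products = byCategory.get(category);
        if (products == null) {
            products = loader.apply(category);
            loads++;
            byCategory.put(category, products);
        }
        return products;
    }

    // Number of times the loader (i.e. the database) was actually hit.
    synchronized int loadCount() {
        return loads;
    }
}
```

Hot categories stay resident and cold ones fall out, which is the 20%/80% behaviour described above. A real version would also need invalidation when products change, which is left out here.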

How large do you want to scale anyway ?


Cheers,
Thomas
www.powermapjdo.com
Pj Murray
Ranch Hand

Joined: Sep 24, 2004
Posts: 194
Originally posted by Thomas Whitmore:
Hi Chris, PJ, Tim, people,



PJ's solution is nice because you can see it. And touch it, in the form of monolithic blocks of code. And modify it manually. All 750 classes worth of it...




I have never suggested that you manually modify / rewrite any generated DAO (or ORM) Java code.

If you need to make a change, for example, a table or view has been modified, then you re-generate the Java code.

If the generated code does not look the way you want it, then you modify how the code is generated by changing the code generation template.

The whole point is that you don't manually write, re-write, or manually change data persistence code at any point.

CodeFutures supports both DAO and ORM style code generation for JDBC, JDO, EJB CMP, and Hibernate. We get asked so many times about the best choice that I've written a short note on the data persistence options.



Regards

PJ Murray
CodeFutures Software
[ March 03, 2005: Message edited by: PJ Murray ]
Robert Monaco
Greenhorn

Joined: Jul 08, 2005
Posts: 1
I stumbled across this discussion and was curious to know what the outcome of this thread was. Did you choose a persistence solution?

Robert Monaco
SolarMetric
Chris Dillon
Greenhorn

Joined: Feb 13, 2004
Posts: 6
First of all I would like to thank everyone who posted around here and also to apologize for not answering earlier (a couple of problems with the email I use here so I haven't used it so much for over a year well whatever...)

Due to a couple of problems around here, we still haven't completely made the move...
I hacked a quick & dirty generator to move the stored procedures from Sybase to Oracle and started to move the calls to DAOs (one per table), but there are only a couple of them for the moment...

We are going to investigate again in a couple of months if things go right on schedule...

We are probably going to move to Hibernate as this seems to be the "best" solution for the moment, yet we still need to investigate...

Sorry I can't really answer your question...

Thanks for this interesting thread!

Cheers
Chris
Mike Keith
author
Ranch Hand

Joined: Jul 14, 2005
Posts: 304
Hi Chris,

Was not here for the original conversation, but I will add a few comments just to fill you in on a few other options.

First, the answer is yes, ORM does scale in numerous degrees of development. There are always limitations of reason that I have seen people hit, but they are encountered primarily by the unreasonable and the ridiculous. Of course one man's wisdom is another's folly, but let's define reason as not doing a join across all 750 tables, and not using a single inheritance hierarchy for 100 of them. If you are looking for industrial strength and performance beyond what you are seeing, take a good look at Oracle TopLink. It is free to download and try, and is the most mature and proven product on the market.

http://www.oracle.com/technology/products/ias/toplink/index.html

Oracle and Sun have recently announced that the TopLink group will be doing the reference implementation for EJB 3, so a version of TopLink that just supports EJB 3 will soon be available as open source. The beauty of this arrangement is that people can use the RI version of TopLink for development and for production, until they get to the point when they are deploying to serious enterprise applications, such as what you are proposing, and then they can move transparently up to the full version of TopLink that supports clustering and cache coordination, etc., if they choose to do so.

Regards,
-Mike


-Mike
Pro JPA 2: Mastering the Java Persistence API
Appu Chan
Greenhorn

Joined: Aug 29, 2002
Posts: 28
We are using Hibernate for a large application which has about 700 tables. We did evaluate TopLink also and found Hibernate to be a more cost-effective solution. We use Spring Framework to do the plumbing of application components. The performance has been pretty good so far and Hibernate is flexible enough to allow various configurations. We use Oracle 10g as the DB.
 
 