JavaRanch » Java Forums » Databases » JDBC

JDO real world implementation question

John Takacs, D.P.M.
Greenhorn

Joined: Nov 15, 2000
Posts: 10
I'm looking to implement a database persistence layer for a search engine I run.
Currently I have 3 million+ URLs in a MySQL database. The main table has several important columns, but the ones that matter are:
Title, Description, and URL.
If I create an object that is essentially a single row of data, i.e. Title, Description, and URL, I can get the size of that object down to about 450 bytes.
My question, then: is it safe to say that my memory use in the above example, for JDO alone, would be 450 bytes × 3,000,000 rows?
Am I missing something here?
Thanks in advance!


John Takacs, DPM
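The arithmetic behind the question is easy to check. This is a minimal sketch that assumes the 450-byte figure is the full per-object size; `overheadPerRow` is a hypothetical input standing in for cache bookkeeping, which the replies below describe as implementation-specific.

```java
/** Back-of-envelope heap estimate for caching every row as an object. */
final class CacheSizeEstimate {
    private CacheSizeEstimate() {}

    /**
     * @param rows           number of cached row objects
     * @param bytesPerRow    measured size of one row object (about 450 in the question)
     * @param overheadPerRow hypothetical extra bytes per row for cache bookkeeping
     *                       (identity-map entry, etc.); implementation-specific
     */
    static long estimateBytes(long rows, long bytesPerRow, long overheadPerRow) {
        // multiplyExact/addExact throw on overflow instead of silently wrapping
        return Math.multiplyExact(rows, Math.addExact(bytesPerRow, overheadPerRow));
    }

    /** Converts bytes to mebibytes (1 MiB = 1,048,576 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

With no overhead, `estimateBytes(3_000_000, 450, 0)` gives 1,350,000,000 bytes, about 1,287 MiB, so the naive figure is roughly 1.3 GB before any per-object cost the JVM or the JDO cache adds.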
David Jordan
Author
Ranch Hand

Joined: Jun 14, 2003
Posts: 66
For your data, the object holding the 450 bytes, JDO adds only a reference and a byte of its own to manage each instance. If you run a query that returns a single instance, you get that single instance. To support caching the multiple objects retrieved from the database, however, the JDO implementation maintains some additional data structures in memory, and their memory overhead is implementation-specific. Typically there is something like a HashMap that maps an object identifier (in your case, perhaps the primary key of your table) to a reference to the object. That way, each time you access the same object within a transaction, the implementation returns a reference to the single instance of that object in your application's cache. The space overhead will vary by implementation; I don't know what sizes to expect, but it is not going to be substantial.
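The HashMap described above is usually called an identity map. The sketch below is a hypothetical, simplified version, not code from any JDO implementation: it maps an object id to the one in-memory instance for that id and calls a loader (standing in for a SELECT by primary key) only on first access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, simplified identity map: one in-memory instance per object id. */
final class IdentityMap<K, V> {
    private final Map<K, V> instancesById = new HashMap<>();
    private final Function<K, V> loader; // stand-in for a database fetch by key
    private int loads = 0;

    IdentityMap(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached instance for id, invoking the loader only on first access. */
    V get(K id) {
        return instancesById.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    int loadCount() { return loads; }

    int size() { return instancesById.size(); }
}
```

Calling `get(42L)` twice returns the very same Java reference (`a == b`) and invokes the loader once; the per-entry cost of `instancesById` is the kind of overhead described above.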
Craig Russell
Author
Greenhorn

Joined: Jun 16, 2003
Posts: 28
I think it would be a very interesting benchmark to store your 3 million instances in memory in a regular HashMap and, separately, in a JDO cache as a persistent instance holding a persistent HashMap of the 3 million persistent instances. There certainly is additional overhead for the persistence management, which will be implementation-specific. But you might be surprised by the results.
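The plain-HashMap half of that benchmark can be sketched with the standard library alone. This is a rough sketch: the `Row` class and the synthetic data are hypothetical, and heap figures from `Runtime` are approximate because `System.gc()` is only a hint.

```java
import java.util.HashMap;
import java.util.Map;

/** Rough sketch of the plain-HashMap half of the suggested benchmark. */
final class HeapBenchmark {
    /** Hypothetical row class with the three columns from the question. */
    static final class Row {
        final String title, description, url;

        Row(String title, String description, String url) {
            this.title = title;
            this.description = description;
            this.url = url;
        }
    }

    /** Approximate heap in use; noisy, since the JVM may ignore the GC request. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Fills a HashMap keyed by a synthetic id with synthetic rows. */
    static Map<Integer, Row> fill(int n) {
        Map<Integer, Row> rows = new HashMap<>();
        for (int i = 0; i < n; i++) {
            rows.put(i, new Row("Title " + i, "Description " + i, "http://example.com/" + i));
        }
        return rows;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 3_000_000;
        long before = usedHeap();
        Map<Integer, Row> rows = fill(n);
        long after = usedHeap(); // rows is still reachable here
        System.out.printf("%d rows, approx. %d bytes of heap (%.1f bytes/row)%n",
                rows.size(), after - before, (after - before) / (double) rows.size());
    }
}
```

At 3 million rows it needs a generous heap, e.g. `java -Xmx2g HeapBenchmark 3000000`; the JDO half would hold the same rows through a JDO implementation's cache for comparison.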
John Takacs, D.P.M.
Greenhorn

Joined: Nov 15, 2000
Posts: 10
Thanks for the answer; that really cleared things up.
One thing I need to brush up on (and maybe this would be a good topic for another post) is exactly how JDO works.
For example, you mention that JDO will create a reference to an object, in this case my 450-byte object, which is basically the TITLE, DESCRIPTION, and URL columns for a single row of data, e.g. "JavaRanch Big Moose Saloon", "Great Java Resource for learning about JDO, JDBC and other stuff.", "http://www.javaranch.com".
How does having a reference to an object help when my user performs a query such as "Learn JDO"? I assume the reference is simply an ID of some sort, not any text-related info.
I think what I'm getting hung up on is that I picture JDO as placing all data objects in memory, each object being one of the rows described above, but in fact it isn't that at all.
David Jordan
Author
Ranch Hand

Joined: Jun 14, 2003
Posts: 66
No, JDO does not store the entire set of rows from a table in memory. Craig's response may have given that incorrect impression.
You would issue a query on fields of your Java object. This would get mapped to a SQL query on the corresponding column in the corresponding table. If only one object matched the query constraint, you would just get one object back, not the entire table.
When I use the word reference, I mean a standard Java reference. That is different from a primary key, from an identity value, etc.
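To make "a query on a field becomes SQL on the corresponding column" concrete, here is hand-written JDBC roughly equivalent to filtering on the title field. It is a hedged sketch, not SQL generated by any particular JDO implementation; the `urls` table and its column names are assumptions based on the fields in the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** JDBC sketch of a field query mapped to SQL; table/column names are assumed. */
final class TitleLookup {
    static final String FIND_BY_TITLE_SQL =
            "SELECT title, description, url FROM urls WHERE title = ?";

    /** One result row; only rows matching the WHERE clause are materialized. */
    record Entry(String title, String description, String url) {}

    static List<Entry> findByTitle(Connection conn, String title) throws SQLException {
        List<Entry> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_TITLE_SQL)) {
            ps.setString(1, title); // bind the query parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new Entry(rs.getString("title"),
                            rs.getString("description"), rs.getString("url")));
                }
            }
        }
        return result;
    }
}
```

If one row matches, the list holds one `Entry`; the rest of the table never leaves the database.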
Craig Russell
Author
Greenhorn

Joined: Jun 16, 2003
Posts: 28
JDO can be used for entirely in-memory applications (for example, if you chose to have all of your data within a few machine instructions) or entirely on disk, or any combination of cached and disk-resident.
I incorrectly assumed that you wanted to have all of the data in memory; hence your calculations of memory usage. But it certainly would be practical to have the data on disk and only fetched when needed.
Mike Farnham
Ranch Hand

Joined: Sep 25, 2001
Posts: 76
John,
I am not familiar with MySQL's data structures. I know Oracle has a number of BLOB types for storing large strings or binary objects.
Are you specifically limiting the length of a URL to, say, 255 or maybe 512 characters?
I am asking mostly out of curiosity. I know GET has a definite limit. A co-worker and I were just discussing what the limit might be for POST, and a quick Google search turned up a variety of answers. She created a sample URL that was 659 characters long, and none of the browsers failed to handle it correctly.
Cheers,
Mike
John Takacs, D.P.M.
Greenhorn

Joined: Nov 15, 2000
Posts: 10
Craig,
Yes, I agree that would be very interesting. I may have time over the weekend to try it out, while I wait for your book, plus a Java Objects book.
RE: keeping all data objects in memory, your assumption is correct. I really would like to keep all those objects in memory, hence my calculation. We were/are on the same sheet of music.
Thanks for all of your help and suggestions!

[ June 18, 2003: Message edited by: John Takacs, D.P.M. ]
David Jordan
Author
Ranch Hand

Joined: Jun 14, 2003
Posts: 66
OK, I misunderstood. When you asked whether a row's 450 bytes of data is all that would be brought into memory, I thought you wanted to limit the data brought in from the database to the specific data you were operating on.
When I worked at Bell Labs, we were always building very large in-memory databases so that lookups were efficient and telephone calls could be processed quickly. We could never use a relational database for this because we needed the data to be memory-resident to meet our performance requirements.
This is one of several reasons that led me to object databases, where we could allocate a ton of memory to the in-memory cache and let the object database manage synchronization with the disk, while also having direct access to our data as objects. It worked out well: we got our in-memory object cache, plus the reliability of having it synced to disk. We built some of these in-memory databases ourselves, but much of that development would have been unnecessary had we used an object database directly and given it a ton of memory for the cache.
One nice thing about the JDO reference implementation is that it can perform queries directly on your objects in memory. So you could literally load your whole database into memory by iterating the extent, and then run JDO queries on that data; since it is all in memory, the queries would execute directly against the in-memory objects. Give it a try...
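To picture the in-memory approach without a JDO implementation at hand, the sketch below holds every row in a list (standing in for iterating the extent) and evaluates a keyword filter directly on the objects. It is plain Java, not JDOQL; the `Listing` record and the keyword-matching rule are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical in-memory directory searched without touching the database. */
final class InMemoryDirectory {
    record Listing(String title, String description, String url) {}

    private final List<Listing> all; // stands in for the fully loaded extent

    InMemoryDirectory(List<Listing> loaded) {
        this.all = List.copyOf(loaded);
    }

    /** Returns listings whose title or description contains every keyword, ignoring case. */
    List<Listing> search(String query) {
        String[] keywords = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return all.stream()
                .filter(l -> {
                    String text = (l.title() + " " + l.description()).toLowerCase(Locale.ROOT);
                    for (String k : keywords) {
                        if (!text.contains(k)) return false;
                    }
                    return true;
                })
                .collect(Collectors.toList());
    }
}
```

With the JavaRanch row from earlier in the thread loaded, `search("Learn JDO")` matches it, because its description contains both "learn" (in "learning") and "jdo".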
John Takacs, D.P.M.
Greenhorn

Joined: Nov 15, 2000
Posts: 10
Mike,
MySQL has 4 Blob types, TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. The difference between them is simply the size of the value they can hold.
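The cutoffs are MySQL's maximum lengths in bytes: 255 for TINYBLOB, 65,535 for BLOB, 16,777,215 for MEDIUMBLOB, and 4,294,967,295 for LONGBLOB. A small helper makes them explicit; the class and method names are hypothetical.

```java
/** Picks the smallest MySQL BLOB type whose maximum length fits a value. */
final class BlobTypes {
    private BlobTypes() {}

    static String smallestBlobType(long lengthInBytes) {
        if (lengthInBytes < 0) throw new IllegalArgumentException("negative length");
        if (lengthInBytes <= 255L) return "TINYBLOB";            // 2^8 - 1
        if (lengthInBytes <= 65_535L) return "BLOB";             // 2^16 - 1
        if (lengthInBytes <= 16_777_215L) return "MEDIUMBLOB";   // 2^24 - 1
        if (lengthInBytes <= 4_294_967_295L) return "LONGBLOB";  // 2^32 - 1
        throw new IllegalArgumentException("too large for any BLOB type");
    }
}
```

For example, the 659-character URL mentioned above would need at least a BLOB, since it exceeds TINYBLOB's 255-byte limit.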
Yes, I am limiting the size, but it really is based on what the longest url/description/title has been, rather than an arbitrary number.
I really like the results your colleague came up with. That is very interesting. I'll have to remember that as I design my web app.
At the moment my site is actually a metasearch site (one of the first on the web, in 1995/96), but I want to move away from that and offer a directory in addition to the metasearch, as I believe the days of metasearch will soon be over.