"circular referencing..." a memory hog??

John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Hi experts! I'd like to find out whether my implementation has potential pitfalls or violates any Java performance no-nos.

I've implemented an object X that contains a list of Y objects, and each Y in turn contains a list of Xs. For instance, X can be a Word object and Y a Document object. X containing Ys expresses the association that a Word occurs in several Documents. Similarly, Y containing Xs expresses the relationship that a Document contains several Words. Here are two examples of how these associations will be used.

Working with X-contains-Ys lets me find out how many documents a word occurs in. Working with Y-contains-Xs lets me look at which words are in a document.
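A minimal sketch of the two-way association described above, assuming hypothetical `Word` and `Document` classes (the original code was not posted, so every name here is illustrative). The only design choice worth noting is that `Document.add` is the single place where a link is created, so both directions are updated together.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for X: a word that knows the documents it occurs in.
class Word {
    private final String text;
    private final Set<Document> documents = new LinkedHashSet<>();

    Word(String text) { this.text = text; }

    String text() { return text; }

    // Read-only view of the documents this word occurs in.
    Set<Document> documents() { return Collections.unmodifiableSet(documents); }

    // Called by Document.add so that both sides of the association stay in step.
    void linkTo(Document d) { documents.add(d); }
}

// Hypothetical stand-in for Y: a document that knows the words it contains.
class Document {
    private final String name;
    private final Set<Word> words = new LinkedHashSet<>();

    Document(String name) { this.name = name; }

    String name() { return name; }

    // Read-only view of the words in this document.
    Set<Word> words() { return Collections.unmodifiableSet(words); }

    // Single entry point that records the link in both directions.
    void add(Word w) {
        words.add(w);
        w.linkTo(this);
    }
}

class AssociationDemo {
    public static void main(String[] args) {
        Word javaWord = new Word("java");
        Word mooseWord = new Word("moose");
        Document a = new Document("a.txt");
        Document b = new Document("b.txt");
        a.add(javaWord);
        a.add(mooseWord);
        b.add(javaWord);

        // How many documents does "java" occur in?
        System.out.println(javaWord.documents().size());
        // How many distinct words are in a.txt?
        System.out.println(a.words().size());
    }
}
```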

While the program is running, I want both the X and Y objects to be present in memory.

Is there a better data-structure that can support that?

Thanks!
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Circular references are not a problem, as references are not containment.

To illustrate: I can have a friend's phone number and he can have mine, so we each hold a 'reference' to the other. The references are circular, but they are not a problem.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
There are two potential problems with this approach:

First, you have duplicated information, which can lead to inconsistencies that are hard to find (for example if you remove a word from a document, but forget to remove the reference from the word).

Second, a reference to one of the documents can prevent other documents from being garbage collected when they share some words.
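The second point can be illustrated without involving the real garbage collector. The sketch below, using stripped-down hypothetical `Word` and `Document` classes, simply walks the references document -> word -> document and reports which documents are reachable from one that is still held. Anything in that set could not be collected while the starting document is referenced. It is only a model of reachability, not a measurement of what the JVM actually does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical minimal classes; fields are left open to keep the sketch short.
class Word {
    final String text;
    final List<Document> documents = new ArrayList<>();
    Word(String text) { this.text = text; }
}

class Document {
    final String name;
    final List<Word> words = new ArrayList<>();
    Document(String name) { this.name = name; }

    // Records the link in both directions.
    void add(Word w) {
        words.add(w);
        w.documents.add(this);
    }
}

class Reachability {
    // Breadth-first walk over document -> word -> document references.
    static Set<Document> reachableFrom(Document root) {
        Set<Document> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Document> todo = new ArrayDeque<>();
        seen.add(root);
        todo.add(root);
        while (!todo.isEmpty()) {
            Document d = todo.remove();
            for (Word w : d.words) {
                for (Document other : w.documents) {
                    // Set.add returns true only for documents not visited yet.
                    if (seen.add(other)) {
                        todo.add(other);
                    }
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Word shared = new Word("shared");
        Document a = new Document("a");
        Document b = new Document("b");
        a.add(shared);
        b.add(shared);
        // Holding only 'a' still keeps 'b' reachable through the shared word.
        for (Document d : reachableFrom(a)) {
            System.out.println(d.name);
        }
    }
}
```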

But depending on the problem you are trying to solve it might still be the best structure to use - the art of software development is to make the right trade-offs...
[ August 14, 2004: Message edited by: Ilja Preuss ]

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Catalin Merfu
Ranch Hand

Joined: May 26, 2004
Posts: 42
This is by no means unprofessional.

I would keep the document-contains-word relationship but I would model the word-belongsto-document relationship as an index independent of the word and document classes. This would eliminate the list of documents from the word object.
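One way to read this suggestion, sketched under some assumptions: the document keeps its own words, and a separate, hypothetical `WordIndex` maps each word to the documents containing it. Words are plain `String`s here purely to keep the example short; a `Word` class without a document list would work the same way.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The document still owns its words (document-contains-word).
class Document {
    private final String name;
    private final Set<String> words = new LinkedHashSet<>();

    Document(String name, String... words) {
        this.name = name;
        Collections.addAll(this.words, words);
    }

    String name() { return name; }

    Set<String> words() { return Collections.unmodifiableSet(words); }
}

// Hypothetical index holding the word-belongs-to-document direction
// outside of both classes.
class WordIndex {
    private final Map<String, Set<Document>> documentsByWord = new HashMap<>();

    // Registers every word of the document in the index.
    void add(Document d) {
        for (String w : d.words()) {
            documentsByWord.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(d);
        }
    }

    // Documents containing the word, or an empty set if the word is unknown.
    Set<Document> documentsContaining(String word) {
        Set<Document> docs = documentsByWord.get(word);
        if (docs == null) {
            return Collections.<Document>emptySet();
        }
        return Collections.unmodifiableSet(docs);
    }
}

class IndexDemo {
    public static void main(String[] args) {
        WordIndex index = new WordIndex();
        index.add(new Document("d1", "java", "moose"));
        index.add(new Document("d2", "java"));
        for (Document d : index.documentsContaining("java")) {
            System.out.println(d.name());
        }
    }
}
```

With this layout, dropping the index drops the whole reverse direction at once, and the documents no longer reference each other through their words.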


Catalin Merfu
High Performance Java Networking (http://www.accendia.com)
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Catalin Merfu:
I would keep the document-contains-word relationship but I would model the word-belongsto-document relationship as an index independent of the word and document classes. This would eliminate the list of documents from the word object.


This might be a good solution, but I really think we need to know more about the problem being solved before we can make reasonable suggestions.
John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Thanks all for replying.

Here is the situation in which I will use the "circular references". For Word-containing-Documents, I have a list of words, so I can check whether an arbitrary word X exists. If X exists, I have its documents. For each of those Documents, I can find the words that appear together with X. Hence, it's like a two-part question:

a) Does an arbitrary word X exist?
b) What other words are in the documents that contain X?
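The two questions above could be answered along these lines. This is a sketch only: `Corpus`, `contains` and `coOccurring` are made-up names, and words and documents are represented by plain strings for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical container holding both directions of the word/document relation.
class Corpus {
    private final Map<String, Set<String>> wordsByDocument = new HashMap<>();
    private final Map<String, Set<String>> documentsByWord = new HashMap<>();

    // Registers a document and records every word in both maps.
    void addDocument(String name, String... words) {
        Set<String> ws = wordsByDocument.computeIfAbsent(name, k -> new LinkedHashSet<>());
        for (String w : words) {
            ws.add(w);
            documentsByWord.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(name);
        }
    }

    // (a) Does the word occur in any document?
    boolean contains(String word) {
        return documentsByWord.containsKey(word);
    }

    // (b) Which other words share at least one document with the word?
    Set<String> coOccurring(String word) {
        Set<String> result = new TreeSet<>();
        Set<String> docs = documentsByWord.getOrDefault(word, Collections.<String>emptySet());
        for (String doc : docs) {
            result.addAll(wordsByDocument.get(doc));
        }
        result.remove(word); // the word itself is not "another" word
        return result;
    }
}

class CorpusDemo {
    public static void main(String[] args) {
        Corpus corpus = new Corpus();
        corpus.addDocument("d1", "java", "moose");
        corpus.addDocument("d2", "java", "ranch");
        corpus.addDocument("d3", "fly");
        System.out.println(corpus.contains("java"));
        System.out.println(corpus.coOccurring("java"));
    }
}
```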

Thanks again.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
How will the program be used? Will documents be loaded and removed during runtime? Or is it a more static usage?
John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Thanks for being so patient... To answer your question, here it is:

Words and Documents stay unchanged during runtime; no elements are ever removed from either. So the relationships between Words and Documents also stay the same throughout the run.

What you said earlier about the GC: "a reference to one of the documents can prevent other documents from being garbage collected when they share some words" is my main concern about this "circular chaos". I was wondering if there is another way around that.

Maybe I'm way ahead of myself here... I was also wondering what would happen if the scale of the input grew to "humongous". Is there a way to resize the VM's memory based on the input size? And what books or documentation would you recommend on writing large-scale applications?
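On the VM-size part of the question: on the common JVMs the heap ceiling is normally chosen when the JVM is launched (for example with the `-Xmx` option) rather than changed from inside the running program, so treat the following as a way to inspect the limits, not to raise them. The `HeapInfo` class name is made up; the `Runtime` methods are standard.

```java
// Hypothetical helper that reports the heap limits of the running JVM.
class HeapInfo {
    // Upper bound the JVM will try to use for the heap, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently in use: what has been reserved minus what is still free.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.printf("max heap:  %d MiB%n", maxHeapBytes() / mib);
        System.out.printf("used heap: %d MiB%n", usedHeapBytes() / mib);
    }
}
```

A launcher script could pick the `-Xmx` value from the input size before starting the program, which is about as close as one gets to sizing the VM by input.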

Thanks so much!
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by John Cage:
What you said earlier about the GC: "a reference to one of the documents can prevent other documents from being garbage collected when they share some words" is my main concern about this "circular chaos". I was wondering if there is another way around that.


Well, if the relationship between words and documents is static, none of them should ever be gc'ed anyway, so it's probably not an issue?
John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Yes!
 
 