
"circular referencing..." a memory hog??

 
Greenhorn
Posts: 8
Hi experts! I want to find out whether there are potential pitfalls in my implementation, or whether I've violated any Java performance no-nos.

I've implemented an object X that contains a list of Y objects, and each Y in turn contains a list of Xs. For instance, X can be a Word object and Y a Document object. X containing Ys models the association between a Word and its Documents: a word occurs in different documents. Similarly, Y containing Xs models the relationship between a Document and its Words: a document contains several words. Here are two examples of how these associations are used.

X-contains-Ys lets me find out how many documents a word occurs in, and Y-contains-Xs lets me look up which words are in a document.

While the program is running, I want both the X and Y objects to be present in memory.

Is there a better data structure that can support that?

Thanks!
 
blacksmith
Posts: 1332
Circular references are not a problem, as references are not containment.

To illustrate: I can have a friend's phone number and he can have mine, so we each hold a 'reference' to the other. The references are circular, but they are not a problem.
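
A minimal sketch of that point, using hypothetical `Word` and `Document` classes modelled on the question (the names and fields are assumptions, not the original poster's code). Each list holds references only, so following the cycle simply leads back to the same object rather than to a nested copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes modelled on the Word/Document example from the question.
class Word {
    final String text;
    final List<Document> documents = new ArrayList<>(); // references, not copies

    Word(String text) { this.text = text; }
}

class Document {
    final String title;
    final List<Word> words = new ArrayList<>(); // references, not copies

    Document(String title) { this.title = title; }
}

class CircularDemo {
    public static void main(String[] args) {
        Word java = new Word("java");
        Document doc = new Document("intro.txt");

        // Each side stores only a reference to the other.
        java.documents.add(doc);
        doc.words.add(java);

        // Following the cycle leads back to the very same object.
        System.out.println(doc.words.get(0).documents.get(0) == doc); // true
    }
}
```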
 
author
Posts: 14112
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
There are two potential problems with this approach:

First, you have duplicated information, which can lead to inconsistencies that are hard to find (for example if you remove a word from a document, but forget to remove the reference from the word).

Second, a reference to one of the documents can prevent other documents from being garbage collected when they share some words.

But depending on the problem you are trying to solve it might still be the best structure to use - the art of software development is to make the right trade-offs...
[ August 14, 2004: Message edited by: Ilja Preuss ]
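
One way to reduce the first risk, sketched under assumptions: route every change through a single method on one side so that both directions are always updated together. The class and method names here (`addWord`, `removeWord`, and so on) are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical Word side: its mutators are meant to be called only by Document.
class Word {
    private final String text;
    private final Set<Document> documents = new LinkedHashSet<>();

    Word(String text) { this.text = text; }

    String text() { return text; }

    // Read-only view, so callers cannot change one side on its own.
    Set<Document> documents() { return Collections.unmodifiableSet(documents); }

    void addDocument(Document d) { documents.add(d); }
    void removeDocument(Document d) { documents.remove(d); }
}

// Hypothetical Document side: the only public entry point for linking.
class Document {
    private final String title;
    private final Set<Word> words = new LinkedHashSet<>();

    Document(String title) { this.title = title; }

    String title() { return title; }

    Set<Word> words() { return Collections.unmodifiableSet(words); }

    // Adds the word here and records this document on the word.
    void addWord(Word w) {
        if (words.add(w)) {
            w.addDocument(this);
        }
    }

    // Removes the word here and drops the back-reference as well.
    void removeWord(Word w) {
        if (words.remove(w)) {
            w.removeDocument(this);
        }
    }
}
```

This does not remove the duplication; it only confines the bookkeeping to one place, which makes the "forgot to update the other side" bug harder to write.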
 
Ranch Hand
Posts: 42
This is by no means unprofessional.

I would keep the document-contains-word relationship but I would model the word-belongsto-document relationship as an index independent of the word and document classes. This would eliminate the list of documents from the word object.
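
A rough sketch of what such an index might look like. `WordIndex` and its methods are hypothetical names, and words are represented as plain strings here purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// The document still knows its words (document-contains-word).
class Document {
    final String title;
    final List<String> words;

    Document(String title, List<String> words) {
        this.title = title;
        this.words = Collections.unmodifiableList(new ArrayList<>(words));
    }
}

// Hypothetical index holding the word-to-documents direction outside both classes.
class WordIndex {
    private final Map<String, Set<Document>> documentsByWord = new HashMap<>();

    // Registers the document under every word it contains.
    void add(Document doc) {
        for (String word : doc.words) {
            documentsByWord.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(doc);
        }
    }

    // Returns the documents containing the word, or an empty set if it is unknown.
    Set<Document> documentsContaining(String word) {
        return Collections.unmodifiableSet(
                documentsByWord.getOrDefault(word, Collections.emptySet()));
    }
}
```

With this layout the word side carries no list of documents at all; dropping the index drops the whole reverse relationship in one step.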
 
Ilja Preuss
author
Posts: 14112

Originally posted by Catalin Merfu:
I would keep the document-contains-word relationship but I would model the word-belongsto-document relationship as an index independent of the word and document classes. This would eliminate the list of documents from the word object.



This might be a good solution, but I really think we need to know more about the problem to solve to make reasonable suggestions.
 
John Cage
Greenhorn
Posts: 8
Thanks all for replying.

Here is the situation in which I will use the "circular references". For Word-containing-Documents, I have a list of words, so I can check whether an arbitrary word X exists. If X exists, I have its documents, and for each of those documents I can find the words that appear together with X. Hence, it's like a two-part question:

a) Does an arbitrary word X exist?
b) What other words are in the documents that contain X?

Thanks again.
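
The two-part question could be sketched roughly as follows, assuming the bidirectional Word/Document structure from the first post plus a lookup map keyed by the word's text. `Corpus`, `contains`, and `coOccurringWords` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Word {
    final String text;
    final List<Document> documents = new ArrayList<>();

    Word(String text) { this.text = text; }
}

class Document {
    final List<Word> words = new ArrayList<>();
}

// Hypothetical container that owns the word lookup and builds the links.
class Corpus {
    private final Map<String, Word> wordsByText = new HashMap<>();

    // Registers a document and links it to its words in both directions.
    Document addDocument(String... texts) {
        Document doc = new Document();
        for (String text : texts) {
            Word word = wordsByText.computeIfAbsent(text, Word::new);
            if (!doc.words.contains(word)) {
                doc.words.add(word);
                word.documents.add(doc);
            }
        }
        return doc;
    }

    // (a) Does the word exist?
    boolean contains(String text) {
        return wordsByText.containsKey(text);
    }

    // (b) Which other words share at least one document with it?
    Set<String> coOccurringWords(String text) {
        Set<String> result = new TreeSet<>();
        Word word = wordsByText.get(text);
        if (word == null) {
            return result; // unknown word: nothing co-occurs
        }
        for (Document doc : word.documents) {
            for (Word other : doc.words) {
                if (other != word) {
                    result.add(other.text);
                }
            }
        }
        return result;
    }
}
```

For example, after `addDocument("java", "heap", "gc")` and `addDocument("java", "cache")`, asking for the words that co-occur with "java" yields cache, gc, and heap.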
 
Ilja Preuss
author
Posts: 14112
How will the program be used? Will documents be loaded and removed during runtime? Or is it a more static usage?
 
John Cage
Greenhorn
Posts: 8
Thanks for being so patient... To answer your question, here it is:

Words and Documents will stay unchanged during runtime; no Word or Document is ever removed. So the relationships between Words and Documents also stay unchanged during runtime.

What you said earlier about the GC: "a reference to one of the documents can prevent other documents from being garbage collected when they share some words" is my main concern about this "circular chaos". I was wondering if there is another way around that.

Maybe I'm getting way ahead of myself here... I was also wondering what would happen if the input grew to a "humongous" scale. Is there a way to resize the VM's memory based on the input size? And what books or documentation would you recommend on writing large-scale applications?

Thanks so much!
 
Ilja Preuss
author
Posts: 14112

Originally posted by John Cage:
What you said earlier about the GC: "a reference to one of the documents can prevent other documents from being garbage collected when they share some words" is my main concern about this "circular chaos". I was wondering if there is another way around that.



Well, if the relationship between words and documents is static, none of them should ever be gc'ed anyway, so it's probably not an issue?
 
John Cage
Greenhorn
Posts: 8
Yes!
 