
Runtime intermediate storage - Design Problem

Varun Chopra
Ranch Hand

Joined: Jul 10, 2008
Posts: 211
In a project we are working on, a large amount of data is in play at runtime. There are about 40 domain classes corresponding to different chunks of this data. We have a search module that is pretty heavy and needs to work with most of these objects throughout the user session. The existing application does all of this with objects in session memory: it loads data into these objects, works through various stages with them, and produces a final result. It is not that bad, actually, but because all the data has to be in memory, more RAM is required during search operations. There is no object caching or serialization going on. All data is in session memory.

We are in the process of redesigning this part of the application. Part of our team has come up with the idea of using HSQLDB as intermediate storage for the data. This means we would have denormalized tables representing the intermediate search results instead of objects. In other words, suppose the search does jobs X1, X2 and X3 before it is done. If X1 works with 15 objects and comes up with intermediate result R1, R1 will be flattened into a database table and those 15 objects will be discarded. X2 will fetch data from the R1 tables to do its job, and its result will be flattened into the database as R2. X3 will work with R2, produce R3, and finally delete the parts of R1, R2 and R3 that are not needed for the rest of the user session. Their thinking is:

1) This will reduce the complexity of the search operation, because it is easier to work with denormalized relational data than with a hierarchy of Java objects
2) This may reduce the load on the server, because we will use some kind of caching with HSQLDB and it will take care of serializing data that is not in use

Counter-arguments from some other team members are:

1) We will still need the object structure, because when we have to use data from R1 and R2 we will have to load it into objects
2) It may actually increase the load on the server and need much more RAM, because HSQLDB will consume a lot more memory than bare Java objects and will also be much slower (based on the small tests we ran)
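
For reference, the staged flow under discussion (X1, then X2, then X3) can be sketched in plain Java without any database. This is only an illustration under assumptions: the class and method names (`StagedSearch`, `StageResult`, `x1`, `x2`, `x3`) are hypothetical, strings stand in for the real domain objects, and the filters are placeholders for the actual search logic. The point it shows is that each stage receives only the previous stage's flattened result, so whatever a stage loaded becomes unreachable once it returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the X1 -> X2 -> X3 flow; not the real search logic.
final class StagedSearch {

    /** Flattened intermediate result (R1, R2, R3): rows only, no domain objects. */
    static final class StageResult {
        final String name;
        final List<String> rows;

        StageResult(String name, List<String> rows) {
            this.name = name;
            this.rows = rows;
        }
    }

    // X1: works on the loaded data and keeps only what later stages need.
    static StageResult x1(List<String> loaded) {
        List<String> rows = new ArrayList<>();
        for (String item : loaded) {
            if (item.startsWith("a")) {          // placeholder filter
                rows.add(item);
            }
        }
        return new StageResult("R1", rows);
    }

    // X2: reads R1 only; it never sees what X1 loaded.
    static StageResult x2(StageResult r1) {
        List<String> rows = new ArrayList<>();
        for (String row : r1.rows) {
            rows.add(row.toUpperCase(Locale.ROOT)); // placeholder transformation
        }
        return new StageResult("R2", rows);
    }

    // X3: reads R2 only and produces the final result R3.
    static StageResult x3(StageResult r2) {
        List<String> rows = new ArrayList<>();
        for (String row : r2.rows) {
            if (row.length() > 3) {               // placeholder filter
                rows.add(row);
            }
        }
        return new StageResult("R3", rows);
    }

    // R1 and R2 are not referenced after this method returns.
    static StageResult search(List<String> loaded) {
        return x3(x2(x1(loaded)));
    }
}
```

Whether `StageResult` is held in memory, in a cache, or in an HSQLDB table is exactly the open design question; the stage boundaries stay the same in each case.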

Instead, we are suggesting session scope for Spring beans (since we are migrating to Spring) and a cache such as Apache JCS or Ehcache to take care of memory utilization.
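
To illustrate what a bounded cache held per session would do, here is a minimal sketch. It is a stand-in built on `java.util.LinkedHashMap`, not the Ehcache or JCS API, and the name `SessionCache` is hypothetical. A real cache would typically overflow evicted entries to disk, whereas this sketch simply drops the least recently used entry once the cap is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a real cache: a size-capped LRU map.
// One instance would live in each user session.
final class SessionCache<K, V> {

    private final Map<K, V> entries;

    SessionCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; evicts the LRU entry when over the cap.
                return size() > maxEntries;
            }
        };
    }

    // Methods are synchronized because one session can serve concurrent requests.
    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized boolean contains(K key) {
        return entries.containsKey(key); // does not count as an access
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a cap of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, since `a` was used more recently.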

What would you suggest in this case? If you have alternative suggestions, please share.


-Varun -
Mohamed Sanaulla
Saloon Keeper

Joined: Sep 08, 2007
Posts: 3068

Varun Chopra wrote:
...
We are in the process of redesigning this part of the application. Part of our team has come up with the idea of using HSQLDB as intermediate storage for the data. This means we would have denormalized tables representing the intermediate search results instead of objects. In other words, suppose the search does jobs X1, X2 and X3 before it is done. If X1 works with 15 objects and comes up with intermediate result R1, R1 will be flattened into a database table and those 15 objects will be discarded. X2 will fetch data from the R1 tables to do its job, and its result will be flattened into the database as R2. X3 will work with R2, produce R3, and finally delete the parts of R1, R2 and R3 that are not needed for the rest of the user session. Their thinking is:
...

Here are a few of my observations:

  • GC activity is driven by available/required memory and handled by the JVM, so I don't think there is a way to discard the 15 objects other than setting their references to null. The objects continue to exist on the heap until a GC happens. If a GC happens immediately after you have dereferenced them, fine; otherwise these 45 objects or so continue to occupy heap space. (This may not be an issue every time, but it can lead to hard-to-track memory leak issues in the future when the load increases.)
  • Flattening the object data into tables is something like writing a mini ORM mapper for HSQLDB, and that processing might take some time. It will also increase the complexity of the system. I agree that searching through an aggregation of objects is confusing and requires a lot more code.
  • I don't know how caching of data by HSQLDB would differ from Java object caching. Moreover, a search using SQL and a DB also requires its share of memory and processing time, so nothing indicates that using a DB would reduce memory use. Also, once you fetch the results from the DB you have to do some processing to create Java objects for the UI layer, and in the end you would have both the objects and HSQLDB in memory.
  • Having a mini DB in memory would surely consume more memory than Java objects alone. Imagine a situation where GC hasn't happened, so you have all the Java objects (which are no longer used) in memory along with the DB.

  • Feel free to comment/override/oppose my views


    Varun Chopra
    Ranch Hand

    Joined: Jul 10, 2008
    Posts: 211
    Thanks for replying, Mohamed.
    I agree with your view. There is no guarantee of when the memory will be released.
    So what is your suggestion for the design, then? Should we use session-scoped POJOs and a caching engine like Apache JCS to serialize passive objects and take care of memory consumption?
    Murali Ranga
    Ranch Hand

    Joined: Dec 16, 2011
    Posts: 38
    I have a few questions.
    Are they refining the search on the same data?
    That is, is R1 refined to R2, and R2 refined to R3?
    Or are they getting a new set of data (R2) from different tables, taking R1 as search criteria/input?

    I am not sure how much time the actual query takes to fetch the data from the database.
    If the query returns results within the expected response time, then keep only the search criteria/input in the session.
    Each time, access the database based on the search criteria, that is, build the query at runtime,
    with s1, s2, s3 as search criteria objects in the session.
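
The idea above (keep only the criteria in the session and rebuild the query on each request) could look roughly like this. It is a sketch under assumptions: `SearchCriterion` and `CriteriaQuery` are hypothetical names, the table and column names in the usage note are invented, and only equality conditions are handled. The SQL uses `?` placeholders so the values can be bound through a `PreparedStatement`.

```java
import java.util.ArrayList;
import java.util.List;

/** One search criterion (s1, s2, s3 ...) small enough to keep in the session. */
final class SearchCriterion {
    final String column; // assumed to come from a fixed list of known columns, never from user input
    final Object value;

    SearchCriterion(String column, Object value) {
        this.column = column;
        this.value = value;
    }
}

/** Builds a parameterized query from the criteria at request time. */
final class CriteriaQuery {

    // Produces e.g. "SELECT * FROM t WHERE a = ? AND b = ?".
    static String buildSql(String table, List<SearchCriterion> criteria) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        for (int i = 0; i < criteria.size(); i++) {
            sql.append(i == 0 ? " WHERE " : " AND ");
            sql.append(criteria.get(i).column).append(" = ?");
        }
        return sql.toString();
    }

    // Values in the same order as the placeholders, ready for PreparedStatement.setObject.
    static List<Object> parameters(List<SearchCriterion> criteria) {
        List<Object> values = new ArrayList<>();
        for (SearchCriterion criterion : criteria) {
            values.add(criterion.value);
        }
        return values;
    }
}
```

For example, criteria on hypothetical columns `category` and `in_stock` against a table `items` yield `SELECT * FROM items WHERE category = ? AND in_stock = ?`. The session then holds a few small criteria objects instead of the result data, at the cost of one database round trip per request.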