
OutOfMemoryError While loading the indexes

 
Greenhorn
Posts: 2
Hi All,
We have developed an RMI-based search engine that supports searching by category as well as Boolean and phrase search. It is implemented as follows:
Books are divided into batches. Each batch has 50 books.
Each batch has 37 files: 36 index files (one each for a to z and 0 to 9) and 1 file for storing the Unicode values.
A Book Registry is maintained to keep track of Book and Batch relation.
A Publisher Registry is maintained to keep track of which books belong to which publisher, so that searches can be restricted by publisher.
An 8-digit hex number represents a book. To avoid storing the 8-digit number in front of each word, a 3-digit alias is created and stored in the index.
The book indexes are stored in random access files. A third-party API is used to create and access those files.
We implemented the random access files as described at http://www.javaworld.com/javaworld/jw-01-1999/jw-01-step.html
When the RMI Search Server is started all the indexes are loaded into the memory. The reason for doing this is for faster retrieval of results.
The problem we are facing now is that as the number of books keeps increasing, so does the number of indexes, and when we try to load the indexes into memory we get an OutOfMemoryError.
The server where I am running the code is a Linux server with 1GB of RAM and 1GB of swap space.
To run the program I specify the options -Xmx700m and -Xms700m, and after loading 81 batches, i.e. around 4500 books, I get the OutOfMemoryError.
Any ideas on this would be very helpful.
Prasanth.
 
Bartender
Posts: 9615

Originally posted by Prasanth Allam:

When the RMI Search Server is started all the indexes are loaded into the memory. The reason for doing this is for faster retrieval of results.
The problem we are facing now is that as the number of books keeps increasing, so does the number of indexes, and when we try to load the indexes into memory we get an OutOfMemoryError.


This is a typical engineering tradeoff. You want to load indexes into memory to increase search performance, but you don't have enough memory to load the indexes. I'd say you have two options:
One, rather than load the index files into memory, use RandomAccessFile and the index files on disk to do binary searches on index values (this may take extra work to keep the index files in sorted order). There will be a bit of a performance hit over having the indexes in memory, but still fast for a homebrew solution.
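To make option one a bit more concrete, here is a minimal sketch of a binary search done directly against a sorted file with RandomAccessFile. The record layout is an assumption made for illustration only (a 16-byte space-padded ASCII key followed by an 8-byte value); the class name DiskIndexSearch and its methods are hypothetical and are not the poster's index format or the third-party API mentioned above. Keys longer than 16 bytes are truncated in this sketch, and values are assumed to be non-negative so that -1 can mean "not found".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DiskIndexSearch {
    // Assumed layout: fixed-width records, sorted by key in String.compareTo order.
    static final int KEY_BYTES = 16;                      // space-padded ASCII key
    static final int RECORD_BYTES = KEY_BYTES + Long.BYTES; // key + 8-byte value

    /** Returns the value stored for the key, or -1 if the key is not in the file. */
    static long lookup(RandomAccessFile file, String key) throws IOException {
        byte[] buf = new byte[KEY_BYTES];
        long low = 0;
        long high = file.length() / RECORD_BYTES - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            file.seek(mid * RECORD_BYTES);   // jump straight to record number mid
            file.readFully(buf);             // read only the key bytes
            String candidate = new String(buf, StandardCharsets.US_ASCII).trim();
            int cmp = candidate.compareTo(key);
            if (cmp == 0) {
                return file.readLong();      // the value follows the key
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    /** Appends one record at the current file position; callers must write keys in sorted order. */
    static void writeRecord(RandomAccessFile file, String key, long value) throws IOException {
        byte[] padded = new byte[KEY_BYTES];
        Arrays.fill(padded, (byte) ' ');
        byte[] raw = key.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, KEY_BYTES));
        file.write(padded);
        file.writeLong(value);
    }
}
```

Each lookup touches roughly log2(n) records on disk, so only one small key buffer lives on the heap no matter how many batches are indexed. The cost is that the file has to stay sorted, which is the extra work mentioned above.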
Two, move your data to a SQL database. Searching on large datasets is what SQL databases do well.
 
Ranch Hand
Posts: 688
Without getting into detailed math, what is the base size of the index? That is, measured in bytes, how big is the index?
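One rough way to answer that is to add up the sizes of the index files on disk and compare the total with the heap limit. The sketch below assumes all index files live under a single directory; the class name IndexSizeReport and the directory layout are hypothetical, not something stated in the original post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class IndexSizeReport {
    /** Sums the sizes, in bytes, of all regular files under indexDir (recursively). */
    static long totalBytes(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.walk(indexDir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Maximum heap the JVM will try to use, i.e. the effect of the -Xmx setting. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```

Calling IndexSizeReport.totalBytes on the index directory and comparing the result with IndexSizeReport.maxHeapBytes() gives a first estimate. Keep in mind that the on-disk size is only a lower bound: once the data is turned into Java objects and strings, it can take noticeably more heap than it does on disk.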
 