Hi All,

We have developed an RMI-based search engine that supports searching by category as well as Boolean and phrase search. It is implemented in the following way:

- Books are divided into batches of 50 books each. Each batch has 37 index files, i.e. a to z and 0 to 9, and one file for storing the Unicode values.
- A Book Registry is maintained to keep track of the relation between books and batches.
- A Publisher Registry is maintained to keep track of which books belong to which publisher, so that searches can be done publisher-wise.
- Each book is represented by an 8-digit hex number. To avoid storing the 8-digit number in front of each word, a 3-digit alias is created and stored in the index instead.
- The indexes are stored in random access files, created and accessed through a third-party API. We implemented the random access files as described at http://www.javaworld.com/javaworld/jw-01-1999/jw-01-step.html

When the RMI search server starts, all the indexes are loaded into memory for faster retrieval of results. The problem we are facing now is that as the number of books grows, so does the number of indexes, and when we try to load them all into memory we get an OutOfMemoryError.

The server where I am running the code is a Linux machine with 1 GB of RAM and 1 GB of swap space. I run the program with the options -Xmx700m and -Xms700m, and after loading 81 batches, i.e. around 4,500 books, I get the OutOfMemoryError.

Any ideas regarding the same will be very helpful.

Prasanth.
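For readers following along, the alias scheme described above might look something like the sketch below. The class name, method names, and the choice of base-36 digits are my own illustration, not the poster's actual code; base 36 is assumed because 3 decimal digits would only cover 1,000 books, fewer than the 4,500 mentioned.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the alias scheme: each book's 8-digit hex ID
 * is mapped to a short 3-character alias, so each posting in the index
 * stores 3 bytes instead of 8.
 */
public class AliasRegistry {
    private final Map<String, String> idToAlias = new HashMap<String, String>();
    private final Map<String, String> aliasToId = new HashMap<String, String>();
    private int next = 0; // counter used to generate the next alias

    /** Returns the alias for a book ID, creating one if none exists yet. */
    public String aliasFor(String hexBookId) {
        String alias = idToAlias.get(hexBookId);
        if (alias == null) {
            // 3 base-36 digits allow 36^3 = 46,656 distinct books
            alias = String.format("%3s", Integer.toString(next++, 36)).replace(' ', '0');
            idToAlias.put(hexBookId, alias);
            aliasToId.put(alias, hexBookId);
        }
        return alias;
    }

    /** Resolves an alias back to the full 8-digit hex book ID. */
    public String idFor(String alias) {
        return aliasToId.get(alias);
    }
}
```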
When the RMI search server starts, all the indexes are loaded into memory for faster retrieval of results. The problem we are facing now is that as the number of books grows, so does the number of indexes, and when we try to load them all into memory we get an OutOfMemoryError.
This is a typical engineering tradeoff: you want to load the indexes into memory to increase search performance, but you don't have enough memory to hold them all. I'd say you have two options.

One, rather than loading the index files into memory, use RandomAccessFile to do binary searches on the index files on disk (this may take extra work to keep the index files in sorted order). There will be a bit of a performance hit compared to having the indexes in memory, but it's still fast for a homebrew solution.

Two, move your data to a SQL database. Searching large datasets is what SQL databases do well.
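To make option one concrete, here is a minimal sketch of a disk-based binary search. It assumes a hypothetical fixed-width record layout (a 20-byte space-padded word followed by the 3-byte book alias) and that records are sorted by word; your actual index format will differ, but the seek-and-compare structure is the same.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Binary search over a sorted index file of fixed-width records.
 * Assumed record layout: 20 bytes for the word, space-padded,
 * followed by 3 bytes for the book alias -- 23 bytes per record.
 */
public class DiskIndexSearch {
    private static final int WORD_LEN = 20;
    private static final int ALIAS_LEN = 3;
    private static final int RECORD_LEN = WORD_LEN + ALIAS_LEN;

    /** Returns the alias stored for the word, or null if not found. */
    public static String find(RandomAccessFile index, String word) throws IOException {
        long lo = 0;
        long hi = index.length() / RECORD_LEN - 1;
        byte[] record = new byte[RECORD_LEN];
        while (lo <= hi) {
            long mid = (lo + hi) / 2;
            index.seek(mid * RECORD_LEN);   // jump straight to the middle record
            index.readFully(record);
            String key = new String(record, 0, WORD_LEN, "US-ASCII").trim();
            int cmp = key.compareTo(word);
            if (cmp == 0) {
                return new String(record, WORD_LEN, ALIAS_LEN, "US-ASCII");
            } else if (cmp < 0) {
                lo = mid + 1;               // search the upper half
            } else {
                hi = mid - 1;               // search the lower half
            }
        }
        return null;                        // word not in this index file
    }
}
```

Each lookup touches only O(log n) records instead of keeping the whole index resident, which is the point: memory use stays flat no matter how many batches you add.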