I tried in-memory indexing and got strange results. For moderately sized indexes (<64 MB) it was a very fast solution once the indexes were built. However, to hold 100k records in memory I needed about 120 MB (don't ask why), so I started the JVM with java -server -Xmx312M to raise the maximum heap to 312 MB. After that everything behaved strangely. Either the OS (Win XP) was swapping or something else odd was going on.

Searches return at most 100 records at a time, because I use iterators. A search on an index (say name:"S" city:"L") returned in 20 ms, which is great for 100k records. A search on ("S" and "Los") then took 10 seconds! Yet by simply reading the next 100 matches over and over (20 ms each) I could reach the same matches faster.

This behaviour made me drop in-memory indexes and leave indexing to the pros (Oracle, MySQL etc.). Does anyone have a clue what happened? I have 512 MB of main memory. My guess is either garbage collection or OS virtual-memory swapping. Could it have something to do with crossing a 64 MB boundary?
Any ideas, anyone? Not that it really matters for my assignment anymore, but I am curious by nature, and so far this question has no answer.
Joakim, a 320 MB heap is huge for such a small application. The penalty you pay is an occasional slowdown in performance caused by garbage collection. Roughly speaking, a full collection does not run until the heap fills up; with a 320 MB heap it takes a long time to fill up and a long time to clean. Another thing you should look at is your search algorithm. One more thing, though I am not sure about it: you said you have 512 MB of physical memory. If you allocate 320 MB just for the heap, I think the OS will be starved of resources, so anything the OS wants to do will take a long time, and the JVM will have to wait for CPU cycles. I am not a systems engineer; this is just what I think from my experience.
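One way to test the garbage-collection theory is to measure how much collector time a single search incurs. The following is only a minimal sketch, assuming Java 8 or later; `GcProbe` and `timeWithGc` are hypothetical names, and the workload in `main` is a stand-in allocation loop, not the search from this thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports wall-clock time and accumulated GC time for a task.
class GcProbe {

    // Sum of collection time (ms) over all collectors; undefined values (-1) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the task and returns {elapsedMillis, gcMillisDuringTask}.
    static long[] timeWithGc(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        long gcDuring = totalGcMillis() - gcBefore;
        return new long[] { elapsed, gcDuring };
    }

    public static void main(String[] args) {
        long[] r = timeWithGc(() -> {
            // Stand-in workload: allocate short-lived arrays to provoke some collections.
            long sum = 0;
            for (int i = 0; i < 200_000; i++) {
                byte[] junk = new byte[1024];
                sum += junk.length;
            }
            if (sum == 0) {
                throw new IllegalStateException("unreachable");
            }
        });
        System.out.println("elapsed ms: " + r[0] + ", GC ms: " + r[1]);
    }
}
```

If the slow search shows GC time close to the elapsed time, the collector is the likely culprit. If GC time stays near zero, paging or the search itself is the more likely cause. Starting the JVM with -verbose:gc gives a similar picture without code changes.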
I am interested to know what you will find out.
SCJD 1.4
SCJP 1.4
-----------------------------------
"With regard to excellence, it is not enough to know, but we must try to have and use it." Aristotle
Well, um, yes, it's a lot of memory for this app, but I kept 100,000 records in memory (each object taking about 1 KB), sorted in 4 different indexes. A record takes more space in memory than it does on disk. I started out using Strings and had about 2 KB per object. I changed to primitives and char arrays and got down to 1 KB per object. Using byte arrays I could probably squeeze each object down even more. But since each record consists of multiple objects (or primitives and arrays), there is a lot of overhead (internal pointers to the fields of each record). Storing the original byte array as the object would probably take the least RAM. (I did not try that.)
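The untried idea of keeping the original byte array as the record could look roughly like the sketch below. Everything here is an assumption for illustration: the class name `PackedRecord`, the two fields, their widths (32 and 64 bytes) and the ASCII encoding are not taken from the assignment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-width record stored as a single byte[]; field widths are assumptions.
class PackedRecord {
    static final int NAME_LEN = 32;
    static final int CITY_LEN = 64;

    private final byte[] raw; // NAME_LEN + CITY_LEN bytes, space padded

    PackedRecord(String name, String city) {
        raw = new byte[NAME_LEN + CITY_LEN];
        Arrays.fill(raw, (byte) ' ');
        put(name, 0, NAME_LEN);
        put(city, NAME_LEN, CITY_LEN);
    }

    // Copies the value into its slot, truncating anything longer than the slot.
    private void put(String value, int offset, int maxLen) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, raw, offset, Math.min(bytes.length, maxLen));
    }

    // Decodes a field on demand and strips the padding.
    private String get(int offset, int len) {
        return new String(raw, offset, len, StandardCharsets.US_ASCII).trim();
    }

    String name() { return get(0, NAME_LEN); }
    String city() { return get(NAME_LEN, CITY_LEN); }
    int sizeInBytes() { return raw.length; }
}
```

This keeps two objects per record (the wrapper and its array) instead of one per field. The trade-off is that every field access creates a temporary String, so it swaps memory for CPU time.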
I had one ArrayList, indexed on recNo, storing the FullRecord objects, and the indexes were TreeSets that used the FullRecord as key. I built the indexes as I inserted records (references to the already created objects) into them. The problem with this approach was that it was difficult to search on partial matches as the specs require.
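For what it's worth, one possible way to get prefix matches out of a sorted collection is a range view. This is a sketch under assumptions, not the design described above: it uses a TreeMap keyed on a single String field instead of a TreeSet of FullRecord, `PrefixIndex` is a hypothetical name, and it assumes field values never contain the character U+FFFF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index: field value -> record numbers, kept sorted by value.
class PrefixIndex {
    private final TreeMap<String, List<Integer>> index = new TreeMap<>();

    void add(String value, int recNo) {
        index.computeIfAbsent(value, k -> new ArrayList<>()).add(recNo);
    }

    // Returns up to 'limit' record numbers whose value starts with 'prefix'.
    List<Integer> startsWith(String prefix, int limit) {
        List<Integer> result = new ArrayList<>();
        // Every key k with prefix <= k < prefix + '\uffff' starts with the prefix.
        Map<String, List<Integer>> range =
                index.subMap(prefix, prefix + Character.MAX_VALUE);
        for (List<Integer> recNos : range.values()) {
            for (int recNo : recNos) {
                if (result.size() == limit) {
                    return result;
                }
                result.add(recNo);
            }
        }
        return result;
    }
}
```

The range view is located in logarithmic time and then walked in key order, so returning the first 100 matches costs about the same whether the prefix is "S" or "Los".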
That finally made me use arrays for the indexes, with java.util.Arrays.sort(array, Comparator) to create the index and java.util.Arrays.binarySearch(array, key, Comparator) to search. It was this binarySearch that showed the strange behaviour.
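One thing worth knowing here: when several elements compare equal to the key, Arrays.binarySearch does not guarantee which of them it returns, so a prefix search built on it still has to locate the first match somehow. A hand-written lower-bound search avoids that. The sketch below is an illustration only; `SortedArrayIndex` is a hypothetical name, and it indexes plain Strings in natural order rather than FullRecord objects with a Comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sorted-array index over one String field.
class SortedArrayIndex {
    private final String[] sorted;

    SortedArrayIndex(String[] values) {
        sorted = values.clone();
        Arrays.sort(sorted); // natural String order
    }

    // Index of the first element >= key (classic lower bound).
    private int lowerBound(String key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Up to 'limit' values that start with 'prefix', in sorted order.
    List<String> startsWith(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (int i = lowerBound(prefix); i < sorted.length && out.size() < limit; i++) {
            if (!sorted[i].startsWith(prefix)) {
                break; // matches are contiguous in a sorted array
            }
            out.add(sorted[i]);
        }
        return out;
    }
}
```

Because the search lands on the first match directly, the cost is one binary search plus at most `limit` comparisons, regardless of how many records share the prefix.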
Maybe it is GC, but most likely it is something else, as I never hit the 384 MB limit (the app never exceeds 150 MB).
Anyway, resource starvation might have an influence... that could be it. XP might start swapping to disk to meet my heap requirements, as my virtual memory (about 1024 MB) is larger than my RAM (512 MB).