aspose file tools
The moose likes Beginning Java and the fly likes best performance using minum memory Big Moose Saloon
  Search | Java FAQ | Recent Topics
Register / Login
JavaRanch » Java Forums » Java » Beginning Java
Reply Bookmark "best performance using minum memory " Watch "best performance using minum memory " New topic
Author

best performance using minum memory

Edward Chen
Ranch Hand

Joined: Dec 23, 2003
Posts: 758
I have an interview question I ever had.

We have a HUGE plain txt dictionary file, like
Life ddddd
Live:eeeeee

Now we need to build a dictionary application, search a word a user inputs, fetch the meaning back to user. Assumption: the file size is HUGE (40GB) we can't load all data into memory.

The question is , how to achieve the best performance using minum memory size ? Using what alogrithm and what Collection object ?

Have any idea?

Thanks.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 9939
    
    6

a lot of times, they don't want a specific answer, but just want to see how you approach the problem. are you aware of tradeoffs you are making? are you making bogus assumptions? can you defend whatever answer you give?

often these are more important that the exact algorithm you give.


Never ascribe to malice that which can be adequately explained by stupidity.
Edward Chen
Ranch Hand

Joined: Dec 23, 2003
Posts: 758
Here is my thinking. Any comments are welcome.

Let me assume
1. the dictionary file is in order. (of course)
2. each word is just one line.

Here we only have one choice: randomAccessFile. So, we build an index file for that dictionay.
Dic['A']="100-200"; // a word begining with "A" is located from line 100 to line 200.
Dic['A']['A']="100-150"; // a word begining with "AA" is located from line 100-150.
So we have a 26X26 array, first time, we need to scan the huge file, but second time, we could serialize this array into a file.

User search a word, we just need the first two, then get array data, then use randomaccess file to getLine() to match.

///////////////////////////////////////
Here a problem left, how to scan it ?

Let me assue we have the total line numbers of this file.
We use multithread, for example, we cut wholes files into n parts , each of them will be 100 lines (n*100=total_lines). Then we create n thread to randomaccess file.

Then we use an algorithm like binary search, for example,
1. thread points to 2*100 line, is "Life".
2. "List" is not begining of "L", so we go back to 200-100 line,
3. the word in 200-100 line is "Knife", so we go forward 100/2,
4. repeat, repeat...
5. finally, we could get "L" begining line number.

Just a thinking. what is your idea ? Thanks.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: best performance using minum memory
 
Similar Threads
Java Heap dump Size and Resident Memory(RES) Size
Java Heap dump Size and Resident Memory(RES) Size
XML and stored procedures using Java
Help! How to design this mutithreading application??
LZ78 and FileReader