a lot of times, they don't want a specific answer, but just want to see how you approach the problem. are you aware of tradeoffs you are making? are you making bogus assumptions? can you defend whatever answer you give?
often these are more important that the exact algorithm you give.
There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Joined: Dec 23, 2003
Here is my thinking. Any comments are welcome.
Let me assume 1. the dictionary file is in order. (of course) 2. each word is just one line.
Here we only have one choice: randomAccessFile. So, we build an index file for that dictionay. Dic['A']="100-200"; // a word begining with "A" is located from line 100 to line 200. Dic['A']['A']="100-150"; // a word begining with "AA" is located from line 100-150. So we have a 26X26 array, first time, we need to scan the huge file, but second time, we could serialize this array into a file.
User search a word, we just need the first two, then get array data, then use randomaccess file to getLine() to match.
/////////////////////////////////////// Here a problem left, how to scan it ?
Let me assue we have the total line numbers of this file. We use multithread, for example, we cut wholes files into n parts , each of them will be 100 lines (n*100=total_lines). Then we create n thread to randomaccess file.
Then we use an algorithm like binary search, for example, 1. thread points to 2*100 line, is "Life". 2. "List" is not begining of "L", so we go back to 200-100 line, 3. the word in 200-100 line is "Knife", so we go forward 100/2, 4. repeat, repeat... 5. finally, we could get "L" begining line number.