I've been trying to come up with a solution to a (maybe?) not-so-difficult problem. I want to index some files (quite a lot of them, actually), but I want to keep the vocabulary (the set of unique terms) in main memory, with pointers to the DB records, which in turn contain other information related to the terms in the vocabulary.
I would divide the problem/requirements as follows:
1.) I was thinking of using a hash map where the key is the term and the value is the pointer to the DB. One term can have many such pointers, though; in this case they are document IDs.
2.) If 1), how scalable is this? Of course the vocabulary cannot be that large, but there can be lots of document IDs...
3.) How do I persist such a hash map across program restarts? I don't want to start the indexing all over again.
4.) Ignoring all three, does a better solution exist? I was thinking of storing the terms in the DB, then reading the DB each time the program starts and recreating the hash map, but how efficient is this (how fast would it work)?
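
To make the idea in 1), 3) and 4) concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: Python with the standard sqlite3 module as the DB, a naive whitespace tokenizer, and a made-up table called postings; the function names are hypothetical too.

```python
import sqlite3
from collections import defaultdict


def open_store(path=":memory:"):
    """Open (or create) the SQLite database that holds the (term, doc_id) pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " doc_id INTEGER NOT NULL,"
        " PRIMARY KEY (term, doc_id))"
    )
    return conn


def index_document(conn, vocab, doc_id, text):
    """Add one document to both the in-memory map and the database."""
    terms = set(text.lower().split())  # assumption: naive whitespace tokenizer
    for term in terms:
        vocab[term].add(doc_id)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO postings (term, doc_id) VALUES (?, ?)",
            [(term, doc_id) for term in terms],
        )


def load_vocab(conn):
    """Rebuild the term -> document IDs map from the database (e.g. at startup)."""
    vocab = defaultdict(set)
    for term, doc_id in conn.execute("SELECT term, doc_id FROM postings"):
        vocab[term].add(doc_id)
    return vocab


# The default ":memory:" database keeps the sketch from leaving a file behind;
# a real run would pass a file path so the data survives a restart.
conn = open_store()
vocab = defaultdict(set)
index_document(conn, vocab, 1, "the quick brown fox")
index_document(conn, vocab, 2, "the lazy dog")

# Simulates what would happen at startup: recreate the map from the DB.
restored = load_vocab(conn)
```

Here the hash map is just a cache of what is in the DB, so nothing has to be re-indexed after a restart; what I can't judge is how long load_vocab would take once the table gets large.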