I am in serious need of a solution for creating an inverted index for an information retrieval project. The program works by reading ten text files and accepting a user query (which I have already done).
Has anyone done information retrieval in Java before?
Do I have to create HashMaps, TreeSets, or ArrayLists to do the vector representation?
I need to store: index term (String), document frequency (int), document number/ID (int), and term frequency (int).
I'm not really sure what you're asking with the whole inverted index thing. But I can tell you how I'd probably tackle the overall problem.
For each file, I'd read it and create a Map with each term as the key and its frequency in that file as the value.
If you have a Map for each file, it should be pretty easy to figure out the document frequency (by calling containsKey() on each Map and counting the hits). The term frequency across all files would just be the sum of the values retrieved from each Map for the given key.
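Roughly, the idea above might look like the sketch below. The class name TermStats, its method names, and the simple split on non-word characters are all made up for illustration; a real tokenizer would probably need more care.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; none of these names come from the original program.
final class TermStats {

    /** Counts how often each term occurs in the text of one document. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumption: lowercase everything and split on runs of non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Document frequency: the number of per-document maps that contain the term. */
    static int documentFrequency(List<Map<String, Integer>> docs, String term) {
        int df = 0;
        for (Map<String, Integer> doc : docs) {
            if (doc.containsKey(term)) {
                df++;
            }
        }
        return df;
    }

    /** Total occurrences of the term, summed over every per-document map. */
    static int totalFrequency(List<Map<String, Integer>> docs, String term) {
        int total = 0;
        for (Map<String, Integer> doc : docs) {
            total += doc.getOrDefault(term, 0);
        }
        return total;
    }
}
```

You'd call termFrequencies() once per file, add each result to a List, and then pass that List to the other two methods.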
I have taken your advice and done this:
There is a for loop that reads the ten text files, and at the moment every term is saved into one TreeMap called frequency data.
I need to work out how to create a new TreeMap for each document.
Should this be done inside the for loop, or should I create a fixed number of TreeMaps based on how many documents there are?
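One possible approach is to create the TreeMap inside the loop, so each pass gets a fresh map, and keep the maps in a List so their number does not have to be known in advance. The sketch below also merges them into a term -> (document ID -> term frequency) map, which is one common shape for an inverted index. The class name InvertedIndexSketch and its methods are hypothetical, documents are passed in as strings instead of being read from files, and document IDs are assumed to be list positions starting at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the names here are not from the original program.
final class InvertedIndexSketch {

    /** Builds one term-frequency map per document, creating each map inside the loop. */
    static List<Map<String, Integer>> perDocumentCounts(List<String> texts) {
        List<Map<String, Integer>> all = new ArrayList<>();
        for (String text : texts) {
            // A fresh TreeMap on every pass, so counts from different documents never mix.
            Map<String, Integer> counts = new TreeMap<>();
            // Assumption: lowercase everything and split on runs of non-word characters.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
            all.add(counts);
        }
        return all;
    }

    /** Inverts the per-document maps into term -> (document ID -> term frequency). */
    static Map<String, Map<Integer, Integer>> invert(List<Map<String, Integer>> perDoc) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < perDoc.size(); docId++) {
            for (Map.Entry<String, Integer> entry : perDoc.get(docId).entrySet()) {
                index.computeIfAbsent(entry.getKey(), k -> new TreeMap<>())
                     .put(docId, entry.getValue());
            }
        }
        return index;
    }
}
```

With this layout the four values from the first post fall out of one structure: the outer key is the index term, the inner keys are the document IDs, the inner values are the term frequencies, and the document frequency of a term is index.get(term).size().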