Hash Map to do Inverted Index?

Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
Hello

I am in serious need of a solution for creating an inverted index for an information retrieval project. The program works by reading ten text files and accepting a user query (which I have already done).

Has anyone done information retrieval in Java before?

Do I have to create HashMaps, TreeSets or ArrayLists to do the vector representation?

I need to store:
-----------------
Index Term (String) ,
Doc Frequency (int) ,
Document Number/ID (int),
Term Frequency (int).


An example of output should be:

IndexTerm DocFreq DocNum TermFreq
-------------------------------------------------
java 3 1 5 67 2 5 12
machine 2 22 44 17 3



So this means java appears in three documents (in doc 1 two times, in doc 5 five times, etc.).


Thanks for your time
Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
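Sorry, the table should be like this:

IndexTerm   DocFreq   DocNum      TermFreq
-------------------------------------------------
java        3         1, 5, 67    2, 5, 12
machine     2         22, 44      17, 3
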
eg.
machine appears in 2 documents (22 and 44, 17 times and 3 times respectively).
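
One possible way to hold exactly these four fields is a map from each term to a second map of document number to term frequency; the document frequency is then just the size of the inner map. This is only a sketch under that assumption, and the class and method names (PostingsSketch, add, docFreq, row) are made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

class PostingsSketch {
    // term -> (document number -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> index = new TreeMap<>();

    // Record that `term` occurs `count` more times in document `docNum`.
    void add(String term, int docNum, int count) {
        index.computeIfAbsent(term, t -> new TreeMap<>())
             .merge(docNum, count, Integer::sum);
    }

    // Document frequency is the number of documents listed for the term.
    int docFreq(String term) {
        Map<Integer, Integer> postings = index.get(term);
        return postings == null ? 0 : postings.size();
    }

    // One output row: IndexTerm DocFreq DocNums TermFreqs
    String row(String term) {
        Map<Integer, Integer> postings = index.getOrDefault(term, new TreeMap<>());
        return term + " " + postings.size() + " "
                + postings.keySet() + " " + postings.values();
    }

    public static void main(String[] args) {
        PostingsSketch idx = new PostingsSketch();
        idx.add("java", 1, 2);
        idx.add("java", 5, 5);
        idx.add("java", 67, 12);
        System.out.println(idx.row("java")); // java 3 [1, 5, 67] [2, 5, 12]
    }
}
```

Using TreeMap for both levels keeps the terms and the document numbers sorted, which matches the layout of the table above.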
Martin Mathis
Ranch Hand

Joined: Dec 20, 2004
Posts: 45
I'm not really sure what you're asking with the whole inverted index thing. But I can tell you how I'd probably tackle the overall problem.

For each file, I'd read it and create a Map with the term as the key and its frequency as the value.

If you have a Map for each file it should be pretty easy to figure out the document frequency (by using containsKey() on each Map). The term frequency would just be the sum of the values retrieved from each Map for the given key.
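
A rough sketch of that idea, assuming each file's text is already in a String and that splitting on non-word characters is good enough for tokenizing. The names PerFileCounts, countTerms, documentFrequency and totalFrequency are illustrative only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerFileCounts {
    // Count how often each term occurs in one document's text.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Document frequency: how many per-file maps contain the term.
    static int documentFrequency(List<Map<String, Integer>> perFile, String term) {
        int df = 0;
        for (Map<String, Integer> counts : perFile) {
            if (counts.containsKey(term)) {
                df++;
            }
        }
        return df;
    }

    // Sum of the term's counts over all per-file maps.
    static int totalFrequency(List<Map<String, Integer>> perFile, String term) {
        int total = 0;
        for (Map<String, Integer> counts : perFile) {
            total += counts.getOrDefault(term, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> perFile = Arrays.asList(
                countTerms("Java and more java"),
                countTerms("machine learning"),
                countTerms("Java machine"));
        System.out.println(documentFrequency(perFile, "java")); // 2
        System.out.println(totalFrequency(perFile, "java"));    // 3
    }
}
```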
Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
I have taken your advice and done this:



There is a for loop to read ten text files, and at the moment every term is saved into one TreeMap called frequency data.

I need to work out how to create a new TreeMap for each document.

Should this be in the for loop, or should I create a certain number of TreeMaps based on how many documents there are?

Thanks for the help,
Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
You could use an array or an ArrayList of Maps to store the results, and advance the index each time you switch documents.
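
As a sketch of the ArrayList variant: a fresh TreeMap is created inside the loop for each document and appended to the list, so the list index doubles as the document number. The documents are passed in as plain Strings here to keep the example self-contained (reading the ten files is left out), and MapsPerDocument and buildAll are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapsPerDocument {
    static List<Map<String, Integer>> buildAll(String[] documents) {
        List<Map<String, Integer>> perDocument = new ArrayList<>();
        for (String text : documents) {
            // New map on every iteration, so counts never mix between documents.
            Map<String, Integer> frequencyData = new TreeMap<>();
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    frequencyData.merge(token, 1, Integer::sum);
                }
            }
            // Position in the list is the document number.
            perDocument.add(frequencyData);
        }
        return perDocument;
    }
}
```

With a list there is no need to know the number of documents in advance; it grows as each file is processed.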

[ February 18, 2006: Message edited by: Garrett Rowe ]

Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
Good idea, but I get this error:

Cannot create a generic array of Map<String,Integer>


I'm gonna try and see what the notation is for creating an array of tree maps.
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Sorry about that, I forgot about the restriction on creating generic arrays. Try something like this; it should compile fine.
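
The code from this post did not survive, so the following is only one common workaround and not necessarily what was posted: create the array with the raw type Map and assign it to a Map<String, Integer>[] variable. The compiler accepts this with an unchecked warning, which is suppressed below. GenericArraySketch and newMapArray are made-up names:

```java
import java.util.Map;
import java.util.TreeMap;

class GenericArraySketch {
    // "new Map<String, Integer>[size]" is rejected by the compiler,
    // but "new Map[size]" (raw type) is allowed and only produces a warning.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Map<String, Integer>[] newMapArray(int size) {
        Map<String, Integer>[] maps = new Map[size];
        for (int i = 0; i < size; i++) {
            maps[i] = new TreeMap<>(); // one empty map per document
        }
        return maps;
    }

    public static void main(String[] args) {
        Map<String, Integer>[] maps = newMapArray(10);
        maps[3].put("java", 2);
        System.out.println(maps[3]); // {java=2}
    }
}
```

An ArrayList of Maps avoids the warning altogether, at the cost of calling get(i) instead of indexing with [i].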
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
As an interesting sidebar (interesting to me anyway) this is also legal:

I'm sure it all has something to do with the way the Java compiler interprets generics, but I can't remember the logic behind it now.
[ February 18, 2006: Message edited by: Garrett Rowe ]
Anthony Alexander
Greenhorn

Joined: Feb 16, 2006
Posts: 15
Thanks, you have been a great help.


I am now working on having a TreeSet within a separate TreeMap (I'm not sure if that makes sense), as I need to have a list of numbers for each word.

At the moment I have an array of ten TreeMaps, one per document, each containing the term (word) and its frequency. <String, int>

The next stage is to create another normal TreeMap which contains each term and the list of document numbers it appears in. <String, List int?>

eg.
term list
----- -----
[java] [0] [3] [4]
[the] [1] [8]

java appears in docs 0,3,4.
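
A TreeSet inside a TreeMap does make sense for this. One way it might look, assuming the per-document maps are held in a List whose index is the document number (InvertSketch and invert are made-up names, and Integer is used because collections cannot hold the primitive int):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertSketch {
    // Turn "document -> (term -> frequency)" into "term -> sorted document numbers".
    static Map<String, TreeSet<Integer>> invert(List<Map<String, Integer>> perDocument) {
        Map<String, TreeSet<Integer>> termToDocs = new TreeMap<>();
        for (int docNum = 0; docNum < perDocument.size(); docNum++) {
            for (String term : perDocument.get(docNum).keySet()) {
                // Create the set the first time a term is seen, then add this document.
                termToDocs.computeIfAbsent(term, t -> new TreeSet<>()).add(docNum);
            }
        }
        return termToDocs;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            docs.add(new TreeMap<String, Integer>());
        }
        docs.get(0).put("java", 2);
        docs.get(3).put("java", 1);
        docs.get(4).put("java", 7);
        docs.get(1).put("the", 4);
        System.out.println(invert(docs)); // {java=[0, 3, 4], the=[1]}
    }
}
```

The size of each set is the document frequency, and the term frequencies can still be looked up in the per-document maps by document number.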