
Matrix form of files and words

 
K
Greenhorn
Posts: 2
Hi,

I want to build a matrix of files and the words in each file, with files as rows and words as columns:

matrix [ file ][ word ] = (frequency of word in the file)

Is there a standard way to do this?
I have tried arrays and HashMaps, without success so far.
Any help would be highly appreciated.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
To parse text files into words, see the java.io.StreamTokenizer class.
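
A minimal sketch of word counting with java.io.StreamTokenizer follows. The class name WordCounter, the method countWords, and the sample sentence are made up for illustration. The sketch assumes that a "word" is a run of ASCII letters, so it resets the tokenizer's default syntax first; with the defaults, digits and '.' can end up glued onto a word.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
final class WordCounter {

    // Counts how often each word occurs in the text supplied by the reader.
    static Map<String, Integer> countWords(Reader reader) {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        tokenizer.resetSyntax();          // start with every character "ordinary"
        tokenizer.wordChars('a', 'z');    // assumption: words are ASCII letters only
        tokenizer.wordChars('A', 'Z');
        tokenizer.lowerCaseMode(true);    // "The" and "the" count as the same word

        Map<String, Integer> counts = new TreeMap<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    counts.merge(tokenizer.sval, 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(new StringReader("The cat saw the other cat."));
        System.out.println(counts); // prints {cat=2, other=1, saw=1, the=2}
    }
}
```

For a real file, a FileReader (ideally wrapped in a BufferedReader) can be passed in place of the StringReader.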

For the columns to make any sense, it seems to me that you need to start with a dictionary of the words to be recognized, with all other words ignored.

A HashMap can be used to look up the column number corresponding to a word.
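
The idea can be sketched roughly as follows. This assumes Java 9 or later (for List.of and Map.of) and that the per-file word counts are already available as maps. The names TermMatrix, columnIndex and build, and the sample words, are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class TermMatrix {

    // Maps each dictionary word to its column number, in list order.
    static Map<String, Integer> columnIndex(List<String> dictionary) {
        Map<String, Integer> index = new HashMap<>();
        for (String word : dictionary) {
            // Duplicates are skipped, so column numbers stay contiguous.
            index.putIfAbsent(word, index.size());
        }
        return index;
    }

    // Builds matrix[file][word]; words outside the dictionary are ignored.
    static int[][] build(List<Map<String, Integer>> countsPerFile,
                         Map<String, Integer> index) {
        int[][] matrix = new int[countsPerFile.size()][index.size()];
        for (int row = 0; row < countsPerFile.size(); row++) {
            for (Map.Entry<String, Integer> e : countsPerFile.get(row).entrySet()) {
                Integer col = index.get(e.getKey());
                if (col != null) {
                    matrix[row][col] = e.getValue();
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = columnIndex(List.of("cat", "dog", "fish"));
        List<Map<String, Integer>> counts = List.of(
                Map.of("cat", 2, "dog", 1, "the", 5),  // file 0; "the" is not in the dictionary
                Map.of("fish", 3));                    // file 1
        for (int[] row : build(counts, index)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [2, 1, 0] then [0, 0, 3]
    }
}
```

A plain int[][] is simple but stores a zero for every word a file does not contain, which can get large when the dictionary is big and most files use only a small part of it.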

Bill
 
K
Greenhorn
Posts: 2
Thanks for your reply, Bill.

I already have the filtered list of words and their frequencies for each file,
but I'm confused about how to represent them in matrix form.

If you could give me some sample code, that would be great.
Are you familiar with the vector space model?

Thanks a lot.
Krish
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
When you say "matrix form", what do you mean? Do you have to use some specific matrix math package, or are you just looking for a convenient display?
Perhaps this tutorial on arrays of arrays will help.
Bill
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Use REXX. Associative arrays are pretty cool.

In Java I'd look into a map keyed by filename holding maps keyed by words. Or the other way around.
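
A rough sketch of the map-of-maps idea, keyed by filename first and then by word. The class NestedCounts, its methods, and the file names are hypothetical and only meant to show the shape of the structure.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
final class NestedCounts {

    // Outer key: filename. Inner key: word. Value: frequency of the word in that file.
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Adds n occurrences of word to the given file, creating the inner map on first use.
    void add(String file, String word, int n) {
        counts.computeIfAbsent(file, k -> new TreeMap<>())
              .merge(word, n, Integer::sum);
    }

    // Returns the stored frequency, or 0 if the file or the word is unknown.
    int get(String file, String word) {
        return counts.getOrDefault(file, Collections.<String, Integer>emptyMap())
                     .getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        NestedCounts matrix = new NestedCounts();
        matrix.add("a.txt", "cat", 2);
        matrix.add("a.txt", "cat", 1);
        matrix.add("b.txt", "fish", 3);
        System.out.println(matrix.get("a.txt", "cat"));  // prints 3
        System.out.println(matrix.get("b.txt", "cat"));  // prints 0
    }
}
```

Unlike a two-dimensional array, this only stores the words that actually occur in each file, and it needs no fixed dictionary up front. Swapping the two key levels gives the "other way around" layout, keyed by word first.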
 