Matrix form of files and words

K
Greenhorn

Joined: Aug 03, 2006
Posts: 2
Hi,

I want to build a matrix of files and the words in each file, with files as rows and words as columns:

matrix [ file ][ word ] = (frequency of word in the file)

Is there a standard way to do this?
I tried to build it with arrays and HashMaps...
Any help would be highly appreciated.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
For parsing text files into words, see the java.io.StreamTokenizer class.
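
As a rough illustration of the StreamTokenizer suggestion, here is a minimal sketch that counts words from any Reader. The class name WordCounter and the method countWords are made up for this example, and the choice to treat only the letters A-Z and a-z as word characters is an assumption; adjust the syntax table if your words contain digits, apostrophes or accented letters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class WordCounter {

    // Counts how often each word occurs in the text supplied by the reader.
    // WordCounter and countWords are hypothetical names used for illustration.
    public static Map<String, Integer> countWords(Reader reader) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        // Start from an empty syntax table: by default '.' and '-' count as
        // number characters and would otherwise stick to the end of a word.
        tokenizer.resetSyntax();
        tokenizer.wordChars('a', 'z');
        tokenizer.wordChars('A', 'Z');
        tokenizer.whitespaceChars(0, ' ');
        tokenizer.lowerCaseMode(true);   // fold word tokens to lower case

        Map<String, Integer> counts = new TreeMap<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            // Punctuation and digits arrive as single-character tokens; skip them.
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                counts.merge(tokenizer.sval, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                countWords(new StringReader("The cat saw the other cat."));
        System.out.println(counts);   // prints {cat=2, other=1, saw=1, the=2}
    }
}
```

For a real file, pass a BufferedReader wrapped around a FileReader instead of the StringReader.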

In order for the columns to make any sense, it seems to me that you need to start with a dictionary of words to be recognized - all other words to be ignored.

A HashMap can be used to look up the column number corresponding to a word.
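
The dictionary and column lookup could be sketched roughly as follows. This assumes the per-file word counts are already available as a Map from file name to (word to frequency); TermMatrix, columnIndex and build are hypothetical names, and the sample data is invented. It uses List.of and Map.of, so it needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermMatrix {

    // Assigns each distinct dictionary word the next free column number.
    public static Map<String, Integer> columnIndex(List<String> dictionary) {
        Map<String, Integer> index = new HashMap<>();
        for (String word : dictionary) {
            index.putIfAbsent(word, index.size());
        }
        return index;
    }

    // Builds matrix[file][word]; rows follow the iteration order of countsPerFile.
    public static int[][] build(Map<String, Map<String, Integer>> countsPerFile,
                                Map<String, Integer> columns) {
        int[][] matrix = new int[countsPerFile.size()][columns.size()];
        int row = 0;
        for (Map<String, Integer> counts : countsPerFile.values()) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                Integer col = columns.get(entry.getKey());
                if (col != null) {   // words outside the dictionary are ignored
                    matrix[row][col] = entry.getValue();
                }
            }
            row++;
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Invented sample data; "zebra" is not in the dictionary and is dropped.
        List<String> dictionary = List.of("apple", "banana", "cherry");
        Map<String, Map<String, Integer>> countsPerFile = new LinkedHashMap<>();
        countsPerFile.put("a.txt", Map.of("apple", 2, "cherry", 1, "zebra", 5));
        countsPerFile.put("b.txt", Map.of("banana", 3));

        int[][] matrix = build(countsPerFile, columnIndex(dictionary));
        System.out.println(Arrays.deepToString(matrix));   // prints [[2, 0, 1], [0, 3, 0]]
    }
}
```

A LinkedHashMap keeps the files in insertion order, so row numbers stay predictable.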

Bill
K
Greenhorn

Joined: Aug 03, 2006
Posts: 2
Thanks for your reply Bill.

I already have the filtered word lists and their frequencies for each file, but I'm not sure how to represent them as a matrix.

If you could give me some sample code, that would be great.
Do you know anything about the vector space model?

Thanks a lot.
Krish
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
When you say "Matrix form" - what do you mean? Do you have to use some specific matrix math package or are you just looking for a convenient display?
Perhaps this tutorial on arrays of arrays will help.
Bill
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Use REXX! Associative arrays are pretty cool.

In Java I'd look into a map keyed by filename holding maps keyed by words. Or the other way around.
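
One possible shape for the map-of-maps idea, as a sketch only: the outer map is keyed by file name and each inner map is keyed by word. NestedCounts, add and get are hypothetical names chosen for this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class NestedCounts {

    // Outer map: file name -> inner map. Inner map: word -> frequency in that file.
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Records one occurrence of word in file.
    public void add(String file, String word) {
        counts.computeIfAbsent(file, f -> new TreeMap<>())
              .merge(word, 1, Integer::sum);
    }

    // Plays the role of matrix[file][word]; returns 0 for an unseen file or word.
    public int get(String file, String word) {
        return counts.getOrDefault(file, Collections.emptyMap())
                     .getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        NestedCounts matrix = new NestedCounts();
        matrix.add("a.txt", "moose");
        matrix.add("a.txt", "moose");
        matrix.add("b.txt", "fly");
        System.out.println(matrix.get("a.txt", "moose"));   // prints 2
        System.out.println(matrix.get("b.txt", "moose"));   // prints 0
    }
}
```

Only the words that actually occur are stored, which suits a sparse matrix where most files contain few of the words. For "the other way around", swap the keys so the outer map is keyed by word.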


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
 