indexing files

archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
Hey,
I am working on a desktop search application that searches the contents of files and folders.
I am having trouble indexing folders and files for efficient search. I don't want to use Lucene. Is there another way of doing that?
Thank you
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39547
    
I dont want to use lucene.

Why not?


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12676
    
Of course there is a way - you end up reinventing the wheel but you will learn a lot.

You will need to invent:

1. method for identifying files and relating the identity to:
2. parsing words or phrases or whatever you want to index on and:
3. creating a dictionary of words with links to the files they were found in.
(it's called an inverted index)
4. creating a lookup mechanism to get from user input to the words in the index and eventually to the files.
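A minimal Java sketch of the four steps above, assuming plain-text files decoded as UTF-8 and a naive word splitter. The names (`InvertedIndex`, `indexFolder`, `indexText`, `tokenize`, `search`) are hypothetical, not Bill's code or any library's API. Each file gets an integer ID (step 1), its text is split into lowercase words (step 2), a `TreeMap` maps each word to the sorted IDs of the files containing it (step 3), and a query returns the files that contain every query word by intersecting those ID sets (step 4).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical minimal inverted index (a sketch, not Lucene and not Bill's code).
class InvertedIndex {
    // Step 1: every indexed file gets an integer ID; its position in this list is the ID.
    private final List<Path> files = new ArrayList<>();
    // Step 3: dictionary from word to the sorted IDs of the files containing it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Walks a folder tree and indexes every regular file found in it.
    void indexFolder(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path p : paths) {
            // Assumes text files; invalid UTF-8 bytes become U+FFFD instead of throwing.
            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            indexText(p, text);
        }
    }

    // Steps 1-3 for a single document: assign an ID, split into words, record postings.
    int indexText(Path file, String text) {
        int id = files.size();
        files.add(file);
        for (String word : tokenize(text)) {
            postings.computeIfAbsent(word, w -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Step 2: naive tokenizer that lowercases and splits on anything not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                words.add(t);
            }
        }
        return words;
    }

    // Step 4: returns the files containing ALL query words (boolean AND).
    List<Path> search(String query) {
        SortedSet<Integer> result = null;
        for (String word : tokenize(query)) {
            SortedSet<Integer> ids = postings.getOrDefault(word, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids); // set intersection
            }
        }
        List<Path> hits = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                hits.add(files.get(id));
            }
        }
        return hits;
    }

    // Usage: <folder> <query words...>; prints matching paths one per line.
    public static void main(String[] args) throws IOException {
        InvertedIndex index = new InvertedIndex();
        index.indexFolder(Paths.get(args[0]));
        String query = String.join(" ", Arrays.copyOfRange(args, 1, args.length));
        index.search(query).forEach(System.out::println);
    }
}
```

Everything here lives in memory and is rebuilt on each run; a real desktop search tool would also need to save the index to disk and update it when files change, which this sketch does not attempt.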

Since my first try at this in 1979 (ah, CP/M, the good old days), I have found multiple solutions to the above - it is very educational.

Bill


Java Resources at www.wbrogden.com
archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
Ulf Dittmer wrote:
I dont want to use lucene.

Why not?


This is my final-year project; I want to develop something on my own. Using Lucene would be like developing only the GUI.
archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
William Brogden wrote: Of course there is a way - you end up reinventing the wheel but you will learn a lot.


Thank you, sir.
Could you also recommend some good reference material for it (books or published papers)?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12676
    
Actually, the bibliography in the Wikipedia entry on inverted indexes I cited would be a good start.

I don't know of any specific book.

Bill
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

DZone.com has a series of refcards (simple printable pages outlining various technologies), and one of them covers Lucene.
It gives a high-level overview of how Lucene performs document indexing and may be a good place to start.
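Lucene runs document text through an analyzer (a tokenizer followed by token filters) before indexing it. The sketch below imitates that idea in plain Java, assuming a regex tokenizer, a lowercase step and a small, arbitrary stop-word list. `SimpleAnalyzer`, `Token`, `analyze` and `STOP_WORDS` are hypothetical names, not Lucene classes. It also keeps each term's word position, which an index needs if it is to answer phrase queries.

```java
import java.util.*;

// Hypothetical analysis step: a tokenizer followed by lowercase and stop-word filters.
// It only mirrors the general idea; it is NOT Lucene's Analyzer API.
class SimpleAnalyzer {
    // Arbitrary example stop-word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // A surviving term plus its word position in the original text.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter produces one empty piece
            }
            String term = raw.toLowerCase(Locale.ROOT); // lowercase filter
            if (!STOP_WORDS.contains(term)) {           // stop-word filter
                tokens.add(new Token(term, position));
            }
            position++; // removed stop words still use up a position
        }
        return tokens;
    }
}
```

For example, `analyze("The Art of Computer Programming")` yields `[art@1, computer@3, programming@4]`. Because the dropped stop words keep their positions, a phrase lookup can still check that two terms sit the expected distance apart.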
 
 
Similar Threads
File Indexing library
lucene estimate index size, search time
grep question
Strange behavior of Tomcat! "classFile.delete() failed"
searching file among 3 hundred thousand files