indexing files

archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
Hi,
I am working on a desktop search application that searches the contents of files and folders.
I am having trouble indexing folders and files for effective search. I don't want to use Lucene. Is there another way of doing that?
Thank you
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42367
I don't want to use Lucene.

Why not?


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12809
Of course there is a way. You end up reinventing the wheel, but you will learn a lot.

You will need to invent:

1. a method for identifying files and relating each identity to the words found in that file;
2. a way of parsing out the words, phrases, or whatever else you want to index on;
3. a dictionary of words with links to the files they were found in (this is called an inverted index);
4. a lookup mechanism to get from user input to the words in the index and eventually to the files.
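
To make the four steps above concrete, here is a minimal in-memory sketch in Java. It is only one of many possible designs, not how Lucene does it. The class name TinyInvertedIndex and its methods are made up for illustration, the "files" are passed in as strings rather than read from disk, and a query is assumed to mean "files containing all of these words".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the four steps; names and design are assumptions. */
final class TinyInvertedIndex {
    // Step 1: a file's identity is simply its position in this list.
    private final List<String> paths = new ArrayList<>();
    // Step 3: the dictionary, mapping each word to the ids of the files containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Step 2: lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Registers a file and indexes its text; returns the id assigned to it. */
    int addDocument(String path, String text) {
        int id = paths.size();
        paths.add(path);
        for (String word : tokenize(text)) {
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Step 4: returns the paths of the files that contain every word of the query. */
    List<String> search(String query) {
        Set<Integer> hits = null;
        for (String word : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(word, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);   // first word: start from its file ids
            } else {
                hits.retainAll(ids);         // later words: keep only the common ids
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(paths.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument("notes/a.txt", "The quick brown fox");
        index.addDocument("notes/b.txt", "A lazy brown dog");
        System.out.println(index.search("brown"));      // [notes/a.txt, notes/b.txt]
        System.out.println(index.search("Quick fox"));  // [notes/a.txt]
    }
}
```

A real desktop search tool would also need to walk the folders, read each file, keep the index on disk, and notice when files change. None of that is shown here.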

Since my first try at this in 1979 (ah, CP/M, the good old days), I have found multiple solutions to the above. It is very educational.

Bill

archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
Ulf Dittmer wrote:
I don't want to use Lucene.

Why not?


This is my final-year project, and I want to develop something on my own. Using Lucene would be like developing only the GUI.
archit thakur
Greenhorn

Joined: Jul 10, 2010
Posts: 24
Thank you, sir.
Could you also please point me to some good reference material for it (a book or some published papers)?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12809
Actually, the bibliography in the Wikipedia entry on inverted indexes I cited would be a good start.

I don't know of any specific book.

Bill
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

Dzone.com has a series of refcards (simple printable pages outlining various technologies), and one is on Lucene.
It gives high-level coverage of how Lucene performs document indexing and may be a good place to start.