
indexing files

 
archit thakur
Greenhorn
Posts: 24
Hey,
I am working on a desktop search application that searches the contents of files and folders.
I am having trouble indexing folders and files for effective search. I don't want to use Lucene. Is there another way of doing that?
Thank you.
 
Ulf Dittmer
Rancher
Posts: 42967
I don't want to use Lucene.

Why not?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
Of course there is a way - you end up reinventing the wheel, but you will learn a lot.

You will need to invent:

1. A method for identifying files and relating each identity to the words found in that file.
2. A parser that extracts the words, phrases, or whatever else you want to index on.
3. A dictionary of words with links to the files they were found in (this is called an inverted index).
4. A lookup mechanism to get from user input to the words in the index, and from there to the files.
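
As a rough illustration, the four steps above might fit together as in the sketch below. Everything in it is an assumption made for the example: the class name `InvertedIndexSketch`, the methods `addFile`, `tokenize` and `search`, the choice of integer file ids, and the naive tokenizer are all hypothetical. Reading file contents from disk is left out, so the content is passed in as a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the four steps; every name here is invented for illustration.
final class InvertedIndexSketch {
    // Step 1: a file's identity is its position in this list.
    private final List<String> paths = new ArrayList<>();
    // Step 3: the inverted index maps each word to the ids of the files containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Step 2: a deliberately naive parser that lower-cases the text and
    // splits on anything that is not a letter or a decimal digit.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Registers a file and records every word of its content in the index.
    void addFile(String path, String content) {
        int id = paths.size();
        paths.add(path);
        for (String word : tokenize(content)) {
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 4: the query is tokenized the same way as the files, and only
    // files containing every query word are returned (an AND query).
    List<String> search(String query) {
        Set<Integer> hits = null;
        for (String word : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(word, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(paths.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addFile("notes/todo.txt", "Buy milk and index the files");
        index.addFile("docs/report.txt", "The inverted index maps words to files");
        System.out.println(index.search("index files")); // [notes/todo.txt, docs/report.txt]
        System.out.println(index.search("milk"));        // [notes/todo.txt]
    }
}
```

This keeps the whole index in memory and ignores ranking, stemming, phrase queries and updates to files that change, which is where most of the real work in a desktop search tool would lie.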

Since my first try at this in 1979 (ah, CP/M, the good old days), I have found multiple solutions to the above - it is very educational.

Bill

 
archit thakur
Greenhorn
Posts: 24
Ulf Dittmer wrote:
I don't want to use Lucene.

Why not?


This is my final-year project, and I want to develop something on my own. Using Lucene would be like developing only the GUI.
 
archit thakur
Greenhorn
Posts: 24
William Brogden wrote: Of course there is a way - you end up reinventing the wheel but you will learn a lot.



Thank you, sir.
Could you also please point me to some good reference material on this (a book or some published papers)?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
Actually, the bibliography in the Wikipedia entry on inverted indexes that I cited would be a good start.

I don't know of any specific book.

Bill
 
David O'Meara
Rancher
Posts: 13459
DZone.com has a series of refcards (simple printable pages outlining various technologies), and one of them is on Lucene.
It gives high-level coverage of how Lucene performs document indexing and may be a good place to start.
 