Today is the start of our promotion in this forum for "Lucene in Action". I'm eagerly awaiting the flood of great questions from the Ranch members.
Just to kick things off, I've spent the last few weeks building a "search inside" feature for the book, plus a companion blog, at http://www.lucenebook.com. It uses lots of web-tier and Lucene trickery to combine a blojsom-based blog and a Tapestry-based search page with two Lucene indexes (one for blog content, one for book content). The site is evolving, so any feedback or suggestions you have on it are most welcome.
I'm tickled with Lucene on my Wiki, but I never quite figured out how to re-index one file when it changes. Lucene has a method to remove a document by its index number. Do I have to search through Lucene's catalog of files to find mine to get that number? Any simple examples?
Joined: Dec 30, 2004
Lucene is really a library for full-text indexing and searching, and not an application that knows how to index Wikis, files, databases, or the Web. So you are probably either using a Lucene demo to index some local files, or you have written your own indexing application on top of Lucene. What you should probably do is make sure that your Lucene Documents contain a Field with the file path in it. This Field should be indexed and not tokenized (Field.Keyword). Then, when you detect that a file has changed, you need to remove it from the Lucene index and re-add it. To remove it from the index you'll do something like this:
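import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

Term term = new Term("pathFieldNameHere", "yourFilesPathHere");
IndexReader reader = IndexReader.open("/path/to/your/index");
int deletedCount = reader.delete(term); // deletes every Document containing this exact term
reader.close(); // commits the deletion and releases the index write lock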
This should always delete exactly one Document (the one that matches your file), because each file has a unique path.
After this you'll have to re-add your Document the usual way, via IndexWriter's addDocument(Document) method.
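For readers without a Lucene setup handy, the delete-then-re-add cycle can be modeled with a tiny in-memory stand-in. The class `ToyIndex` and its methods below are hypothetical and are NOT Lucene's API; they only mimic two ideas from the answer above: a delete matches the whole, untokenized path value (which is why the path Field is a Field.Keyword), and an "update" is a delete followed by an add. Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index: each "document" is a map of field name -> value.
// Hypothetical class for illustration only; NOT Lucene's API.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Loosely analogous to IndexWriter.addDocument(Document): append a copy.
    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Loosely analogous to IndexReader.delete(Term): remove every document whose
    // field equals the exact (untokenized) value, returning how many were removed.
    int delete(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // "Update" = delete the old version by its unique path, then add the new one.
    int reindex(String path, String contents) {
        int deleted = delete("path", path);
        Map<String, String> doc = new HashMap<>();
        doc.put("path", path);
        doc.put("contents", contents);
        addDocument(doc);
        return deleted;
    }

    int size() {
        return docs.size();
    }

    // Returns the contents stored for a path, or null if no such document exists.
    String contentsOf(String path) {
        for (Map<String, String> d : docs) {
            if (path.equals(d.get("path"))) {
                return d.get("contents");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.reindex("/wiki/Home.txt", "old text");
        index.reindex("/wiki/About.txt", "about page");
        int deleted = index.reindex("/wiki/Home.txt", "new text"); // file changed
        System.out.println(deleted + " " + index.size() + " " + index.contentsOf("/wiki/Home.txt"));
    }
}
```

Running main prints `1 2 new text`: re-indexing the changed /wiki/Home.txt deleted exactly one old copy, and the index still holds two documents.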
If you are doing all this by just using the Lucene demo, you are not really using Lucene the right way, and you are very far from using it fully. To get going I suggest you look at the two sample chapters available at Lucene in Action's site, http://www.lucenebook.com/ . You can also download the free sample code from Manning's site, or just get the print book or ebook (buying the print version gets you the ebook for free).
Again, a couple of basic questions. While developing a typical web application that consists of basic CRUD on entities, workflows, and a few search facilities on the existing entities, would I be using Lucene? If I were to search for something stored in a database, I would use the database indexing to fetch the data for me efficiently.
But say, for example, I am developing software for a recruiter and want to pull out resumes that mention the skill 'Lucene'. Should I use Lucene for that?
Please give us instances where you used Lucene in projects. That will give us some idea of how to put Lucene to work.
Yes, if you wanted to search for a word inside a large chunk of text (e.g. "lucene" in a collection of resumes), you would want to use Lucene instead of LIKE '%lucene%'. This is just the most basic example, really. Lucene's site has a page describing the currently available query syntax.
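To see why a term lookup differs from LIKE '%lucene%', here is a toy inverted index in plain Java. It is a hypothetical sketch, not Lucene's API or a description of its internals: the "analyzer" is a naive lowercase-and-split, and the postings are an in-memory map from term to resume ids. A search is one map lookup rather than a scan over every resume, and it matches whole words, whereas the substring scan also matches "Lucenebook". Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> ids of resumes containing that term.
// Hypothetical sketch for illustration only; NOT Lucene's API or internals.
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive "analyzer" (an assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: record docId under every term that appears in the text.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time: a single map lookup, no pass over the documents.
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    // Roughly what LIKE '%word%' does: check every document for the substring.
    static Set<Integer> likeScan(List<String> docs, String word) {
        Set<Integer> hits = new TreeSet<>();
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).toLowerCase().contains(word.toLowerCase())) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> resumes = Arrays.asList(
                "Java developer, built site search with Lucene",
                "Managed a Lucene-based search project",
                "Read about Lucenebook online");
        ToyInvertedIndex index = new ToyInvertedIndex();
        for (int i = 0; i < resumes.size(); i++) {
            index.add(i, resumes.get(i));
        }
        System.out.println(index.search("Lucene"));      // term lookup
        System.out.println(likeScan(resumes, "lucene")); // substring scan
    }
}
```

Running main prints `[0, 1]` for the term lookup and `[0, 1, 2]` for the substring scan.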
You can look at http://www.simpy.com/ for an example of a Lucene-powered site for social bookmarking. If you are building something that includes resumes, then you may be interested in http://www.indeed.com/, a job site for which I helped build the Lucene prototype; the site uses Lucene for resume searches. Finally, the Lucene in Action site, http://www.lucenebook.com/, makes very nice use of Lucene. It's meant to be used together with either the ebook or the print book, but you can try it out even if you don't have the book, just to get a feel for what Lucene can do.
I was wondering if the authors had any information on the extent to which Lucene has been successfully deployed in large non-commercial or commercial applications. Pointers to specific sites would be very much appreciated, but even anecdotal references or testimonials would be helpful.
Thanks very much! [ January 05, 2005: Message edited by: Greg Barish ]