Lucene beginner question

 
eddy johns
Ranch Hand
Posts: 67
Hi all,

I'm trying a simple search (based on Lucene in 5 Minutes) on an existing index with the following code, but it doesn't return any hits. What am I doing wrong? I tried various search strings that definitely exist in the indexed documents' title or text, but the search always comes back with zero hits.

Thanks!
Eddy

 
David Newton
Author
Rancher
Posts: 12617
Without knowing what data you're putting into the index I'm not really sure how to answer, since I have nothing I can run the code against.
 
eddy johns
Ranch Hand
Posts: 67
Hi David,

Thanks for replying. I'm simply copying a few text documents into a directory, and then creating the index using the class from Lucene in 5 Minutes. It seems to be doing the job correctly and an index is created in my file system.

I'm copying the index generation code below. I tried attaching the files I'm using as data, but I'm getting an error message claiming I can't attach a .txt file here. But, again, any data will do. I created an index using various text and HTML files, but the search isn't working.

Thanks,
Eddy

 
Lester Burnham
Rancher
Posts: 1337
You're searching for a field called "title", but the index only contains the fields "pathname" and "contents".

Lucene comes with a tool called "Lucli" that can be used to query existing indices from the command-line (it's in the "contrib" folder). Luke is an even more powerful GUI app for working with indices.
 
eddy johns
Ranch Hand
Posts: 67
Thanks Lester!

I got it. Now, how do I look for the actual occurrences of the string in the "contents" of the file? I'm sure there's a definitive tutorial or example out there. Is there? Do you know where it is?

Thanks again,
Eddy
 
Lester Burnham
Rancher
Posts: 1337
Either

new QueryParser(Version.LUCENE_CURRENT, "title", analyzer).parse("contents:"+querystr);

(assuming that 'querystr' contains a single word), or

new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse(querystr);

should do it. The "Query Syntax" page on the Lucene web site (it's also in the "docs" directory of the download) describes in more detail how to create queries; is that what you're asking?

Lastly, if you're serious about Lucene, do yourself a favor and get "Lucene in Action", 2nd ed. It will save you a lot of time by explaining all the ins and outs of indexing and searching. There's a lot one really needs to know that's not well explained anywhere online (and certainly not all in one place).
 
eddy johns
Ranch Hand
Posts: 67
Thank you very very much, Lester. You've been a great help.

Eddy
 
eddy johns
Ranch Hand
Posts: 67
OK, one more question while I contemplate spending the money on Lucene in Action. Let's say I look for the word "Manhattan" in my text files, the search works, and I get 3 results, sorted in one way or another. Now I want the sentences (or another string long enough to give context) in each of the documents in which the word "Manhattan" appears, along with the number of times it appears. Is that possible in Lucene?

Thanks,
Eddy
 
Lester Burnham
Rancher
Posts: 1337
That's the realm of the Lucene Highlighter (also in the "contrib" folder). It can extract context around a search hit and mark it up in some way. To do that it needs the full text that has been indexed, though, which isn't generally stored in the index (for example, in your code only the "path" field is stored, not the "contents" field, which is probably the right thing to do).

Alternatively, the search code could use the information that *is* stored in the index (the path) to retrieve the document, extract its text, and then run a Highlighter over it to get at the context of the search hits.
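
To make the "retrieve the text, then find the context" idea concrete, here is a minimal plain-Java sketch of what Eddy asked for: the number of occurrences of a word and a short window of text around each one. It is not the Lucene Highlighter and uses no Lucene classes; the class name SnippetSketch and its methods are made up for illustration. It assumes the document text has already been read into a String (for example from the file at the stored path), and it does plain case-insensitive substring matching, so it will not behave exactly like an analyzed Lucene query.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT the Lucene Highlighter API.
// All names here (SnippetSketch, positions, countOccurrences, snippets) are hypothetical.
class SnippetSketch {

    // Finds the start index of every non-overlapping, case-insensitive
    // occurrence of term in text. Substring matching: "Manhattan" also
    // matches inside "Manhattanite".
    static List<Integer> positions(String text, String term) {
        List<Integer> found = new ArrayList<>();
        if (term.isEmpty()) {
            return found;
        }
        int i = 0;
        while (i + term.length() <= text.length()) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                found.add(i);
                i += term.length();
            } else {
                i++;
            }
        }
        return found;
    }

    // Number of times term appears in text.
    static int countOccurrences(String text, String term) {
        return positions(text, term).size();
    }

    // One snippet per occurrence, with up to `radius` characters of
    // context on each side of the match.
    static List<String> snippets(String text, String term, int radius) {
        List<String> result = new ArrayList<>();
        for (int pos : positions(text, term)) {
            int start = Math.max(0, pos - radius);
            int end = Math.min(text.length(), pos + term.length() + radius);
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text made up for the demo.
        String text = "Manhattan is busy. I love manhattan in spring.";
        System.out.println(countOccurrences(text, "Manhattan"));
        for (String s : snippets(text, "Manhattan", 10)) {
            System.out.println("..." + s + "...");
        }
    }
}
```

In a real search application this would run once per hit, on the text of each document the query returned. The Highlighter does the same job with proper tokenization and markup, so treat this only as a way to see the shape of the problem.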
 
eddy johns
Ranch Hand
Posts: 67
Cool, I got it.
 
David Newton
Author
Rancher
Posts: 12617
I second the recommendation for the book; before it existed a lot of stuff was trial-and-error through the API. This works, and it's *very* educational, but it's also inefficient. Some of the newer stuff is on the complex side, too, and the book goes a long way towards making it digestible.
 
eddy johns
Ranch Hand
Posts: 67
There's a good discussion of the Highlighter at http://hrycan.com/2009/10/25/lucene-highlighter-howto/, with an excellent code sample that just works. Look at the last comment for how to compile it under Lucene 3.

Cheers everyone,
Eddy
 