Lucene beginner question

eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
Hi all,

I'm trying a simple search (based on "Lucene in 5 Minutes") against an existing index with the following code, but it doesn't return any hits. What am I doing wrong? I tried various search strings that definitely occur in the indexed documents' titles or text, but the search always comes back with zero hits.

Thanks!
Eddy

David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Without knowing what data you're putting into the index I'm not really sure how to answer, since I have nothing I can run the code against.
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
Hi David,

Thanks for replying. I'm simply copying a few text documents into a directory, and then creating the index using the class from Lucene in 5 Minutes. It seems to be doing the job correctly and an index is created in my file system.

I'm copying the index generation code below. I tried attaching the files I'm using as data, but I'm getting an error message saying I can't attach a .txt file here. But, again, any data will do. I created an index from various text and HTML files, but the search isn't working.

Thanks,
Eddy

Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
You're searching for a field called "title", but the index only contains the fields "pathname" and "contents".
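To see why a query against a field that was never indexed comes back empty, here is a minimal sketch in plain Java. It is a toy model, not Lucene's implementation or API: the class `ToyFieldIndex` and its methods are hypothetical names invented for this illustration, and the only assumption is that terms are looked up per field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of a per-field index; NOT Lucene's implementation or API.
class ToyFieldIndex {
    // field name -> (term -> ids of documents containing that term in that field)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    // Index the lower-cased words of `text` under the given field.
    void add(int docId, String field, String text) {
        Map<String, List<Integer>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            List<Integer> ids = terms.computeIfAbsent(term, t -> new ArrayList<>());
            if (!ids.contains(docId)) ids.add(docId);
        }
    }

    // Look the term up in one field only; an unknown field yields no hits.
    List<Integer> search(String field, String term) {
        List<Integer> result = new ArrayList<>();
        Map<String, List<Integer>> terms = postings.get(field);
        String key = term.toLowerCase(Locale.ROOT);
        if (terms != null && terms.containsKey(key)) result.addAll(terms.get(key));
        return result;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(0, "contents", "A walk through Manhattan");
        index.add(1, "contents", "Notes on Brooklyn");
        System.out.println(index.search("title", "manhattan"));    // no "title" field was indexed
        System.out.println(index.search("contents", "manhattan")); // the field that holds the text
    }
}
```

The word is present in the indexed text in both lookups; only the field name differs, which matches the symptom described in the first post.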

Lucene comes with a tool called "Lucli" that can be used to query existing indices from the command-line (it's in the "contrib" folder). Luke is an even more powerful GUI app for working with indices.
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
Thanks Lester!

I got it. Now, how do I look for the actual occurrences of the string in the "contents" of the file? I'm sure there's a definitive tutorial or example out there. Is there? Do you know where it is?

Thanks again,
Eddy
Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
Either

new QueryParser(Version.LUCENE_CURRENT, "title", analyzer).parse("contents:"+querystr);

(assuming that 'querystr' contains a single word), or

new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse(querystr);

should do it. The "Query Syntax" page on the Lucene web site (it's also in the "docs" directory of the download) describes in more detail how to create queries; is that what you're asking?
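The single-word caveat matters because, in Lucene's query syntax, a field prefix applies only to the term that directly follows it; any other term falls back to the parser's default field. The sketch below imitates just that rule in plain Java. It is not Lucene's QueryParser: `ToyFieldResolver` and `resolve` are hypothetical names, and splitting on whitespace is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of default-field resolution; NOT Lucene's QueryParser.
class ToyFieldResolver {
    // Returns one "field:term" entry per whitespace-separated term.
    // A term with its own "field:" prefix keeps it; any other term gets the default field.
    static List<String> resolve(String defaultField, String query) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            int colon = token.indexOf(':');
            out.add(colon > 0 ? token : defaultField + ":" + token);
        }
        return out;
    }

    public static void main(String[] args) {
        // Single word: both forms from the post resolve to the same field and term.
        System.out.println(resolve("title", "contents:manhattan"));
        System.out.println(resolve("contents", "manhattan"));
        // Two words: the prefix covers only the first term in the first form.
        System.out.println(resolve("title", "contents:new york"));
        System.out.println(resolve("contents", "new york"));
    }
}
```

With two words, the first form sends the second word to the default field ("title" here), so passing "contents" as the default field is the more robust choice for free-form input.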

Lastly, if you're serious about Lucene, do yourself a favor and get "Lucene in Action", 2nd. ed. It will save you a lot of time by explaining all the ins and outs of indexing and searching. There's a lot one really needs to know that's not well explained anywhere online (and certainly not all in one place).
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
Thank you very very much, Lester. You've been a great help.

Eddy
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
OK, one more question, while I contemplate spending the money for Lucene in Action. Let's say I look for the word "Manhattan" in my text files, the search works, and I get 3 results, sorted in one way or another. Now I want the sentences (or another string long enough to give context) inside each of the documents in which the word "Manhattan" appears, along with the number of times it appears. Is that possible in Lucene?

Thanks,
Eddy
Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
That's the realm of the Lucene Highlighter (also in the "contrib" folder). It can extract context around a search hit and mark it up in some way. To do that it needs the full text that has been indexed, though, which isn't generally stored in the index (for example, in your code only the "path" field is stored, not the "contents" field, which is probably the right thing to do).

Alternatively, the search code could use the information that *is* stored in the index (the path) to retrieve the document, extract its text, and then run a Highlighter over it to get at the context of the search hits.
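As a rough sketch of that second approach, the plain-Java code below takes a document's text and a search word, counts the occurrences, and cuts a window of characters around each one. It deliberately swaps in simple regex matching for the Lucene Highlighter, so it shows the idea rather than the Highlighter API. The class `HitContext`, its method names, and the 40-character radius are assumptions made for illustration, and the file is assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java stand-in for hit context; NOT the Lucene Highlighter API.
class HitContext {
    // Case-insensitive whole-word matcher for a literal word.
    private static Matcher matcher(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE)
                .matcher(text);
    }

    // Number of whole-word occurrences of `word` in `text`.
    static int count(String text, String word) {
        Matcher m = matcher(text, word);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // One snippet per occurrence: up to `radius` characters on each side of the hit.
    static List<String> snippets(String text, String word, int radius) {
        List<String> out = new ArrayList<>();
        Matcher m = matcher(text, word);
        while (m.find()) {
            int from = Math.max(0, m.start() - radius);
            int to = Math.min(text.length(), m.end() + radius);
            out.add(text.substring(from, to).replaceAll("\\s+", " ").trim());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file path taken from the stored field of a search hit; args[1]: the search word.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(count(text, args[1]) + " occurrence(s)");
        for (String s : snippets(text, args[1], 40)) System.out.println("..." + s + "...");
    }
}
```

Unlike the Highlighter, this does not use the analyzer that built the index, so it only finds the literal word and would miss variants that an analyzer treats as the same term.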
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
Cool, I got it.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

I second the recommendation for the book; before it existed a lot of stuff was trial-and-error through the API. This works, and it's *very* educational, but it's also inefficient. Some of the newer stuff is on the complex side, too, and the book goes a long way towards making it digestible.
eddy johns
Ranch Hand

Joined: Feb 16, 2010
Posts: 67
There's a good conversation about highlighter at http://hrycan.com/2009/10/25/lucene-highlighter-howto/, with an excellent code sample that just works. Look at the last comment for the solution on how to compile it under Lucene 3.

Cheers everyone,
Eddy
 