Lucene - get TermVector positions

Hello,

re. Lucene:

I am indexing text using Field.TermVector.WITH_POSITIONS in order to use it for highlighting and other post processing. I have not been able to find out how to access this information during the search.

Can anyone give me a pointer?

Thanks,

Allasso

Use the IndexReader.getTermFreqVector() methods.

Yes, I have tried that; however, I get a compile-time error:

non-static method getTermFreqVector(int,java.lang.String) cannot be referenced from a static context.

Since IndexReader is an abstract class, I can't instantiate it either.

Is there something I am missing here?

thanks for your reply, Allasso
Hi Allasso,

A concrete IndexReader object is constructed the normal way, i.e., using its static factory method, IndexReader.open(). Assign the result to a variable such as 'reader' and call getTermFreqVector() on it.

Note: Ensure that your writers have been close()d before getting a reader. Lucene has a kind of versioning concept for the index; to see the latest version, all the writers should be closed.

Perhaps IndexReader's termPositions() method may also prove useful to you.

Cheers
Karthik
Thank you X 100, Karthik. I was turning grey over that one.

Since IndexReader is an abstract class, I assumed it could not be instantiated. The "Interfaces and Inheritance" section of the Sun Java tutorial reads:

"An abstract class is a class that is declared abstract—it may or may not include abstract methods. Abstract classes cannot be instantiated, but they can be subclassed."

I guess I need to read up on factory methods. Is "reader" in your example considered an object, or is it considered something else? Is using the open() method a way of subclassing IndexReader?

thanks again,

Allasso
Hi Allasso,

'IndexReader.open()' internally creates an object (= instance) of a concrete (i.e., non-abstract) subclass of IndexReader, using the new operator.
'reader' is a local variable which is a reference to that object.
'open()' is not subclassing IndexReader; it's just a method to read a Lucene index, and one of its steps is to create an object of a concrete subclass of IndexReader. Methods that internally choose a particular subclass to instantiate, and return a reference to that instance, are called 'factory methods'. It's a design pattern (DesignPatternFaq).
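
To make the factory-method idea concrete, here is a minimal plain-Java sketch. It is not Lucene's source code: SketchReader, DiskSketchReader, MemorySketchReader and the "ram:" prefix are hypothetical names invented for illustration. It only shows how a static method on an abstract class can hand back an instance of a concrete subclass.

```java
// Hypothetical classes for illustration only; this is not how Lucene is implemented.
abstract class SketchReader {
    // Abstract method: every concrete subclass must supply an implementation.
    abstract String describe();

    // Static factory method: picks a concrete subclass and instantiates it with `new`.
    // The caller only ever sees the abstract type SketchReader.
    static SketchReader open(String path) {
        if (path.startsWith("ram:")) {
            return new MemorySketchReader(path);
        }
        return new DiskSketchReader(path);
    }
}

class DiskSketchReader extends SketchReader {
    private final String path;

    DiskSketchReader(String path) {
        this.path = path;
    }

    @Override
    String describe() {
        return "disk reader for " + path;
    }
}

class MemorySketchReader extends SketchReader {
    private final String path;

    MemorySketchReader(String path) {
        this.path = path;
    }

    @Override
    String describe() {
        return "in-memory reader for " + path;
    }
}

class FactoryMethodSketch {
    public static void main(String[] args) {
        // `new SketchReader()` would not compile, because the class is abstract.
        // The factory method returns a reference to a concrete subclass instance.
        SketchReader reader = SketchReader.open("/tmp/index");
        // Non-static methods are called on the instance, not on the class name.
        System.out.println(reader.describe());
    }
}
```

The same shape explains the earlier compile error: a non-static method such as describe() has to be called on the object returned by open(), not on the class name.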

Cheers
Karthik
thank you, Karthik,

sometimes just a few words in the right direction can get one on his way to some productive learning and save a lot of head banging.

I appreciate your thoughtfulness.

Allasso
Examples work really well for me, so I like to post the successful fruits for the benefit of others in the future...

This will print both the term positions (e.g., the nth word in the original indexed content) and the beginning and ending character offsets of the queried term.

NOTE: This example only works for a single query term, otherwise you need to iterate over the query terms as noted in the "display terms" section below.
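
For readers unsure what "positions" and "offsets" mean here, the following plain-Java sketch may help. It does not use Lucene and is not the code referred to above. It assumes simple whitespace tokenization and lowercasing, and TermOccurrence and PositionSketch are hypothetical names. It records, for each term, its zero-based token position plus its start and end character offsets, then looks up several query terms in a loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One occurrence of a term in the text (hypothetical type, for illustration only).
class TermOccurrence {
    final int position;    // zero-based index of the token within the text
    final int startOffset; // index of the term's first character
    final int endOffset;   // index one past the term's last character

    TermOccurrence(int position, int startOffset, int endOffset) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

class PositionSketch {
    // Builds a map from each lowercased term to all of its occurrences.
    // Assumption: tokens are separated by whitespace; a real analyzer may differ.
    static Map<String, List<TermOccurrence>> analyze(String text) {
        Map<String, List<TermOccurrence>> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Advance to the end of the token.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            vector.computeIfAbsent(term, k -> new ArrayList<>())
                  .add(new TermOccurrence(position, start, i));
            position++;
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<TermOccurrence>> vector =
                analyze("the quick fox jumps over the lazy dog");
        // Iterate over every query term rather than handling only one.
        String[] queryTerms = {"the", "fox"};
        for (String term : queryTerms) {
            List<TermOccurrence> hits =
                    vector.getOrDefault(term, Collections.emptyList());
            for (TermOccurrence hit : hits) {
                System.out.println(term + ": position=" + hit.position
                        + ", offsets=" + hit.startOffset + "-" + hit.endOffset);
            }
        }
    }
}
```

In Lucene itself this information would come from the term vector returned by getTermFreqVector() rather than from re-tokenizing the text, and character offsets are only available if the field was indexed with a term vector option that stores them.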
