
Lucene - get TermVector positions

Allasso Travesser
Ranch Hand

Joined: Feb 06, 2010
Posts: 35
Hello,

re. Lucene:

I am indexing text using Field.TermVector.WITH_POSITIONS in order to use it for highlighting and other post processing. I have not been able to find out how to access this information during the search.

Can anyone give me a pointer?

Thanks,

Allasso
Karthik Shiraly
Ranch Hand

Joined: Apr 04, 2009
Posts: 503
Try the IndexReader.getTermFreqVector() methods.
Allasso Travesser
Ranch Hand

Joined: Feb 06, 2010
Posts: 35
Yes, I have tried that; however, I get a compile-time error:

non-static method getTermFreqVector(int,java.lang.String) cannot be referenced from a static context.

Since IndexReader is an abstract class, I can't instantiate it either.

Is there something I am missing here?

thanks for your reply, Allasso
Karthik Shiraly
Ranch Hand

Joined: Apr 04, 2009
Posts: 503
Hi Allasso,

A concrete IndexReader object is constructed the normal way, i.e., using its factory method, IndexReader.open(), and keeping the returned reference in a variable such as 'reader'.

Note: Ensure that your writers have been close()d before getting a reader. Lucene has a kind of versioning concept for the index; to use the latest version, all the writers should be closed.

Perhaps IndexReader's termPositions() method may also prove useful to you.

Cheers
Karthik
Allasso Travesser
Ranch Hand

Joined: Feb 06, 2010
Posts: 35
Thank you X 100, Karthik. I was turning grey over that one.

Since IndexReader is an abstract class, I assumed it could not be instantiated. The "Interfaces and Inheritance" section of the Sun Java tutorial reads:

"An abstract class is a class that is declared abstract—it may or may not include abstract methods. Abstract classes cannot be instantiated, but they can be subclassed."

I guess I need to read up on factory methods. Is "reader" in your example considered an object, or is it considered something else? Is using the open() method a way of subclassing IndexReader?

thanks again,

Allasso
Karthik Shiraly
Ranch Hand

Joined: Apr 04, 2009
Posts: 503
Hi Allasso,

'IndexReader.open()' internally creates an object (= instance) of a concrete (i.e., non-abstract) subclass of IndexReader, using the new operator.
'reader' is a local variable which is a reference to that object.
'open()' is not subclassing IndexReader; it's just a method to read a Lucene index, and one of its steps is to create an object of a concrete subclass of IndexReader. Such methods, which internally choose a particular subclass to instantiate and return a reference to that instance, are called 'factory methods' - it's a design pattern (DesignPatternFaq)
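
A minimal plain-Java sketch of the factory-method idea described above may help. It does not use Lucene at all: SketchReader, MemoryReader, DiskReader and the "ram:" path prefix are hypothetical names made up for illustration, and the real IndexReader.open() chooses its subclass by its own rules.

```java
// Hypothetical stand-in for an abstract class such as IndexReader.
abstract class SketchReader {
    // Instance method: it can only be called on an object, not on the class.
    abstract String describe();

    // Static factory method: picks a concrete subclass, creates it with
    // 'new', and returns it typed as the abstract parent.
    static SketchReader open(String path) {
        if (path.startsWith("ram:")) {   // made-up rule for this sketch
            return new MemoryReader(path);
        }
        return new DiskReader(path);
    }
}

// Concrete (non-abstract) subclasses; callers never name them directly.
final class MemoryReader extends SketchReader {
    private final String path;
    MemoryReader(String path) { this.path = path; }
    @Override String describe() { return "in-memory index at " + path; }
}

final class DiskReader extends SketchReader {
    private final String path;
    DiskReader(String path) { this.path = path; }
    @Override String describe() { return "on-disk index at " + path; }
}

class FactoryDemo {
    public static void main(String[] args) {
        // SketchReader.describe();  // would not compile: non-static method
        //                           // referenced from a static context
        // new SketchReader();       // would not compile: class is abstract

        // 'reader' is a local variable holding a reference to the object
        // that open() created.
        SketchReader reader = SketchReader.open("/tmp/index");
        System.out.println(reader.describe());
        System.out.println(reader.getClass().getSimpleName());
    }
}
```

The two commented-out lines correspond to the two dead ends mentioned earlier in the thread: calling an instance method on the class itself, and trying to instantiate the abstract class directly.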

Cheers
Karthik
Allasso Travesser
Ranch Hand

Joined: Feb 06, 2010
Posts: 35
thank you, Karthik,

sometimes just a few words in the right direction can get one on his way to some productive learning and save a lot of head banging.

I appreciate your thoughtfulness.

Allasso
Allasso Travesser
Ranch Hand

Joined: Feb 06, 2010
Posts: 35
Examples work really well for me, so I like to post the successful fruits for the benefit of others in the future...

This will print both the term positions (i.e., the nth word in the original indexed content) and the beginning and ending character offsets of the queried term.

NOTE: This example only works for a single query term; otherwise you need to iterate over the query terms as noted in the "display terms" section below.
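
To make the difference between a term position and a pair of character offsets concrete, here is a minimal plain-Java sketch. It is not Lucene code and makes several assumptions: tokens are split on whitespace only, positions are counted from 0, the end offset is one past the last character, and the class and method names (TermOccurrence, PositionSketch.find) are hypothetical. In Lucene 3.x the stored equivalents would normally be read from the term vector of a field, and character offsets are only kept if the field was indexed with Field.TermVector.WITH_POSITIONS_OFFSETS rather than WITH_POSITIONS; check that against the API docs for your version.

```java
import java.util.ArrayList;
import java.util.List;

// One occurrence of a term in a piece of text.
class TermOccurrence {
    final int position;     // 0-based index of the token among all tokens
    final int startOffset;  // index of the token's first character
    final int endOffset;    // index one past the token's last character

    TermOccurrence(int position, int startOffset, int endOffset) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

class PositionSketch {
    // Scans 'text' token by token and records every occurrence of 'term'
    // (case-insensitive). Handles a single term only.
    static List<TermOccurrence> find(String text, String term) {
        List<TermOccurrence> hits = new ArrayList<>();
        int position = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip the whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // Consume one token.
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (text.substring(start, i).equalsIgnoreCase(term)) {
                hits.add(new TermOccurrence(position, start, i));
            }
            position++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "the quick fox jumps over the lazy dog";
        for (TermOccurrence o : find(text, "the")) {
            System.out.println("position=" + o.position
                    + " offsets=" + o.startOffset + "-" + o.endOffset);
        }
    }
}
```

For the sample sentence this reports "the" at token positions 0 and 5, with character offsets 0-3 and 25-28. For a multi-term query, find() would be called once per term.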

 