Lucene - get TermVector positions

 
Allasso Travesser
Ranch Hand
Posts: 35
Hello,

re. Lucene:

I am indexing text using Field.TermVector.WITH_POSITIONS in order to use it for highlighting and other post processing. I have not been able to find out how to access this information during the search.

Can anyone give me a pointer?

Thanks,

Allasso
 
Karthik Shiraly
Bartender
Posts: 1058
Use the IndexReader.getTermFreqVector() methods.
 
Allasso Travesser
Ranch Hand
Posts: 35
Yes, I have tried that; however, I get a compile-time error:

non-static method getTermFreqVector(int,java.lang.String) cannot be referenced from a static context.

Since IndexReader is an abstract class, I can't instantiate it either.

Is there something I am missing here?

Thanks for your reply, Allasso
 
Karthik Shiraly
Bartender
Posts: 1058
Hi Allasso,

A concrete IndexReader object is constructed the normal way, i.e., using its factory method, IndexReader.open().

Note: Ensure that your writers have been close()d, before getting a reader. Lucene has a kind of versioning concept for the index; to use the latest version, all the writers should be closed.

Perhaps IndexReader's termPositions() method may also prove useful to you.

Cheers
Karthik
 
Allasso Travesser
Ranch Hand
Posts: 35
Thank you X 100, Karthik. I was turning grey over that one.

Since IndexReader is an abstract class, I assumed it could not be instantiated. The "Interfaces and Inheritance" section of the Sun Java tutorial reads:

"An abstract class is a class that is declared abstract—it may or may not include abstract methods. Abstract classes cannot be instantiated, but they can be subclassed."

I guess I need to read up on factory methods. Is "reader" in your example considered an object, or is it considered something else? Is using the open() method a way of subclassing IndexReader?

thanks again,

Allasso
 
Karthik Shiraly
Bartender
Posts: 1058
Hi Allasso,

'IndexReader.open()' internally creates an object (i.e., an instance) of a concrete (non-abstract) subclass of IndexReader, using the new operator.
'reader' is a local variable which is a reference to that object.
'open()' is not subclassing IndexReader; it's just a method to read a Lucene index, and one of its steps is to create an object of a concrete subclass of IndexReader. Methods that internally choose a particular subclass to instantiate, and return a reference to that instance, are called 'factory methods'. It's a design pattern (DesignPatternFaq).
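
To make the idea concrete, here is a minimal sketch in plain Java. It is not Lucene's source: SketchReader, FileSketchReader, describe() and the path are made-up names that only mirror the shape of an abstract class with a static open() method.

```java
// Hypothetical names throughout; this only mirrors the shape of
// IndexReader.open() and is not Lucene code.
abstract class SketchReader {
    // Abstract instance method: each concrete subclass implements it.
    abstract String describe();

    // Static factory method: chooses a concrete subclass, creates it
    // with the new operator, and returns it typed as the abstract base.
    static SketchReader open(String path) {
        return new FileSketchReader(path);
    }
}

// Concrete (non-abstract) subclass; callers never have to name it.
class FileSketchReader extends SketchReader {
    private final String path;

    FileSketchReader(String path) {
        this.path = path;
    }

    @Override
    String describe() {
        return "reader for " + path;
    }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        // "new SketchReader()" would not compile: the class is abstract.
        // "SketchReader.describe()" would not compile either: describe()
        // is an instance method, so it needs an object to be called on.
        SketchReader reader = SketchReader.open("/tmp/index");
        System.out.println(reader.describe());
        System.out.println(reader.getClass().getSimpleName());
    }
}
```

The variable reader has the abstract type, but the object it refers to is an instance of the concrete subclass, which is why instance methods can be called on it.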

Cheers
Karthik
 
Allasso Travesser
Ranch Hand
Posts: 35
Thank you, Karthik,

Sometimes just a few words in the right direction can get one on his way to some productive learning and save a lot of head banging.

I appreciate your thoughtfulness.

Allasso
 
Allasso Travesser
Ranch Hand
Posts: 35
Examples work really well for me, so I like to post the successful fruits for the benefit of others in the future...

This will print both the term positions (e.g., the nth word in the original indexed content) and the beginning and ending character offsets of the queried term.

NOTE: This example only works for a single query term, otherwise you need to iterate over the query terms as noted in the "display terms" section below.
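
For anyone unsure how a term position differs from a character offset, the following is a rough sketch in plain Java. It does not use the Lucene API. It assumes whitespace tokenisation, case-insensitive matching, zero-based counting, and an end offset that points one past the last character; a real analyzer may behave differently. TermOccurrence and PositionSketch are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one occurrence of a term.
class TermOccurrence {
    final int position;     // nth token in the text, counting from 0
    final int startOffset;  // index of the term's first character
    final int endOffset;    // index one past the term's last character

    TermOccurrence(int position, int startOffset, int endOffset) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

class PositionSketch {
    // Scans the text token by token and records every match of the term.
    static List<TermOccurrence> find(String text, String term) {
        List<TermOccurrence> hits = new ArrayList<>();
        int position = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Advance to the end of the current token.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (text.substring(start, i).equalsIgnoreCase(term)) {
                hits.add(new TermOccurrence(position, start, i));
            }
            position++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "the quick fox jumps over the lazy fox";
        for (TermOccurrence hit : find(text, "fox")) {
            System.out.println("position=" + hit.position
                    + " offsets=" + hit.startOffset + "-" + hit.endOffset);
        }
    }
}
```

With more than one query term, the same lookup would simply be repeated for each term.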

 