Lucene Hits

Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1871
Hi,
IndexSearcher in Lucene accepts a query and returns a Hits object. As one tutorial puts it, Lucene is an IR library rather than a search engine. Does the implementor need to build a cache or crawler for even faster searching and indexing?
Also, how are the results returned? According to the tutorials on the net, Lucene uses a score for each page (a Document in general). How different is this from Google's PageRank? To my knowledge, PageRank calculates the score not only from how frequently the page is accessed but also from its backlinks (the total number of pages pointing to that page). How is the score of a Document calculated in Lucene?
Does Hit stand for Hypertext Induced Topic Selection, the algorithm used to rank documents?
Thanks
Arjun
[ January 06, 2005: Message edited by: Arjun Shastry ]

MH
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
I call Lucene a "search engine" because it's a convenient and recognizable term. Technically it is an API that has no user interface, no crawler, and no parsers. To me, it is the "engine", whereas Google is a search "application". Semantics and word games aside, it is not necessary to implement caching around Lucene. The Hits object itself has some built-in caching for the most recently accessed (or soon to be accessed) documents.

Hits from Lucene are ordered by score, a sophisticated calculation that puts the documents most relevant to the query at the top and less relevant documents below.
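
To give a rough feel for what ordering by score means, here is a minimal sketch of a TF-IDF-style calculation in plain Java. It is not Lucene's actual formula or API: the class TfIdfSketch, its methods, and the exact weighting (square root of term frequency, a smoothed inverse document frequency, and a length normalization) are assumptions chosen for illustration, and the real scoring involves more factors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class TfIdfSketch {
    // Lowercase and split on anything that is not a letter or digit
    // (a crude stand-in for a real analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns one score per document; 0 means the document does not match.
    static double[] score(List<String> docs, String query) {
        int n = docs.size();
        List<List<String>> tokenized = new ArrayList<>();
        for (String d : docs) tokenized.add(tokenize(d));

        double[] scores = new double[n];
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            // Document frequency: how many documents contain the term.
            int df = 0;
            for (List<String> doc : tokenized) if (doc.contains(term)) df++;
            if (df == 0) continue;
            // Rare terms get a higher weight than common ones.
            double idf = 1.0 + Math.log((double) n / (df + 1));
            for (int i = 0; i < n; i++) {
                int tf = Collections.frequency(tokenized.get(i), term);
                if (tf == 0) continue;
                // Shorter documents get a small boost for the same match.
                double lengthNorm = 1.0 / Math.sqrt(tokenized.get(i).size());
                scores[i] += Math.sqrt(tf) * idf * idf * lengthNorm;
            }
        }
        return scores;
    }

    // Indices of the matching documents, highest score first.
    static List<Integer> rank(double[] scores) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) if (scores[i] > 0) order.add(i);
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "lucene is a search library",
            "lucene lucene scoring in lucene",
            "a crawler fetches web pages");
        System.out.println(rank(score(docs, "lucene")));
    }
}
```

In this toy version, a document that mentions a query term more often, or is shorter for the same number of matches, rises toward the top, and documents with no matching term are left out of the results entirely.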

Google's PageRank is comparable to how Nutch, a system built around Lucene, ranks its documents. It does lots of Lucene trickery to weight documents in a PageRank-like fashion. Most of us, however, are not building web crawlers where PageRank works decently. In intranet or other domains of use, the built-in Lucene scoring mechanism works amazingly well.
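
For contrast with query-based scoring, here is a minimal sketch of the textbook PageRank idea, in which a page's rank depends on the pages that link to it. This is not how Nutch or Google implement it; the class PageRankSketch, the damping value, and the iteration count used below are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Nutch or Google code.
class PageRankSketch {
    // links[i] lists the pages that page i links to.
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with equal rank everywhere

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            // Every page receives a small baseline share.
            Arrays.fill(next, (1.0 - damping) / n);
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // A page with no outgoing links spreads its rank evenly.
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                } else {
                    // Otherwise its rank is split among the pages it links to.
                    for (int target : links[i]) {
                        next[target] += damping * rank[i] / links[i].length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Page 0 -> 2, page 1 -> 2, page 2 -> 0.
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Note that nothing here looks at a query: the rank comes purely from the link structure, which is why it suits the web better than a typical intranet collection with few cross-links.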

I have never heard that acronym for HIT, and I do not think it applies to Lucene's concept of a Hit. A "hit" is synonymous with "match".


Co-author of Lucene in Action