• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

A Question on Lucene

 
Ranch Hand
Posts: 10198
3
Mac PPC Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Guys,

I have a question on using Lucene to search and serve HTML contents for my web app. One general question that I have is how to read the HTML documents and index it's content so that they are searchable? Are there any good references other than the demo app that comes along with the Lucene download?
 
Joe San
Ranch Hand
Posts: 10198
3
Mac PPC Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Does Apache Solr and Lucene complement each other? What is the difference between these two?
 
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Reading through the websites of both Solr and Lucene, they don't sound similar. If this is for the project you mentioned elsewhere, then Lucene is almost certainly the proper choice.

With respect to HTML, I think Lucene comes with an example you should be able to adapt. If you're serious about it then you really should work through "Lucene in Action "; it'll save you much time and effort.
 
Joe San
Ranch Hand
Posts: 10198
3
Mac PPC Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, I'm planning to give my community project that I'm working on Lucene powered search capabilities to actually search for articles. I'm using the Lucene demo and building on top of that. But there are certain things that I would like to customize and certain things that I need to understand. Lucene in Action looks promising. Will give it a try.
 
Joe San
Ranch Hand
Posts: 10198
3
Mac PPC Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, Lucene in Action says that Solr is a crawler.
 
Joe San
Ranch Hand
Posts: 10198
3
Mac PPC Eclipse IDE Ubuntu
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Gold hold of Tika for content extraction and it really made my life easier.
 
Rancher
Posts: 1776
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If Joe's still around:
Correct my understanding - you used Apache Tika for converting Html files into indexable text format and used that indexes to be searched using Lucene?
 
Greenhorn
Posts: 10
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'm not Joe, but that's starting on the right track.


 
John Jai
Rancher
Posts: 1776
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Hoefer!
 
R Hoefer
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Since I'm thinking of it, have you heard of Luke? http://code.google.com/p/luke/

Nice tool to manage lucene databases in a GUI. I wish someone had told me about it when I started messing with Lucene.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic