I have a question about using Lucene to search and serve HTML content for my web app. My general question is: how do I read HTML documents and index their content so that they are searchable? Are there any good references other than the demo app that comes with the Lucene download?
Going by the websites of both Solr and Lucene, the two don't sound similar. If this is for the project you mentioned elsewhere, then Lucene is almost certainly the proper choice.
With respect to HTML, I think Lucene comes with an example you should be able to adapt. If you're serious about it, then you really should work through "Lucene in Action"; it'll save you a lot of time and effort.
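To give a rough idea of the "read the HTML" half of the problem, here is a minimal sketch that uses only the JDK's built-in Swing HTML parser to pull the title and the visible text out of a page. The class name `HtmlTextExtractor` and the `Extracted` holder are made up for illustration and are not part of Lucene or its demo. The two strings it produces are what you would then put into the fields of a Lucene document; that indexing step is left out here because the exact field classes differ between Lucene versions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: extracts the title and visible text of an HTML page. */
class HtmlTextExtractor {

    /** Plain-text pieces of one page; these would become fields of an indexed document. */
    static final class Extracted {
        final String title;
        final String body;

        Extracted(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    static Extracted extract(String html) throws IOException {
        final StringBuilder title = new StringBuilder();
        final StringBuilder body = new StringBuilder();

        // The parser reports tags and text runs to this callback as it reads the markup.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Route each text run to the title or the body, separated by single spaces.
                StringBuilder target = inTitle ? title : body;
                if (target.length() > 0) {
                    target.append(' ');
                }
                target.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore charset declarations inside the page.
            new ParserDelegator().parse(reader, callback, true);
        }
        return new Extracted(title.toString().trim(), body.toString().trim());
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><head><title>My article</title></head>"
                + "<body><h1>Heading</h1><p>Some searchable text.</p></body></html>";
        Extracted e = extract(page);
        System.out.println("title: " + e.title);
        System.out.println("body:  " + e.body);
    }
}
```

Treat this as a starting point only: real pages need more care (inline styles, character encodings, malformed markup), and a dedicated HTML parsing library may serve you better than the Swing parser.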
Yes, I'm planning to give the community project I'm working on Lucene-powered search so that users can search for articles. I'm using the Lucene demo and building on top of that, but there are certain things I would like to customize and certain things I need to understand. Lucene in Action looks promising. Will give it a try.