I have data to search and I use Solr for searching. I'm trying to understand the real-world usage of Hadoop. If I understand correctly, Hadoop isn't a replacement for Solr/Lucene, right?
I know of one startup that is looking at migrating from Oracle to HBase (the Hadoop version of Google's Bigtable). So I'm sure that in an average enterprise there are plenty of DB instances where it might be useful to ride on top of the Hadoop infrastructure for more than just search.
Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing.
Every search engine has data processing to do before its data is indexed.
Really big search engines need really big data processing frameworks. Hadoop is one such framework.
But data processing is not limited to building search indexes; plenty of other problem domains can be covered. For example, DNA alignment and various other bioinformatics subjects, all because of large genome datasets. There are also graph processing problems where the amount of data is huge and cannot all be loaded into memory. There are decision systems, EM (Expectation Maximization) algorithms and other AI subjects, especially those which require a strong mathematical background, such as data mining. In other words, I would say it is the category of problems that you cannot solve efficiently in the J2EE model or in simple database applications.
In fact, the last chapter of my book has a whole case study on how IBM uses Hadoop to implement its intranet search.
Long story short, Hadoop can be helpful in enterprise search when you need to implement search in a distributed system. And the main reasons for needing a distributed system in search are scale and complexity. When you're indexing lots of data (IBM's intranet is quite huge), using Lucene/Solr on a single machine would be too slow. Similarly, if you need to do any complex indexing, such as natural language processing, you will easily outgrow the capability of a single machine.
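To make the scale argument concrete, here is a minimal sketch in plain Python of splitting a corpus across shards, indexing each shard independently, and fanning a query out to all of them. It does not use Hadoop, Lucene, or Solr APIs; the names `CORPUS`, `partition`, `build_shard_index`, and `search` are hypothetical, and in-process dictionaries stand in for separate machines.

```python
# Hypothetical sample corpus: doc id -> text
CORPUS = {
    "a": "intranet search at scale",
    "b": "search with lucene",
    "c": "natural language processing",
    "d": "lucene index on one machine",
}

def partition(docs, num_shards):
    """Assign documents to shards round-robin over sorted doc ids."""
    shards = [dict() for _ in range(num_shards)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % num_shards][doc_id] = docs[doc_id]
    return shards

def build_shard_index(docs):
    """Build a term -> set of doc ids index for one shard only."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(shard_indexes, term):
    """Query every shard and merge the hits into one sorted list."""
    hits = set()
    for index in shard_indexes:
        hits |= index.get(term.lower(), set())
    return sorted(hits)

shards = partition(CORPUS, 2)
# Each shard is indexed on its own, so the work could run in parallel.
shard_indexes = [build_shard_index(shard) for shard in shards]
print(search(shard_indexes, "lucene"))
```

Because each shard only sees its own documents, the indexing step has no shared state, which is what lets it be spread over many machines in a real deployment.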
To add to what Tibi Kiss has said above, one can use Hadoop to store large amounts of data and use the MapReduce framework to index that data with Lucene. You can then make the resulting Lucene index searchable using Solr.
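As a rough illustration of the indexing step, the sketch below imitates the map, shuffle, and reduce phases in plain Python to build an inverted index. It is not Hadoop or Lucene code: `DOCS`, `map_phase`, `shuffle`, `reduce_phase`, and `build_index` are hypothetical names, and a real job would write Lucene index files rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical sample documents: doc id -> text
DOCS = {
    "doc1": "hadoop stores large data",
    "doc2": "solr searches indexed data",
    "doc3": "hadoop indexes data for solr",
}

def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in the document."""
    for term in sorted(set(text.lower().split())):
        yield term, doc_id

def shuffle(pairs):
    """Group emitted doc ids by term, as happens between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped

def reduce_phase(term, doc_ids):
    """Collapse all doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))

def build_index(docs):
    """Run map over every document, then shuffle, then reduce per term."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())

index = build_index(DOCS)
print(index["hadoop"])
```

The map calls are independent per document and the reduce calls are independent per term, which is the property that lets a framework like Hadoop run them on many nodes at once.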