I know of one startup that is looking at migrating to HBase (the Hadoop counterpart of Google's BigTable) instead of Oracle. So I'm sure that in an average enterprise there are plenty of database instances where it would be useful to ride on top of the Hadoop infrastructure for more than just search.
Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing.
Every search engine has a data processing requirement before the data can be indexed.
Really big search engines need really big data processing frameworks. Hadoop is one such framework.
But data processing does not reduce to search index processing; plenty of other problem domains can be covered too. Examples include DNA alignment and various other bioinformatics subjects, all because of large genome datasets. There are also graph processing problems where the amount of data is huge and cannot all be loaded into memory. There are decision systems, EM (Expectation Maximization) algorithms, and other AI subjects, especially those that require a strong mathematical background, such as data mining. In other words, I would say it is the category of problems that you cannot solve efficiently in the J2EE model or in simple database applications.
In fact, the last chapter of my book has a whole case study on how IBM uses Hadoop to implement its intranet search.
Long story short, Hadoop can be helpful in enterprise search when you need to implement search in a distributed system. And the main reasons for needing a distributed system in search are scale and complexity. When you're indexing lots of data (IBM's intranet is quite huge), using Lucene/Solr on a single machine would be too slow. Similarly, if you need to do any complex indexing, such as natural language processing, you will easily outgrow the capability of a single machine.
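To make the "outgrow a single machine" point concrete, here is a minimal sketch in plain Python (not Lucene/Solr or Hadoop code) of splitting documents across shards and merging per-shard hits at query time. The names `NUM_SHARDS`, `shard_for`, `build_shards`, and `search` are hypothetical, and term frequency is only a stand-in for real relevance scoring.

```python
import zlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical shard count, chosen only for illustration


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard with a hash that is stable across runs."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards=NUM_SHARDS):
    """Give each shard its own small index: doc_id -> term counts."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        shards[shard_for(doc_id, num_shards)][doc_id] = counts
    return shards


def search(shards, term, top_k=3):
    """Ask every shard for matches, then merge and rank the hits."""
    term = term.lower()
    hits = []
    for shard in shards:  # on a cluster, each shard would answer in parallel
        for doc_id, counts in shard.items():
            if counts[term] > 0:
                hits.append((counts[term], doc_id))
    # Highest term count first; ties are broken by document id.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


docs = {
    "a": "hadoop hadoop search",
    "b": "search lucene",
    "c": "hadoop",
}
shards = build_shards(docs)
print(search(shards, "hadoop"))
```

Each shard only ever sees its own slice of the documents, so both indexing and querying can be spread over as many machines as there are shards.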
To add to what Tibi Kiss has said above, one can use Hadoop to store large amounts of data and use the MapReduce framework to index that data using Lucene. You can then make the resulting Lucene index searchable using Solr.
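As a rough illustration of the indexing step, the sketch below simulates the map, shuffle, and reduce phases in plain Python to build an inverted index. It is not Hadoop or Lucene code; the function names and the sample documents are made up for the example, and a real job would write Lucene index files rather than a Python dict.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit one (term, doc_id) pair per token in the document."""
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Group mapper output by term, as the framework does between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reducer: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run the three phases in sequence over an in-memory document set."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Hypothetical sample documents.
docs = {
    "doc1": "Hadoop stores large data",
    "doc2": "Lucene indexes data",
    "doc3": "Solr serves the Lucene index",
}
index = build_index(docs)
print(index["lucene"])
```

Because each mapper call only needs one document and each reducer call only needs one term, both phases can be spread across many machines, which is what makes this approach attractive for large collections.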
I wanted to let you know about an upcoming webinar about optimizing search in NoSQL database applications:
Rich Search with NoSQL: Why now?
As developers are rapidly moving to NoSQL for its speed and flexibility, search often becomes the new bottleneck. In this webinar we will cover various topics to optimize text search in NoSQL applications. Included will be a live installation/configuration of the SRCH2 search engine in a MongoDB application. Attend the webinar by signing up at: http://srch2.com/webinar.html