
Any Open Source Search Engine with Auto Crawling?

asr chowdary
Ranch Hand

Joined: Mar 14, 2013
Posts: 35
Hi

We have the Lucene search engine (open source), but it doesn't do crawling. Is there any other search engine that includes a crawler?


Thanks
M Shareef
Greenhorn

Joined: Jun 10, 2012
Posts: 6

Maybe this link will be helpful for you:

http://java-source.net/open-source/crawlers

M Shareef
Greenhorn

Joined: Jun 10, 2012
Posts: 6

You could go with the Solr search engine, which is built on Lucene.

http://lucene.apache.org/solr/
Hussein Baghdadi
clojure forum advocate
Bartender

Joined: Nov 08, 2003
Posts: 3479

M Shareef wrote: You could go with the Solr search engine, which is built on Lucene.

http://lucene.apache.org/solr/


Lucene isn't a search engine; Lucene is an IR (Information Retrieval) library. Also, Solr doesn't provide any crawling facilities.
To get crawling, you might want to use "Apache Nutch".
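To make the library-versus-engine distinction concrete: the core data structure an IR library such as Lucene maintains is an inverted index, which maps each term to the documents that contain it. The toy class below sketches that idea with the Java standard library only. It is not Lucene's API, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption chosen for illustration.

```java
import java.util.*;

/** Toy inverted index: maps each lowercase term to the sorted set of document ids containing it. */
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Assumed tokenizer: lowercase, then split on runs of non-letter/non-digit characters. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Records docId in the posting list of every term in the text. */
    void addDocument(int docId, String text) {
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue; // split() yields "" when text starts with a delimiter
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns ids of documents containing every query term (boolean AND). */
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) continue;
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs); // intersect posting lists
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Everything here works on text you hand to it; nothing fetches pages. That gap is exactly what a crawler such as Nutch fills.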
Luan Cestari
Ranch Hand

Joined: Feb 07, 2010
Posts: 163

I used Nutch some time ago. It is a very nice project and can easily be adapted to different purposes (I know that some big companies use it). Bixo (openbixo) also seems very nice (I haven't tested it yet). Depending on your purpose and your time, I would suggest creating your own crawler using some parallel programming (there are a lot of details in this part, like using a Bloom filter to store the URLs already fetched) and a database (e.g. Cassandra) for storage.
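A minimal sketch of the do-it-yourself approach, under stated assumptions: a Bloom filter records URLs already seen, and a fixed thread pool fetches each breadth-first level in parallel. `fetchLinks` is a hypothetical stand-in for downloading a page and extracting its links; robots.txt, politeness delays, and persistence (e.g. to Cassandra) are left out, and the bit-array size and hash count are arbitrary choices for illustration.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

/** Minimal Bloom filter over strings using double hashing (h1 + i*h2). */
class BloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 32-bit FNV-1a, used as a second hash independent of String.hashCode(). */
    private static int fnv1a(String s) {
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h;
    }

    /** Sets the value's bits; returns true if any bit was unset, i.e. the value was definitely new. */
    boolean markSeen(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                isNew = true;
                bits.set(idx);
            }
        }
        return isNew;
    }
}

/** Level-by-level (BFS) crawler that fetches each frontier in parallel. */
class ToyCrawler {
    private final Function<String, List<String>> fetchLinks; // hypothetical: URL -> outgoing links
    private final BloomFilter seen;
    private final int threads;

    ToyCrawler(Function<String, List<String>> fetchLinks, BloomFilter seen, int threads) {
        this.fetchLinks = fetchLinks;
        this.seen = seen;
        this.threads = threads;
    }

    /** Returns URLs in visit order, following links up to maxDepth hops from the seed. */
    List<String> crawl(String seed, int maxDepth) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> visited = new ArrayList<>();
        try {
            List<String> frontier = new ArrayList<>();
            if (seen.markSeen(seed)) frontier.add(seed);
            for (int depth = 0; !frontier.isEmpty(); depth++) {
                visited.addAll(frontier);
                if (depth == maxDepth) break;
                // Fetch every URL of the current level concurrently.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : frontier) tasks.add(() -> fetchLinks.apply(url));
                List<String> next = new ArrayList<>();
                // invokeAll returns futures in task order, so visit order is deterministic.
                // The Bloom filter is only touched from this thread, so it needs no locking.
                for (Future<List<String>> f : pool.invokeAll(tasks)) {
                    for (String link : f.get()) {
                        if (seen.markSeen(link)) next.add(link);
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return visited;
    }
}
```

The Bloom filter trades exactness for constant memory: it has no false negatives, so a URL is never enqueued twice, but a rare false positive can make the crawler skip a URL it has never actually fetched.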

