Clustering with Lucene

Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
Just something I've wondered... How would one use Lucene in a cluster? Is there anything else to take care of other than setting up a shared disk that all nodes can access?


Author of Test Driven (2007) and Effective Unit Testing (2013)
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Lasse Koskela:
Just something I've wondered... How would one use Lucene in a cluster? Is there anything else to take care of other than setting up a shared disk that all nodes can access?


A shared disk would work as long as you take care to adjust the location of the lock file. Don't use an NFS shared drive though, as many have reported issues using it with Lucene.

There are other architectures to consider as well, such as distributing the index across multiple machines and creating a facility to query across all of them. Lucene has built-in RMI searching capability. Nutch does something different to scale across multiple machines, using socket connections, which may also be worth investigating.
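
To make the "query across multiple machines" idea a bit more concrete, here is a minimal sketch in plain Java of the gather step: each node searches its own slice of the index and returns scored hits, and a front end merges them into one ranked list. `Hit` and `ShardMerger` are hypothetical names invented for this illustration, not Lucene classes, and how the per-node lists get fetched (RMI, sockets, anything else) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One search result as reported by a single node (hypothetical type, not a Lucene class). */
final class Hit {
    final String node;
    final String docId;
    final float score;

    Hit(String node, String docId, float score) {
        this.node = node;
        this.docId = docId;
        this.score = score;
    }
}

/** Merges per-node result lists into one ranked list (hypothetical helper). */
final class ShardMerger {
    private ShardMerger() {}

    static List<Hit> mergeTopHits(List<List<Hit>> perNodeHits, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perNodeHits) {
            all.addAll(hits);
        }
        // Highest score first; List.sort is stable, so ties keep their incoming order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int end = Math.max(0, Math.min(limit, all.size()));
        return new ArrayList<>(all.subList(0, end));
    }
}
```

One caveat with this kind of merge: scores computed against separate indexes are only roughly comparable, because term statistics differ from node to node, so treat the ordering as an approximation unless the nodes share those statistics.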


Co-author of Lucene in Action
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
Originally posted by Erik Hatcher:
There are other architectures to consider as well, such as distributing the index across multiple machines and creating a facility to query across all of them. Lucene has built-in RMI searching capability. Nutch does something different to scale across multiple machines, using socket connections, which may also be worth investigating.

Am I correct in interpreting this as Lucene not having that partitioning support out of the box? Do you have a link to somewhere I could quickly read up on the RMI stuff?

I suppose the ultimate solution would indeed be something like Google has, with a farm of cheap servers each responsible for one or two corners of the big cheese, but I'd be more interested in seeing a "suggested" solution for a setup where you've got two high-end Solaris boxes using a single external disk with RAID-WHATEVER, etc. That's what I've been working with mostly during the past couple of years.
Erik Hatcher
Author
Ranch Hand

Joined: Jun 11, 2002
Posts: 111
Originally posted by Lasse Koskela:
Am I correct in interpreting this as Lucene not having that partitioning support out of the box? Do you have a link to somewhere I could quickly read up on the RMI stuff?

I suppose the ultimate solution would indeed be something like Google has, with a farm of cheap servers each responsible for one or two corners of the big cheese, but I'd be more interested in seeing a "suggested" solution for a setup where you've got two high-end Solaris boxes using a single external disk with RAID-WHATEVER, etc. That's what I've been working with mostly during the past couple of years.


Lucene itself does not have partitioning support, no.
Nutch, created by the same person (Doug Cutting), builds partitioning on top of Lucene, including a distributed file system.

You could easily share a disk between servers; just be sure that the lock file location on both servers points to the shared space. Both servers could not index at the same time, but both could search at the same time.
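
A rough sketch of pointing the lock location at the shared space, under a loudly stated assumption: in the Lucene 1.4.x line, `FSDirectory` reads the lock directory from the system property `org.apache.lucene.lockdir` (falling back to `java.io.tmpdir`), and later releases spell the property differently, so check the `FSDirectory` source or Javadoc for your version. `SharedLockDir` and the `locks` subdirectory are hypothetical names made up for this example.

```java
import java.io.File;

/** Points Lucene's lock directory at shared storage (hypothetical helper). */
final class SharedLockDir {
    // Assumption: the property name used by Lucene 1.4.x; verify it for your version.
    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockdir";

    private SharedLockDir() {}

    /**
     * Creates a "locks" directory under the shared mount if needed and
     * registers it as the lock directory for this JVM.
     */
    static File useSharedLockDir(File sharedRoot) {
        File lockDir = new File(sharedRoot, "locks");
        if (!lockDir.isDirectory() && !lockDir.mkdirs()) {
            throw new IllegalStateException("Cannot create lock directory: " + lockDir);
        }
        System.setProperty(LOCK_DIR_PROPERTY, lockDir.getAbsolutePath());
        return lockDir;
    }
}
```

Each server would call this with the same shared path at startup, before any Lucene class is touched, since the property is likely read only once. Passing the same value as a `-D` option on the JVM command line should have the same effect.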

As for reading up on the RMI capabilities, check out the RemoteSearchable class. The source code for Lucene in Action has an example (see the link in my signature). Simply run "ant SearchServer" after you unzip the source code and you'll be doing searches over RMI!
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
Ok. Thanks. I somehow assumed that Nutch didn't have anything to do with Lucene.
Otis Gospodnetic
Author
Greenhorn

Joined: Dec 30, 2004
Posts: 23
Oh, Nutch has a lot in common with Lucene. Not only was Nutch created by the same person who created Lucene, Nutch also uses Lucene for the actual indexing and searching. Also, Lucene in Action includes a Nutch case study in the Case Studies chapter (chapter 10). Check this: http://www.lucenebook.com/search?query=nutch

Otis


Lucene in Action: http://www.manning.com/lucene
 