This week's book giveaways are in the Jython/Python and Object-Oriented programming forums.
We're giving away four copies each of Machine Learning for Business: Using Amazon SageMaker and Jupyter and Object Design Style Guide and have the authors on-line!
See this thread and this one for details.
Win a copy of Machine Learning for Business: Using Amazon SageMaker and JupyterE this week in the Jython/Python forum
or Object Design Style Guide in the Object-Oriented programming forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
  • Campbell Ritchie
  • Bear Bibeault
  • Paul Clapham
  • Jeanne Boyarsky
  • Knute Snortum
  • Liutauras Vilda
  • Tim Cooke
  • Junilu Lacar
Saloon Keepers:
  • Ron McLeod
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Carey Brown
  • Joe Ess
  • salvin francis
  • fred rosenberger

Accumulo, Zookeeper and Hadoop Integration

Posts: 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Have an existing Hadoop cluster and would like to install Accumulo, Mahout and some other tools from a separate machine and integrate them into this environment. I can probably stand up some Zookeeper VMs if necessary. Also, when I go to install zookeeper by itself (RHEL 6.4 - yum install zookeeper), it pulls in a copy of Hadoop and seems to want this running on the box (even though I already have namenodes/datanodes on another set of boxen). Installing on a single machine is cake, however, trying to integrate pieces/parts seems to be quite an undertaking.

Here is what I have gleaned thus far:

1. Accumulo NEEDS zookeeper?,
2. Zookeeper seems to want to keep data in memory on a znode (does it EVER write it to the HDFS?),
3. and using MapReduce/Hadoop works great in batch mode.
4. Have thought/tried to install Cloudera/Hortonworks in this environment ... Cloudera only supports RHEL 6.2; HW seems to work ok so far

I am thinking of installing (3) Zookeeper VMs and have them point to the Hadoop Cluster, and then have my Accumulo/Mahout VM point to the Zookeeper ensemble. Is this the best way? Will this ultimately use the Hadoop cluster? Do I need to run a base Hadoop service on all of these boxes to make it all communicate?

Any/all help in this matter is greatly appreciated.

Environment: High Performance Computing infrastructure, VMs/Boxes running RHEL 6.4, all using a private network

Posts: 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
1. Yup, ZooKeeper is essentially as it keeps bootstrapping state for Accumulo and relies heavily on the locking functionality to coordinate distributed events.
2. As stated above, it runs in memory and uses a local filesystem. ZooKeeper is not dependent on Apache Hadoop's DFS. When running more than one ZooKeeper server together, they are redundant without the use of an external distributed filesystem.

You can certainly use a single ZooKeeper server, but it's up to you the level of redundancy and availability you require for your application. ZooKeeper isn't a very heavy service, so if you have separate nodes, it would be good to run 3 servers. You can easily run it along side nodes which are also tasktrackers and/or datanodes. As far as the location of each service, as they as Accumulo can reach the ZooKeepers, namenode, and datanodes over the same network, you should be fine.

Also, you don't need to run a datanode and tasktracker process on every node; however, you'll most often see this, sans a node or two to run the jobtracker and namenode. It heavily depends on the kind of workload you intend to process.

A word to the wise if you do run Accumulo in VMs, keep in mind that Accumulo is very sensitive to time. Virtualization can skew these sorts of things, so just be cognizant of the actual system resources underneath your VM.
Thanks tiny ad, for helping me escaped\ the terrible comfort of this chair.
Java file APIs (DOC, XLS, PDF, and many more)
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!