Bob Angell

Greenhorn
since Apr 26, 2013

Recent posts by Bob Angell

I have an existing Hadoop cluster and would like to install Accumulo, Mahout, and some other tools from a separate machine and integrate them into this environment. I can probably stand up some ZooKeeper VMs if necessary. Also, when I try to install ZooKeeper by itself (RHEL 6.4, yum install zookeeper), it pulls in a copy of Hadoop and seems to want that running on the same box, even though I already have namenodes/datanodes on another set of boxen. Installing everything on a single machine is cake; integrating the pieces and parts across machines, however, seems to be quite an undertaking.

Here is what I have gleaned thus far:

1. Accumulo NEEDS ZooKeeper (is that correct?).
2. ZooKeeper seems to keep its data in memory on znodes (does it EVER write it to HDFS?).
3. MapReduce/Hadoop works great in batch mode.
4. I have thought about/tried installing Cloudera or Hortonworks in this environment; Cloudera only supports RHEL 6.2, while Hortonworks seems to work OK so far.
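Regarding points 1 and 2: my understanding is that Accumulo uses ZooKeeper only for coordination and metadata, while the actual table data goes into HDFS. If that's right, the wiring would be roughly this accumulo-site.xml sketch (property names are from the Accumulo 1.5-era docs; the hostnames zk1-zk3 and namenode are placeholders for my boxes):

```xml
<!-- accumulo-site.xml (sketch; hostnames are placeholders) -->
<configuration>
  <property>
    <name>instance.zookeeper.host</name>
    <!-- the three ZooKeeper VMs I'd stand up -->
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
  <property>
    <name>instance.dfs.uri</name>
    <!-- the existing Hadoop cluster's namenode -->
    <value>hdfs://namenode:8020</value>
  </property>
  <property>
    <name>instance.dfs.dir</name>
    <!-- directory in HDFS where Accumulo keeps its data -->
    <value>/accumulo</value>
  </property>
</configuration>
```

Please correct me if any of those properties are wrong for the versions involved.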

I am thinking of installing three ZooKeeper VMs pointed at the Hadoop cluster, and then having my Accumulo/Mahout VM point to the ZooKeeper ensemble. Is this the best approach? Will this ultimately use the Hadoop cluster? Do I need to run a base Hadoop service on all of these boxes to make it all communicate?
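For the three ZooKeeper VMs, this is the zoo.cfg I had in mind (again, zk1-zk3 are placeholder hostnames; as I understand it, ZooKeeper writes its snapshots and transaction logs to the local dataDir, not to HDFS, which is why I'm asking about question 2 above):

```
# zoo.cfg (sketch; zk1-zk3 are placeholder hostnames)
tickTime=2000
initLimit=10
syncLimit=5
# local disk, not HDFS
dataDir=/var/lib/zookeeper
clientPort=2181
# the three members of the ensemble (peer:election ports)
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

If I read the docs right, each VM also needs a myid file in dataDir containing its server number (1, 2, or 3) so it knows which server.N entry it is.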

Any/all help in this matter is greatly appreciated.

Environment: High Performance Computing infrastructure, VMs/Boxes running RHEL 6.4, all using a private network

-Bob-
10 years ago