Josh Elser

since Apr 26, 2013

Recent posts by Josh Elser

1. Yup, ZooKeeper is essential: it keeps bootstrapping state for Accumulo, and Accumulo relies heavily on ZooKeeper's locking functionality to coordinate distributed events.
2. As stated above, it runs in memory and uses a local filesystem. ZooKeeper is not dependent on Apache Hadoop's DFS. When running more than one ZooKeeper server together, they are redundant without the use of an external distributed filesystem.

You can certainly use a single ZooKeeper server, but the level of redundancy and availability your application requires is up to you. ZooKeeper isn't a very heavy service, so if you have separate nodes, it would be good to run 3 servers. You can easily run it alongside nodes which are also tasktrackers and/or datanodes. As far as the location of each service goes, as long as Accumulo can reach the ZooKeepers, namenode, and datanodes over the same network, you should be fine.
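To put a number on the "1 server vs. 3 servers" trade-off, here is a rough sketch of the majority-quorum arithmetic ZooKeeper uses. The function names are mine, purely for illustration; this is not ZooKeeper or Accumulo code.

```python
def quorum_size(servers: int) -> int:
    """Smallest strict majority of an ensemble with `servers` members."""
    if servers < 1:
        raise ValueError("an ensemble needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be down while a majority is still up."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Compare a few ensemble sizes side by side.
    for n in (1, 2, 3, 4, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

A single server can't lose anything, 3 servers can lose 1, and a 4th server buys you nothing over 3, which is why odd ensemble sizes are the usual recommendation.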

Also, you don't need to run a datanode and tasktracker process on every node; however, you'll most often see this, sans a node or two to run the jobtracker and namenode. It heavily depends on the kind of workload you intend to process.

A word to the wise if you do run Accumulo in VMs: keep in mind that Accumulo is very sensitive to time. Virtualization can skew clocks, so be cognizant of the actual system resources underneath your VM.
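If you want a quick sanity check on clock drift between VMs, something like the sketch below works once you have collected a clock reading from each node. How you collect the readings is up to you; the host names and the 1-second threshold are made-up values for illustration, not an Accumulo setting.

```python
def max_skew_seconds(readings: dict[str, float]) -> float:
    """Largest difference between any two clock readings (epoch seconds)."""
    if not readings:
        raise ValueError("need at least one clock reading")
    return max(readings.values()) - min(readings.values())


if __name__ == "__main__":
    # Hypothetical readings taken from three VMs at (nominally) the same instant.
    readings = {"vm-a": 1366992000.0, "vm-b": 1366992000.5, "vm-c": 1366991999.8}
    skew = max_skew_seconds(readings)
    threshold = 1.0  # assumed tolerance for this example only
    status = "OK" if skew <= threshold else "check time sync on the hosts"
    print(f"max skew: {skew:.3f}s ({status})")
```

Running NTP on the hypervisor and the guests is the usual way to keep this number small.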
10 years ago