
Is Hadoop just for Vulcans?

chris webster

Joined: Mar 01, 2009
Posts: 2292

Has anybody ever managed to install a current version of the Hadoop stack reliably on Linux Mint or Ubuntu for evaluation purposes? And is there a packaged version (Hadoop/HDFS, Pig, Hive, plus maybe HBase, Hue and Oozie) that a normal human being can install to play around with?

  • The Cloudera Quickstart VM is obsolete so it won't play well with other components, and the need to use a VM limits our options for adding other tools/data around it in our current dev environments.
  • Apache BigTop looked promising, but it only works with Oracle JDK 1.6.
  • Cloudera Manager, which is supposed to ease the pain of installing CDH, only works on a few selected Linux distributions, seems to fall over a lot, and skips some vital (but obscure) SSH configuration.

I'm only asking because a couple of us developers have just spent several days trying various combinations of Linux versions, Hadoop versions, Apache/Cloudera alternatives (plus a lot of rough language), but we still didn't manage to get things working right. The biggest problem seems to be version conflicts, and the more variations we try, the further we seem to get from a coherent and reliable combination. Also, tutorials and sample code are all based on different version combinations, which just adds to the fun and games.

For example, looking at the component versions that still seem to be used, the Apache stack alone has:

  • 4 x Hadoop: 0.23.x, 1.2.x, 2.2.x, 2.3.x
  • 2 x MapReduce variants (not on all platforms but what the hell)
  • 3 x Pig: 0.10.0, 0.11.0, 0.12.0
  • 3 x Hive: 0.10.0, 0.11.0, 0.12.0
  • 4 x HBase: 0.94.17, 0.94.18, 0.96.2, 0.98.0

That's up to 288 combinations, before you even start looking at other tools, supported/unsupported Linux distros, JDKs etc. And then there are several versions of Cloudera Hadoop, which may or may not work alongside your Apache components, so if you start trying to mix these up, you are in a whole new world of pain.
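
For anyone who wants to check the arithmetic, here is a minimal sketch that multiplies out the version counts listed above. The dictionary keys and the two MapReduce labels are placeholders chosen for illustration (the variants are not named above); the version strings are copied from the list.

```python
from itertools import product
from math import prod

# Version lists copied from the post. The MapReduce labels are
# placeholders, since the post does not name the two variants.
stack = {
    "hadoop": ["0.23.x", "1.2.x", "2.2.x", "2.3.x"],
    "mapreduce": ["variant-1", "variant-2"],
    "pig": ["0.10.0", "0.11.0", "0.12.0"],
    "hive": ["0.10.0", "0.11.0", "0.12.0"],
    "hbase": ["0.94.17", "0.94.18", "0.96.2", "0.98.0"],
}

# Upper bound: one version picked per component, all picks independent.
combination_count = prod(len(versions) for versions in stack.values())

# Enumerate the candidate stacks explicitly, e.g. to tick them off one by one.
combinations = list(product(*stack.values()))

print(combination_count)  # 4 * 2 * 3 * 3 * 4 = 288
```

This is only an upper bound on what you might have to try: it says nothing about which combinations are actually compatible, which is the real problem.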

So how does anybody ever get this stuff working on a single PC? I can install a full "enterprise" application stack on my local PC - including fancy commercial BI bloatware - in an hour or two, so why is it so darned difficult to get this one - admittedly powerful - layer installed, when Hadoop has been around for 10 years and is widely used commercially? Is the Hadoop industry run by Vulcans, or what?

Live long and prosper.

No more Blub for me, thank you, Vicar.