Is Hadoop just for Vulcans?

Has anybody ever managed to install a current version of the Hadoop stack reliably on Linux Mint or Ubuntu for evaluation purposes? And is there a packaged version (Hadoop/HDFS, Pig, Hive, plus maybe HBase, Hue and Oozie) that a normal human being can install to play around with?

  • The Cloudera Quickstart VM is obsolete, so it won't play well with other components, and the need to use a VM limits our options for adding other tools/data around it in our current dev environments.
  • Apache BigTop looked promising, but it only works with Oracle JDK 1.6.
  • The Cloudera Manager - which is supposed to ease the pain of installing CDH - only works on a few selected Linux distributions, seems to fall over a lot, and skips some vital (but obscure) SSH configuration.

I'm only asking because a couple of us developers have just spent several days trying various combinations of Linux versions, Hadoop versions and Apache/Cloudera alternatives (plus a lot of rough language), and we still haven't managed to get things working properly. The biggest problem seems to be version conflicts, and the more variations we try, the further we seem to get from a coherent, reliable combination. Also, tutorials and sample code are all based on different version combinations, which just adds to the fun and games.

    For example, looking at the component versions that still seem to be used, the Apache stack alone has:

  • 4 x Hadoop: 0.23.x, 1.2.x, 2.2.x, 2.3.x
  • 2 x MapReduce variants (not on all platforms but what the hell)
  • 3 x Pig: 0.10.0, 0.11.0, 0.12.0
  • 3 x Hive: 0.10.0, 0.11.0, 0.12.0
  • 4 x HBase: 0.94.17, 0.94.18, 0.96.2, 0.98.0

    That's up to 288 combinations, before you even start looking at other tools, supported/unsupported Linux distros, JDKs etc. And then there are several versions of Cloudera Hadoop, which may or may not work alongside your Apache components, so if you start mixing those in, you are in a whole new world of pain.
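    Just to show where that 288 comes from, here's the back-of-the-envelope arithmetic, using the version counts listed above:

```python
# Count of possible version combinations across the Apache stack,
# one entry per component listed above.
versions = {
    "Hadoop": 4,     # 0.23.x, 1.2.x, 2.2.x, 2.3.x
    "MapReduce": 2,  # classic MR1 vs YARN/MR2
    "Pig": 3,        # 0.10.0, 0.11.0, 0.12.0
    "Hive": 3,       # 0.10.0, 0.11.0, 0.12.0
    "HBase": 4,      # 0.94.17, 0.94.18, 0.96.2, 0.98.0
}

total = 1
for count in versions.values():
    total *= count  # multiply the independent choices together

print(total)  # → 288
```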

    So how does anybody ever get this stuff working on a single PC? I can install a full "enterprise" application stack on my local PC - including fancy commercial BI bloatware - in an hour or two, so why is it so darned difficult to get this one - admittedly powerful - layer installed, when Hadoop has been around for 10 years and is widely used commercially? Is the Hadoop industry run by Vulcans, or what?
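    For what it's worth, the sort of minimal single-node (pseudo-distributed) setup we've been aiming at looks roughly like the sketch below. It assumes the Apache Hadoop 2.2.0 tarball unpacked under ~/hadoop and an OpenJDK 7 install on Ubuntu/Mint - the paths and the port number are illustrative, not gospel:

```shell
# Sketch of a single-node (pseudo-distributed) Hadoop 2.2.0 setup.
# Assumed: tarball under ~/hadoop, OpenJDK 7 at the path below.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# core-site.xml: point the default filesystem at a local NameNode
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: single machine, so replication factor 1
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# Passphraseless SSH to localhost - the vital (but obscure) step
# the installers tend to skip
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

hdfs namenode -format   # one-time format of the NameNode
start-dfs.sh            # start the NameNode and DataNode daemons
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -ls /          # sanity check that HDFS is answering
```

    Even that only gets you bare Hadoop/HDFS, of course - layering compatible Pig/Hive/HBase versions on top is where the combination count above starts to bite.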

    Live long and prosper.