
Is Hadoop just for Vulcans?

Has anybody ever managed to install a current version of the Hadoop stack reliably on Linux Mint or Ubuntu for evaluation purposes? And is there a packaged version (Hadoop/HDFS, Pig, Hive, plus maybe HBase, Hue and Oozie) that a normal human being can install to play around with?

  • The Cloudera Quickstart VM is obsolete, so it doesn't play well with other components, and the need to use a VM limits our options for adding other tools/data around it in our current dev environments.
  • Apache BigTop looked promising, but only works with Oracle JDK 1.6.
  • The Cloudera Manager - which is supposed to ease the pain of installing CDH - only works on a few selected Linux distributions and seems to fall over a lot and skips some vital (but obscure) SSH configuration.

I'm only asking because a couple of us developers have just spent several days trying various combinations of Linux versions, Hadoop versions, and Apache/Cloudera alternatives (plus a lot of rough language), and we still didn't manage to get things working right. The biggest problem seems to be version conflicts: the more variations we try, the further we seem to get from a coherent and reliable combination. Also, tutorials and sample code are all based on different version combinations, which just adds to the fun and games.

For example, looking at the component versions that still seem to be in use, the Apache stack alone has:

  • 4 x Hadoop: 0.23.x, 1.2.x, 2.2.x, 2.3.x
  • 2 x MapReduce variants (not on all platforms but what the hell)
  • 3 x Pig: 0.10.0, 0.11.0, 0.12.0
  • 3 x Hive: 0.10.0, 0.11.0, 0.12.0
  • 4 x HBase: 0.94.17, 0.94.18, 0.96.2, 0.98.0

That's up to 288 combinations, before you even start looking at other tools, supported/unsupported Linux distros, JDKs, etc. And then there are several versions of Cloudera Hadoop, which may or may not work alongside your Apache components, so if you start trying to mix these up, you are in a whole new world of pain.
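For what it's worth, that 288 figure is just the product of the version counts in the list above — a quick sanity check:

```python
from functools import reduce
from operator import mul

# Version counts taken from the Apache stack list above
versions = {
    "Hadoop": 4,
    "MapReduce variants": 2,
    "Pig": 3,
    "Hive": 3,
    "HBase": 4,
}

# Every component choice is independent, so multiply the counts
combinations = reduce(mul, versions.values(), 1)
print(combinations)  # prints 288
```

And that's the optimistic count — it assumes every pairing is even supposed to work together.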

So how does anybody ever get this stuff working on a single PC? I can install a full "enterprise" application stack on my local PC - including fancy commercial BI bloatware - in an hour or two, so why is it so darned difficult to get this one - admittedly powerful - layer installed, when Hadoop has been around for 10 years and is widely used commercially? Is the Hadoop industry run by Vulcans, or what?
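For anyone who wants to skip the packaged distributions entirely, the stock Apache docs for a bare-bones single-node ("pseudo-distributed") setup boil down to roughly the fragment below. This is only a sketch, assuming an Apache Hadoop 2.x tarball unpacked somewhere with JAVA_HOME set - the Cloudera/BigTop packaging layers its own configuration on top of this:

```xml
<!-- etc/hadoop/core-site.xml: point the default filesystem at a local HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml: single node, so no block replication -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After that, a one-time `bin/hdfs namenode -format` and then `sbin/start-dfs.sh` should get a local HDFS up - which still leaves Pig, Hive, HBase and friends to bolt on separately, version conflicts and all, so it doesn't really answer the "coherent stack" question.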

Live long and prosper.