Is Hadoop just for Vulcans?

 
Bartender
Posts: 2407
36
Scala Python Oracle Postgres Database Linux
<rant>
Has anybody ever managed to install a current version of the Hadoop stack reliably on Linux Mint or Ubuntu for evaluation purposes? And is there a packaged version (Hadoop/HDFS, Pig, Hive, plus maybe HBase, Hue and Oozie) that a normal human being can install to play around with?

  • The Cloudera Quickstart VM is obsolete, so it won't play well with other components, and the need to use a VM limits our options for adding other tools/data around it in our current dev environments.
  • Apache BigTop looked promising, but only works with Oracle JDK 1.6.
  • The Cloudera Manager - which is supposed to ease the pain of installing CDH - only works on a few selected Linux distributions, seems to fall over a lot, and skips some vital (but obscure) SSH configuration.

I'm only asking because a couple of us developers have just spent several days trying various combinations of Linux versions, Hadoop versions, and Apache/Cloudera alternatives (plus a lot of rough language), but we still didn't manage to get things working right. The biggest problem seems to be version conflicts, and the more variations we try, the further we seem to get from a coherent and reliable combination. Also, tutorials and sample code are all based on different version combinations, which just adds to the fun and games.

For example, looking at the component versions that still seem to be in use, the Apache stack alone has:

  • 4 x Hadoop: 0.23.x, 1.2.x, 2.2.x, 2.3.x
  • 2 x MapReduce variants (not on all platforms but what the hell)
  • 3 x Pig: 0.10.0, 0.11.0, 0.12.0
  • 3 x Hive: 0.10.0, 0.11.0, 0.12.0
  • 4 x HBase: 0.94.17, 0.94.18, 0.96.2, 0.98.0

That's up to 288 combinations, before you even start looking at other tools, supported/unsupported Linux distros, JDKs etc. And then there are several versions of Cloudera Hadoop, which may or may not work alongside your Apache components, so if you start trying to mix these up, you are in a whole new world of pain.
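The 288 figure is just the product of the version counts listed above. A quick sketch of the arithmetic (the MR1/MR2 labels are my reading of the "2 x MapReduce variants" line):

```python
# Rough count of version combinations in the Apache stack alone,
# using the component version counts from the lists above.
from math import prod

version_counts = {
    "Hadoop": 4,     # 0.23.x, 1.2.x, 2.2.x, 2.3.x
    "MapReduce": 2,  # classic MR1 vs. YARN-based MR2
    "Pig": 3,        # 0.10.0, 0.11.0, 0.12.0
    "Hive": 3,       # 0.10.0, 0.11.0, 0.12.0
    "HBase": 4,      # 0.94.17, 0.94.18, 0.96.2, 0.98.0
}

combinations = prod(version_counts.values())
print(combinations)  # 288
```

And that is before multiplying in Linux distros, JDK versions, or any Cloudera components.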

So how does anybody ever get this stuff working on a single PC? I can install a full "enterprise" application stack on my local PC - including fancy commercial BI bloatware - in an hour or two, so why is it so darned difficult to get this one - admittedly powerful - layer installed, when Hadoop has been around for 10 years and is widely used commercially? Is the Hadoop industry run by Vulcans, or what?
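For reference, the single-machine setup we were aiming for is Hadoop's pseudo-distributed mode, which on Hadoop 2.x boils down to roughly this pair of config fragments (stock defaults from the Apache docs; note that Hadoop 1.x uses the older fs.default.name property instead, which is exactly the kind of version skew I'm complaining about):

```xml
<!-- etc/hadoop/core-site.xml : point the default filesystem at a local HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml : single node, so no block replication -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Simple enough on paper - it's getting the matching Pig/Hive/HBase versions to sit on top of it that falls apart.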

Live long and prosper.
</rant>