Has anybody ever managed to install a current version of the Hadoop stack reliably on Linux Mint or Ubuntu for evaluation purposes? And is there a packaged version (Hadoop/HDFS, Pig, Hive, plus maybe HBase, Hue and Oozie) that a normal human being can install to play around with?
The Cloudera Quickstart VM is obsolete, so it doesn't play well with other components, and being tied to a VM limits our options for adding other tools and data around it in our current dev environments.
Apache BigTop looked promising, but it only works with Oracle JDK 1.6.
Cloudera Manager - which is supposed to ease the pain of installing CDH - only supports a few selected Linux distributions, falls over a lot, and skips some vital (but obscure) SSH configuration steps.
I'm only asking because a couple of us developers have just spent several days trying various combinations of Linux versions, Hadoop versions, and Apache/Cloudera alternatives (plus a lot of rough language), and we still haven't managed to get things working properly. The biggest problem seems to be version conflicts: the more variations we try, the further we get from a coherent and reliable combination. On top of that, tutorials and sample code are all based on different version combinations, which just adds to the fun and games.
For example, looking at the component versions that still seem to be used, the Apache stack alone has:
4 x Hadoop: 0.23.x, 1.2.x, 2.2.x, 2.3.x
2 x MapReduce variants (not on all platforms but what the hell)
3 x Pig: 0.10.0, 0.11.0, 0.12.0
3 x Hive: 0.10.0, 0.11.0, 0.12.0
4 x HBase: 0.94.17, 0.94.18, 0.96.2, 0.98.0
That's up to 288 combinations, before you even start looking at other tools, supported/unsupported Linux distros, JDKs etc. And then there are several versions of Cloudera Hadoop, which may or may not work alongside your Apache components, so if you start trying to mix these up, you are in a whole new world of pain.
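In case anyone wants to check my arithmetic, the 288 figure is just the product of the version counts listed above (Apache stack only, ignoring distros and JDKs):

```python
# Version counts from the list above (Apache stack only)
version_counts = {
    "Hadoop": 4,     # 0.23.x, 1.2.x, 2.2.x, 2.3.x
    "MapReduce": 2,  # two variants
    "Pig": 3,        # 0.10.0, 0.11.0, 0.12.0
    "Hive": 3,       # 0.10.0, 0.11.0, 0.12.0
    "HBase": 4,      # 0.94.17, 0.94.18, 0.96.2, 0.98.0
}

combinations = 1
for count in version_counts.values():
    combinations *= count

print(combinations)  # 288
```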
So how does anybody ever get this stuff working on a single PC? I can install a full "enterprise" application stack on my local PC - including fancy commercial BI bloatware - in an hour or two, so why is it so darned difficult to get this one - admittedly powerful - layer installed, when Hadoop has been around for 10 years and is widely used commercially? Is the Hadoop industry run by Vulcans, or what?