
Hadoop newbie having knowledge of Java & Linux

Hello Friends, Greetings,

I am new to Hadoop but have in-depth knowledge of Java and Linux. Since learning the whole of Hadoop looks like a mammoth task, I would like to understand which areas of Hadoop I should concentrate on so that I can use my existing knowledge of Java and Linux. Which areas are mandatory for me to learn?

Looking forward to your advice.

Many Thanks,
Don't even try to learn the "whole of Hadoop". These days Hadoop is really a huge collection of open-source projects, and you can't learn them all. In fact, I would say don't even try to install the individual packages yourself, because there are lots of mutual incompatibilities and installing Hadoop is just a world of pain. Instead, go for a bundled Hadoop distribution, which you can download for free from a provider such as Cloudera or Hortonworks. These companies offer a bundle of popular Hadoop-based tools, pre-installed and configured, which you can download as a "sandbox" VM and run in VirtualBox or VMware Player.

If you go for Cloudera, then you might like to try Udacity's online course Intro to Hadoop and MapReduce which allows free access to the course materials so you can work through it on your own. I think this course uses a VM based on the free Cloudera Express bundle.

Alternatively, download the Hortonworks Sandbox which is another free Hadoop bundle in a VM, but also includes lots of introductory tutorials to help you get started with Hadoop.

Work through the basic tutorials first, e.g. using core tools like HDFS, Hue, Hive and Pig. Then, once you understand a bit about Hadoop, look at application coding, e.g. in Java. But bear in mind that writing pure Java MapReduce programs is no longer the preferred approach to coding for Hadoop. There are lots of higher-level libraries, such as Cascading, and tools such as Cloudera's Impala SQL engine, which are designed to let you code your business logic at a more abstract level. Without them, you have to break everything down into chains of MapReduce steps, which are hard to write and often do not perform particularly well on larger, multi-stage processes.
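To give a feel for what "breaking everything down into MapReduce steps" means, here is a minimal sketch of the classic word-count job. It is a plain-Java, single-machine simulation of the map, shuffle and reduce phases, not the actual Hadoop `Mapper`/`Reducer` API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical, chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine simulation of a MapReduce word count
// (not the Hadoop API; real jobs run these phases across a cluster).
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key
    // (in Hadoop the framework does this between map and reduce).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: collapse one key's list of values into a single total.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence and returns word -> count, sorted by word.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat sat", "the cat ran")));
    }
}
```

Running `main` prints `{cat=2, ran=1, sat=1, the=2}`. Even this trivial job needs three separate phases; anything involving joins or multiple aggregations means chaining several such jobs, which is exactly the boilerplate that higher-level tools like Cascading, Hive or Impala are meant to hide.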

And if you want to go beyond MapReduce and see the current state of the art, have a look at Apache Spark with Python, Scala or Java. Spark is a high-performance distributed computing engine that can run stand-alone (e.g. on your local PC or on its own cluster) or on top of Hadoop's YARN resource manager.