Hadoop newbie having knowledge of Java & Linux

 
Greenhorn
Posts: 1
Hello Friends, Greetings,

I am new to Hadoop but have in-depth knowledge of Java and Linux. Since learning the whole of Hadoop looks like a mammoth task, I would like to understand which areas of Hadoop I should concentrate on so that I can use my existing knowledge of Java and Linux. Which areas are mandatory to learn?

Looking forward to your advice.

Many Thanks,
MH
 
Bartender
Posts: 2407
Don't even try to learn the "whole of Hadoop" - these days Hadoop is really a huge collection of open-source projects and you can't learn them all. In fact, I would say don't even try to install the individual packages yourself, because there are lots of mutual incompatibilities and installing Hadoop by hand is just a world of pain. Instead, go for a bundled Hadoop distribution, which you can download for free from a provider such as Cloudera or Hortonworks. These companies offer a bundle of popular Hadoop-based tools, pre-installed and configured, which you can download as a "sandbox" VM and run in VirtualBox or VMware Player.

If you go for Cloudera, then you might like to try Udacity's online course "Intro to Hadoop and MapReduce", which allows free access to the course materials so you can work through it on your own. I think this course uses a VM based on the free Cloudera Express bundle.

Alternatively, download the Hortonworks Sandbox, which is another free Hadoop bundle in a VM and also includes lots of introductory tutorials to help you get started with Hadoop.

Work through the basic tutorials first, e.g. using core tools like HDFS, Hue, Hive and Pig. Then, when you understand a bit about Hadoop, look at application coding, e.g. using Java. But bear in mind that writing pure Java MapReduce programs is no longer the preferred approach to coding for Hadoop. There are lots of higher-level libraries, such as Cascading, and tools, such as Cloudera's Impala SQL engine, which are designed to make it easier to code your business logic at a more abstract level. That saves you from having to break everything down into MapReduce steps, which are hard to write and often do not perform particularly well on larger processes.
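To give a feel for what "breaking everything down into MapReduce steps" means, here is a minimal sketch of a word count split into map, shuffle and reduce phases. It is plain Java running in memory, not the Hadoop API: the class name WordCountSketch and its methods are made up for illustration, and a real Hadoop job would use Hadoop's own Mapper and Reducer classes and run across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the MapReduce phases.
// This is NOT Hadoop code; every name here is invented for the sketch.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single count.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Drive the three phases in order over a list of input lines.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(List.of("to be or not to be", "to do")));
    }
}
```

Even for something as trivial as counting words you end up writing three separate steps plus the glue between them, which is roughly why the higher-level tools became popular.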

And if you want to go beyond MapReduce and see the current state of the art, have a look at Apache Spark with Python/Scala/Java. Spark is a high-performance distributed computing engine that runs stand-alone (e.g. on your local PC or on a cluster) or on top of Hadoop's YARN resource manager.
 