JavaRanch » Java Forums » Careers » Jobs Discussion

Big Data Entry Level Job

Kyle Jones
Ranch Hand

Joined: Jan 22, 2014
Posts: 32
Hi guys,

Just been offered a big data entry level job starting on Monday.

Job spec listed Java, Linux, Spring (Core + Integration), Hadoop, MapReduce & PIG as technologies used.

I am pretty new to a lot of this and am looking to hit the ground running, as it's a 4-month contract with the possibility of extension.

Any ideas what areas I should be focusing on? I have Java experience, but only a little with the others.

K. Tsang
Bartender

Joined: Sep 13, 2007
Posts: 2628

4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with or read about it before.

I reckon you should start with Hadoop. Installing and configuring it is probably the first thing you will need to know.

Also if you are not familiar with big data, I suggest you do some research on this too.

Once you have, you should ask yourself: why Hadoop and not XYZ for big data?
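To get a feel for what Hadoop's MapReduce model actually does before touching a cluster, here is a minimal word-count sketch in plain Java (no Hadoop dependency; the class and method names are just illustrative). The map phase emits (word, 1) pairs, a grouping step plays the role of the shuffle, and the reduce phase sums each group.

```java
import java.util.*;
import java.util.stream.*;

// Toy illustration of the MapReduce model in plain Java.
// Not a Hadoop job: it just mimics the map -> shuffle -> reduce data flow.
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair per word in the line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce: group the pairs by word, then sum each group's counts.
    static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                    .flatMap(WordCountSketch::map)
                    .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            run(List.of("big data big jobs", "big data tools"));
        System.out.println(counts.get("big"));  // 3
        System.out.println(counts.get("data")); // 2
    }
}
```

A real Hadoop job expresses the same data flow with Mapper and Reducer subclasses and runs it distributed over HDFS blocks, but the shape of the computation is the same idea.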


K. Tsang JavaRanch SCJP5 SCJD/OCM-JD OCPJP7 OCPWCD5 OCPBCD5
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1873

Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface.

It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop, even if you expect to use a different Hadoop distribution for your project, as Pig/Hive etc. are pretty much the same across the different Hadoop distributions.
So:

  • Install VMWare Player or Virtualbox.
  • Download and install the Hortonworks Sandbox VM.
  • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive, etc.


  You should be able to learn enough from this to get started in your new job pretty quickly.

    Good luck!


    No more Blub for me, thank you, Vicar.
    Kyle Jones
    Ranch Hand

    Joined: Jan 22, 2014
    Posts: 32
    K. Tsang wrote:4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with or read about it before.

    I reckon you should start with Hadoop. Installing and configuring it is probably the first thing you will need to know.

    Also if you are not familiar with big data, I suggest you do some research on this too.

    Once you have, you should ask yourself: why Hadoop and not XYZ for big data?


    It's 4 months, and as far as I know it can be extended as long as they are happy with me.

    I am familiar with big data concepts and have been studying how Hadoop works with HDFS & MapReduce, but I haven't actually used them properly yet.

    I assume training will be provided in this.
    Kyle Jones
    Ranch Hand

    Joined: Jan 22, 2014
    Posts: 32
    chris webster wrote:Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

    For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface.

    It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop, even if you expect to use a different Hadoop distribution for your project, as Pig/Hive etc. are pretty much the same across the different Hadoop distributions.
    So:

  • Install VMWare Player or Virtualbox.
  • Download and install the Hortonworks Sandbox VM.
  • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive, etc.


  You should be able to learn enough from this to get started in your new job pretty quickly.

    Good luck!


    Exactly what I was looking for!

    Cheers
    Kyle Jones
    Ranch Hand

    Joined: Jan 22, 2014
    Posts: 32
    Been trying to install the Hortonworks Sandbox on my 32-bit MacBook, but the requirements say 64-bit only.

    Anyone know if there is a 32-bit alternative available?

    I've actually set it up in VirtualBox, but when I start it, it doesn't fully finish loading, so I am assuming this is the 32-bit issue.
    chris webster
    Bartender

    Joined: Mar 01, 2009
    Posts: 1873

    I think you might be out of luck there, as the download instructions seem to specify that a 64-bit host operating system is required. Cloudera's QuickStart VM also requires a 64-bit host.

    Could you sign up for Amazon Web Services and set yourself up with a 64-bit VM there? Then maybe you could install VirtualBox/VMware and the Hortonworks Sandbox inside your 64-bit VM at Amazon.
     