Languages used in Hadoop Implementation and Real World Problems

 
Greenhorn
Posts: 2
Hello Friends,

I am an academic studying Hadoop for my class presentations. As I am new to Hadoop, I would like your expert opinions on the following two points:
1. Which languages are most commonly used in industry to implement Hadoop solutions: Python or Java?
2. Are there any sources of real business scenarios or examples where Hadoop is being used in industry? Also, where are suitable data sets available?

During my course presentation, I need this information to convince my students. Kindly help me.

Thanking you in anticipation.

Regards,
Meenal
 
Bartender
Posts: 2407
36
Scala Python Oracle Postgres Database Linux
For real-world business users, you should probably start with the examples provided by major Hadoop providers like Cloudera or Hortonworks:

http://www.cloudera.com/content/cloudera/en/our-customers.html

http://hortonworks.com/customers/

Hadoop itself is implemented mainly in Java as far as I know, and there is a fairly low-level Java API which a lot of people have used for Hadoop programming. However, it is often easier to use higher-level APIs such as Cascading (for Java) or alternative languages like Pig, Hive SQL and others. Pig is a Hadoop-based scripting language, and scripts are converted internally into a series of MapReduce tasks. Hive is a way to manage your data in HDFS as if it were held in relational database tables, and you can use SQL to manipulate your data, which is much easier than trying to do this in Java/MapReduce. As with Pig, the SQL is converted into MapReduce tasks underneath. Hadoop is also the foundation for other tools such as the NoSQL database HBase.
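To make the MapReduce model mentioned above more concrete for a class, here is a minimal sketch in plain Python of the three stages (map, shuffle, reduce) applied to a word count. It runs locally and does not use Hadoop at all. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for illustration and are not part of any Hadoop API. A Pig script or Hive query that groups by word and counts would be translated into tasks with roughly this shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages; on a cluster each stage would run in parallel on many nodes."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS.
    sample = ["Hadoop stores data in HDFS", "Hive and Pig run on Hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the map and reduce functions run on many machines and the framework performs the shuffle over the network, but the logic per record is the same as in this local sketch.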

However, Hadoop 2 and later provide the YARN resource manager, which allows you to plug in alternative processing engines such as Tez or Spark instead of the older MapReduce engine. Using these engines can speed up your Hive SQL or Pig jobs significantly. Apache Spark is a distributed processing engine that can run independently or on top of Hadoop's YARN resource manager. Spark has APIs for Scala, Python and Java, and provides a powerful high-level coding paradigm that many people are starting to see as an alternative to traditional Java/MapReduce, with or without Hadoop. One of the nice things about Spark is that you can code your whole data-processing pipeline using the same language/API and a consistent programming model, instead of having to switch between e.g. Java, Pig and Hive SQL to complete different stages in the processing.

Many other languages and tools (e.g. ETL tools, BI tools, etc.) provide interfaces of various kinds to Hadoop. It seems to be getting easier to use Hadoop as a distributed data store while using other tools to access and manipulate the data, even if you are not executing your code directly on Hadoop's processing engine.



 
chris webster
Bartender
PS: Welcome to JavaRanch!
 
Ranch Hand
Posts: 544
Hello,
Regarding data sets, you can find a collection here: http://ibmhadoop.challengepost.com/details/data

Regards,
Amit
 
Meenal Abhijit Borkar
Greenhorn
Posts: 2
Thank you Chris, Amit and Gartner. I will certainly follow up with any new questions. Thank you once again.
 