Working directly with the Java APIs can be tedious and error-prone. It also restricts usage of Hadoop to Java programmers. Hadoop offers two solutions that make programming against it easier: Pig and Hive.
Pig is a programming language that simplifies the common tasks of working with Hadoop: loading data, expressing transformations on the data, and storing the final results. Pig's built-in operations can make sense of semi-structured data, such as log files, and the language is extensible using Java to add support for custom data types and transformations.
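To make the load, transform, store flow concrete, here is a minimal sketch in plain Python. It is not Pig Latin and does not touch Hadoop; it only mirrors the three stages on a tiny in-memory sample. The log format, the sample lines, and the function names `load`, `transform`, `store` and `run` are all made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample input: space-separated access-log lines (invented for this sketch).
RAW_LOG = """\
10.0.0.1 GET /index.html 200
10.0.0.2 GET /missing 404
10.0.0.1 POST /login 200
malformed line
10.0.0.3 GET /index.html 200
"""


def load(text):
    """Load step: parse each line into a record, skipping lines that do not fit."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[3].isdigit():
            ip, method, path, status = parts
            yield {"ip": ip, "method": method, "path": path, "status": int(status)}


def transform(records):
    """Transform step: keep successful requests and count hits per path."""
    return Counter(r["path"] for r in records if r["status"] == 200)


def store(counts, out):
    """Store step: write tab-separated rows, sorted by path."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for path, hits in sorted(counts.items()):
        writer.writerow([path, hits])


def run(text):
    """Chain the three steps and return the stored output as a string."""
    out = io.StringIO()
    store(transform(load(text)), out)
    return out.getvalue()
```

Calling `run(RAW_LOG)` yields one tab-separated line per path with its count of successful requests. A real Pig script would express the same three stages declaratively and run them across the cluster.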
Hive enables Hadoop to operate as a data warehouse. It superimposes structure on data in HDFS and then permits queries over the data using a familiar SQL-like syntax. As with Pig, Hive's core capabilities are extensible.
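The idea of putting a table structure over raw records and then querying it with SQL can be sketched with Python's built-in `sqlite3` module. This is only a stand-in: SQLite is not Hive, the query is standard SQL rather than Hive's dialect, and the rows are copied into an in-memory table instead of being read in place from HDFS. The `sales` table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical raw records: tab-separated day, category, amount.
RAW_ROWS = [
    "2012-01-01\tbooks\t12.50",
    "2012-01-01\tmusic\t7.00",
    "2012-01-02\tbooks\t20.00",
]


def query_sales(raw_rows):
    """Give the raw lines a schema, then total the amounts per category."""
    conn = sqlite3.connect(":memory:")
    # Declare the structure the raw text is expected to follow.
    conn.execute("CREATE TABLE sales (day TEXT, category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        (row.split("\t") for row in raw_rows),
    )
    rows = conn.execute(
        "SELECT category, SUM(amount) FROM sales "
        "GROUP BY category ORDER BY category"
    ).fetchall()
    conn.close()
    return rows
```

`query_sales(RAW_ROWS)` returns one `(category, total)` tuple per category. The point is the workflow (declare a structure, then ask questions in SQL), not the engine.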
Choosing between Hive and Pig can be confusing. Hive is more suitable for data warehousing tasks, with predominantly static structure and the need for frequent analysis. Hive's closeness to SQL makes it an ideal point of integration between Hadoop and other business intelligence tools.
Hadoop deals with the analysis of big data. Since the tool is built in Java, knowing an object-oriented programming language is an added advantage. Beyond that, you should be comfortable with the concepts of web analytics, data analysis, data warehousing, and distributed computing.
Can you recommend some books on all these subjects? Is there a Hadoop book that covers all these topics?
Sachin rakesh wrote: Hadoop: The Definitive Guide by Tom White. This book is targeted at people who are new to Hadoop, so try that one. You can also reach Hadoop experts by posting in a Hadoop forum.
Right now, all I know is the Java part of Hadoop. So, approximately how much time would it take to learn Hadoop and become proficient enough to do entry-level "company projects"?
I want to learn Hadoop, but I don't know any object-oriented language such as Java or .NET. Is it necessary to learn one of these first? I'm planning to learn .NET first. Is that right? And what should I do to learn data warehousing?
Personally, I would say that you don't need to know an OOP language to start coding in Hadoop. In fact, not having that background is great!
That is because big data crunching is about processing massive data streams: filtering, piping, and aggregating. Functional programming languages are a perfect fit for this. In a functional programming language you deal with data structures, lazy evaluation, and functions.
When you use an OOP language for big data, you will get the same feeling you got when trying to bridge the gap between a database and objects.
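The filter, pipe, and aggregate style with lazy evaluation can be sketched in a few lines. Python is used here only because its generators are lazy; it is not a purely functional language, and this has nothing Hadoop-specific in it. The names `numbers` and `pipeline` are made up for illustration.

```python
from itertools import islice


def numbers():
    """Unbounded source of integers, standing in for a large data stream."""
    n = 0
    while True:
        yield n
        n += 1


def pipeline(source, limit):
    """Filter, transform, and aggregate lazily over a possibly endless source."""
    evens = (n for n in source if n % 2 == 0)  # filter: keep even values
    squares = (n * n for n in evens)           # pipe: transform each value
    # Aggregate: only the first `limit` results are ever computed.
    return sum(islice(squares, limit))
```

Because each stage is a generator, `pipeline(numbers(), 3)` pulls only as many values from the endless source as it needs (here the squares of 0, 2, and 4) and then stops. No objects or class hierarchies are involved, just functions applied to a stream.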
Thank you Hussein bhaghdadi for your kind reply. I would also like to know how to learn web analytics, data analysis, data warehousing, and distributed computing, which are needed for Hadoop. What should I do to learn these things? Does SQL Server 2008 include any of them, or do I have to do Oracle DBA, or something else? I don't know.