I am still very new to Hadoop, so please pardon me if my question seems lame.
Basically, there are so many flavors of Hadoop out there (MapR, Cloudera, Hortonworks, etc.). How do I know which one is right for me, or rather for my company?
Also, if you could elaborate on the differences, that would be great.
If you want an enterprise-grade distribution that a business can rely on, one that is easy to use and manage, has no single point of failure, lets you on-board data into the cluster with regular UNIX/Linux commands through NFS, and offers the best performance, MapR is the way to go.
Besides NFS and no single point of failure, you get features such as volumes, snapshots, and mirrors, which are critical for multitenancy and disaster recovery.
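To make the NFS point a bit more concrete: once the cluster is exposed as an NFS mount, loading data is just an ordinary file copy, with no Hadoop client involved. Here is a minimal Python sketch of that idea. The function name `onboard_file` and the mount path in the comment are my own placeholders, not anything taken from MapR's documentation, and the demo uses a temporary directory as a stand-in for the mount so it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def onboard_file(src: Path, cluster_dir: Path) -> Path:
    """Copy a local file into a directory that lives on the NFS-mounted cluster.

    Only ordinary filesystem calls are used; the NFS mount is what makes
    the destination part of the cluster's storage.
    """
    cluster_dir.mkdir(parents=True, exist_ok=True)
    dest = cluster_dir / src.name
    shutil.copyfile(src, dest)  # plain byte copy, like `cp src dest`
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real mount point. On an actual cluster this would
        # be wherever the NFS export is mounted (assumed, e.g. /mapr/<cluster-name>).
        fake_mount = Path(tmp) / "mount"

        local_file = Path(tmp) / "events.csv"
        local_file.write_text("id,value\n1,42\n")

        copied = onboard_file(local_file, fake_mount / "landing")
        print("copied to", copied)
```

On a real cluster you would pass a directory under the actual mount point instead of the temporary one; the same thing could be done with `cp` or `rsync` from a shell, which is the whole appeal for legacy systems.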
The nice thing is that you can download and try a free version of each of them. The base Apache distro is good when you're just learning and getting started but you wouldn't deploy it in production.
The real advantage of the bundled distributions such as Cloudera and Hortonworks is that you don't have to do the juggling to get version x of Hive working with version y of Hadoop and version z of HBase. They also come with better tools for deployment and management in an operational environment.
MapR gives you that too but, as the previous poster mentioned, it also has a number of unique extensions, such as its NameNode-free HA architecture and its NFS integration. If you have existing legacy systems that push data to NFS, this is a very nice option.
So it's really down to the combination of the products in the bundle, the tools you need to run it in production, and how much you are willing to pay for the commercial aspects of CDH and MapR in particular. But as I said, try them all!