Carlos Morillo

Ranch Hand
since Jun 06, 2009
Carlos likes: Scala, Python, Java

Recent posts by Carlos Morillo

Real-time processing in a Hadoop cluster is one of the many things that become possible with the MapR Enterprise Grade Distribution, compared to Apache Hadoop or other distributions, thanks to the many enhancements and features of MapR-FS, which, among other things, is a full read/write file system.

See this use case documented here: http://www.mapr.com/blog/twitter-feed-fuels-real-time-hadoop-with-storm-and-maprr-at-the-strata-conference.

If you wanted to accomplish this using other Hadoop distributions, you would need at least two clusters and a lot more hardware, which makes it way more expensive.


HTH,


Carlos.
10 years ago
You can also watch this video about how to use it:

http://youtu.be/kNsS9aDf6uE


There is also an extensive PDF in the AWS documentation about how to use EMR, including driving EMR through the CLI.

HTH,


Carlos
10 years ago
If you want an enterprise-grade distribution that a business can rely on, one that is easy to use and manage, has no single point of failure, makes it easy to on-board data into the cluster using regular UNIX/Linux commands through NFS, and delivers the best performance, then MapR is the way to go.

Besides NFS and no single point of failure, you get features such as Volumes, Snapshots, and Mirrors, which are critical for multitenancy and disaster recovery.
10 years ago
I'd say it depends on the use case. Likely that's for the analytics and BI side, i.e., for the consumers of the output of MapReduce jobs.

You need UNIX/Linux skills to install and manage a Hadoop cluster.

You need Java skills to understand the framework and to write MapReduce jobs, though you can use other programming languages as well.
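
For example, here is the canonical WordCount job written against the Hadoop MapReduce Java API. This is just a minimal sketch; the input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}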

You need some SQL skills to play with Hive.
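
Since Hive speaks SQL, you can even query it from plain Java over JDBC. A minimal sketch, assuming a HiveServer2 instance listening on the default port 10000 and a hypothetical words table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Assumes the hive-jdbc driver is on the classpath and
        // HiveServer2 is listening on the default port 10000.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // 'words' is a hypothetical table used for illustration.
             ResultSet rs = stmt.executeQuery(
                 "SELECT word, COUNT(*) AS cnt FROM words GROUP BY word")) {
            while (rs.next()) {
                System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}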

You need to understand RDBMSs in order to understand their limitations and how NoSQL databases such as HBase (the Hadoop database) solve certain kinds of problems.
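
To make that concrete, here is a minimal sketch using the HBase Java client API. The users table and its info column family are hypothetical, and it assumes an hbase-site.xml on the classpath pointing at a running cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to locate the cluster.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key "user1", column info:email.
            Put put = new Put(Bytes.toBytes("user1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                          Bytes.toBytes("user1@example.com"));
            table.put(put);

            // Read it back by row key: a fast key lookup, which is
            // exactly the access pattern HBase is built for.
            Result result = table.get(new Get(Bytes.toBytes("user1")));
            byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}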

In the end there has to be some consumer that extracts insights and drives decisions, and these are analytics and BI tools such as Datameer, Tableau, etc.


HTH,

Carlos.
10 years ago
Hadoop has been around as an open source project for barely 7 years.

Would you recommend any specific Hadoop distribution to a customer (let's say a Wall Street investment bank) who needs to run a mission-critical application?
Why?

Is this Hadoop distribution capable of NameNode HA, JobTracker HA, Volumes, Snapshots, Mirrors, and any other features important for disaster recovery?

In my view, ease of use and ease of data ingestion into the Hadoop cluster file system are critical features to have.
10 years ago
Hadoop runs on Linux only.
Your mileage may vary.
Some customers get really good performance on Cisco UCS and also HP DL380 among others.
Hadoop uses the notion of data locality, so the closer the data is to the node where the task is running, the better the performance you get.
Depending on the application, SSDs might have a positive impact versus classic HDDs.

The MapR Enterprise Grade Distribution for Hadoop holds several records on the most popular Hadoop benchmarks.
10 years ago
To the Hadoop authors, including Garry Turkington, Srinath Perera, Thilina Gunarathne, Jonathan R. Owens, Brian Femiano, and Jon Lentz:


Can you please give us a high-level idea of the use cases described in "Hadoop Real World Solutions Cookbook"?

Thanks,

Carlos.
10 years ago
Hi Amit,


All the open source Apache projects of the Hadoop ecosystem, including Cascading, work transparently and without any issues on top of MapR.
All the Hadoop APIs and the Apache projects that are part of the Hadoop ecosystem are 100% API compatible with MapR.

MapR just makes Hadoop more robust, more enterprise-grade, and easier to use.

See here for more information.

Best Regards,

Carlos.
11 years ago
Hi Amit,

I would strongly recommend that you take a look at the MapR M5 Test Drive VM.
It already has some samples for HBase, Hive and Pig.
You can get it here.

HTH,

Carlos.
11 years ago
I think my approach was just to throw an exception if the port was already in use.

I had a text field in the GUI to start my RMI server with the default port number, giving the user the option to change it.


HTH,


Carlos.
I think you really do need to keep it simple.

Typically the server ports of well-known, published services are fixed and known in advance.

If I remember correctly, there is a well-known default port for the RMI registry (1099).

Why don't you just use that as the default?

You can give the person starting the RMI server the freedom to specify an alternate port number.

Obviously you will need to handle the exception if the port number is already in use.
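
For example, a minimal sketch along those lines (RegistryStarter is just an illustrative name; Registry.REGISTRY_PORT is the well-known default, 1099):

import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.ExportException;

public class RegistryStarter {
    public static void main(String[] args) {
        // Default to the well-known RMI registry port (1099),
        // but let the user override it on the command line.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : Registry.REGISTRY_PORT;
        try {
            Registry registry = LocateRegistry.createRegistry(port);
            System.out.println("RMI registry listening on port " + port);
            // Keep a reference to 'registry' (and bind your remote objects here)
            // so it is not garbage collected.
        } catch (ExportException e) {
            // Thrown when the port is already in use by another process.
            System.err.println("Port " + port + " is already in use; pick another one.");
        } catch (RemoteException e) {
            System.err.println("Could not start the RMI registry: " + e.getMessage());
        }
    }
}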

HTH,

Carlos.
I would strongly recommend the FAQs and the Oracle SCJD Certification page.