MapReduce vs Distributed task Queue

Zaharie Sergiu

Joined: Jul 24, 2012
Posts: 1
Hello all,

Can someone explain the difference between these two concepts? When is it better to build a distributed system with MapReduce (Hadoop) rather than a distributed task queue (Celery)?
With respect to performance, load balancing, big data, scalability, reliability, availability, and efficiency, what are the advantages and drawbacks of each?

I am currently in the research phase of a project that consists of building a web-based distributed system. I have an initial piece of text-mining software that I need to decompose so that I can integrate it with one of these two frameworks and make it distributed.

Thank you!
Mark Spritzler

Joined: Feb 05, 2001
Posts: 17276

OK, here is an answer that isn't really direct.

The answer:

It depends.

It depends on what task or process you are running. Some tasks that you want to run distributed fit well into MapReduce, and some don't. The typical Hadoop example of reading many files and counting words is a great fit for MapReduce. Getting results for a search, as Google does, is another great example for MapReduce. Handling events via messaging and processing the data doesn't need MapReduce; a distributed task queue would be better there.
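To make the contrast a bit more concrete, here is a rough single-machine sketch in plain Python. It uses neither Hadoop nor Celery; it only imitates the shape of each model. The names `map_count`, `reduce_counts`, `word_count`, `handle_event`, and `run_workers` are made up for illustration, and the threads stand in for what would be separate worker machines in a real deployment.

```python
from collections import Counter
from functools import reduce
from queue import Queue
from threading import Thread

# --- MapReduce shape: one function applied to every input, results merged ---

def map_count(text):
    """Map step: count the words in a single document."""
    return Counter(text.lower().split())

def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right

def word_count(documents):
    # Each map call is independent, so a framework could spread them over nodes.
    partials = map(map_count, documents)
    return reduce(reduce_counts, partials, Counter())

# --- Task queue shape: independent events picked up by whichever worker is free ---

def handle_event(event):
    """Hypothetical handler: each event is processed on its own, with no merge step."""
    return f"processed {event}"

def run_workers(events, num_workers=2):
    tasks, results = Queue(), Queue()

    def worker():
        while True:
            event = tasks.get()
            if event is None:  # sentinel: no more work for this worker
                break
            results.put(handle_event(event))

    threads = [Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for event in events:
        tasks.put(event)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    # All workers have finished, so the results queue can be drained safely.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)
```

The word count needs a merge step over all partial results, which is the part MapReduce is built around. The event handler has no such step: each task stands alone, so a queue with a pool of workers is enough.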

It depends on the particular use case.

