Can someone explain the difference between these two concepts: when is it better to build a distributed system with MapReduce (Hadoop), and when with a distributed task queue (Celery)?
With respect to performance, load balancing, big data, scalability, reliability, availability, and efficiency, what are the drawbacks and advantages of each?
I am currently in the research phase of a project that consists of building a web-based distributed system. I have an initial text mining program that I need to decompose in order to integrate it with one of these two frameworks and make it distributed.
It depends on the kind of task or process you are running. Some tasks that you want to run in a distributed way fit well into MapReduce, and some don't. The typical Hadoop example of reading many files and counting words is a great fit for MapReduce: each file can be processed independently, and the partial counts are then merged. Building the index behind a search engine like Google is another good fit. Handling events via messaging and processing each one as it arrives doesn't need MapReduce; a distributed task queue is the better choice there.
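To make the distinction concrete, here is a minimal single-machine sketch of the two shapes, using only the Python standard library. It is not Hadoop or Celery code: the function names (`map_count`, `reduce_counts`, `word_count`, `handle_event`, `run_workers`) and the event fields are made up for illustration, and threads stand in for what would be separate worker machines in a real deployment.

```python
from collections import Counter
from functools import reduce
import queue
import threading


# MapReduce shape: a batch of inputs, one map step per input,
# then a reduce step that merges the partial results.
def map_count(text):
    """Map step: count the words in a single document."""
    return Counter(text.lower().split())


def reduce_counts(a, b):
    """Reduce step: merge two partial word counts."""
    return a + b


def word_count(documents):
    partials = map(map_count, documents)
    return reduce(reduce_counts, partials, Counter())


# Task queue shape: independent jobs arrive as messages and
# whichever worker is free picks up the next one.
def handle_event(event):
    """Hypothetical handler; a real one would do the actual processing."""
    return f"processed {event['type']} for user {event['user']}"


def run_workers(events, num_workers=2):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            event = tasks.get()
            if event is None:  # sentinel: no more work for this worker
                break
            out = handle_event(event)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for event in events:
        tasks.put(event)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results  # completion order is not guaranteed


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
    print(run_workers([{"type": "signup", "user": 1},
                       {"type": "upload", "user": 2}]))
```

In the first half the result only exists once every input has been mapped and merged, which is why it suits large batch jobs. In the second half each event is a self-contained job with its own result, which is the pattern a task queue such as Celery is designed for.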