Is there any way to do iterative computation in Hadoop? In the MapReduce paradigm, both the map and reduce phases run only once. Is there any procedure to fire the map phase multiple times, like the iterations of a loop? I believe Apache Giraph runs its compute method with similar logic. Is there any way to perform a MapReduce task in a similar fashion, or is there an alternate programming paradigm that performs a similar task?
Thanks in advance!
What is the nature of the data you are processing that requires iterations?
Depending on the nature of the data, the simplest way may be for the driver to run multiple MapReduce jobs in a while loop.
If the number of iterations is known and constant, or can be determined by a first-pass job, it's straightforward to run the loop.
If it's not known, or varies depending on the data, then the reducer in each job is responsible for telling the driver whether more iterations are necessary or the terminating condition has been satisfied.
It can do this either via a status file on HDFS or by using Counters.
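To make the Counter approach concrete, here is a minimal driver sketch using the Hadoop 2.x `Job` API. The mapper/reducer classes (`IterMapper`, `IterReducer`), the counter enum, and the iteration cap are all hypothetical names for illustration; the only assumption is that the reducer increments the counter whenever the desired metric has not yet been reached:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {

    // Hypothetical counter: the reducer increments it for every
    // record that has not yet reached the desired metric.
    public enum IterationCounter { RECORDS_NOT_CONVERGED }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = args[0];
        int iteration = 0;
        long pending = 1;               // assume at least one pass is needed
        final int MAX_ITERATIONS = 50;  // safety cap against non-convergence

        while (pending > 0 && iteration < MAX_ITERATIONS) {
            String output = args[1] + "/iter-" + iteration;

            Job job = Job.getInstance(conf, "iterative-pass-" + iteration);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(IterMapper.class);    // hypothetical mapper
            job.setReducerClass(IterReducer.class);  // hypothetical reducer
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));

            if (!job.waitForCompletion(true)) {
                System.exit(1);          // a pass failed; stop iterating
            }

            // The reducer signals the driver through this counter:
            // zero means the terminating condition has been satisfied.
            pending = job.getCounters()
                         .findCounter(IterationCounter.RECORDS_NOT_CONVERGED)
                         .getValue();

            input = output;  // the next pass consumes this pass's output
            iteration++;
        }
    }
}
```

Each pass writes to a fresh output directory and feeds it to the next pass, since Hadoop refuses to overwrite an existing output path.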
Thanks for your insight. The data I'm working with is a set of records that needs to go through multiple phases until some desired metric is achieved. I previously worked briefly with Giraph, and I think it only takes a network (graph) as input, so there is not much scope for using Giraph here (please correct me if I'm wrong).
Also, running a Hadoop job via a while loop incurs additional overhead: each time a job is submitted, a set of internal setup steps runs before the map and reduce phases begin. I previously tried such an approach, but the runtime was too long.
I have mostly worked with Hadoop 1.0. I recently read in some blogs that Hadoop 2.0 makes provision for iterative programming alongside the standard MapReduce paradigm, but I did not find much help on this topic on the internet. Please let me know if you have any knowledge of this.