aspose file tools*
The moose likes Hadoop and the fly likes MapReduce Combiner without Shuffle ? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » Hadoop
Bookmark "MapReduce Combiner without Shuffle ?" Watch "MapReduce Combiner without Shuffle ?" New topic
Author

MapReduce Combiner without Shuffle ?

andre mantei
Greenhorn

Joined: Sep 12, 2013
Posts: 5
Hi everyone,

I have a question addressing the data-flow in mapreduce-jobs.

DataFlow in a WordCount-example (taken from a book):

The mapper produces key-value-pairs like (car, 1)
maybe a lot of this pairs with the same key (car, 1)

The shuffle-phase produces key-ListOfValues-pairs like (car, 1, 1, 1)

The reducer summarizes the List and produces a key-value-pair like (car, 3)

When I want to use a combiner, I can use the reducer as a combiner (regarding to
the book, I've been learning from).
But how is this possible ? When I want to use the reducer as a combiner, there has to be a
shuffle-phase before the combiner, right ? Without a shuffle-phase, there is no
List of 1's and the combiner could not sum the value for a specific key.

Clearly I am missing something, can someone please explain it to me ?

Greetings, Andre
Rajesh Nagaraju
Ranch Hand

Joined: Nov 27, 2003
Posts: 62
The book would say it is good to use the same code as the reducer and not use the reducer itself.

When a mapper starts outputting data it first stores it in a circular buffer, when the circular buffer
reaches a threshold (configurable) it starts to spill it the disk.

The combiner is not a reducer as it runs on the mapper itself, what a combiner does is it combines the mapper
spills. The combiner is not guaranteed to run, it needs a minium number of spill files (again configurable). It also need
not run once based on the number of spill files the combiner can run multiple time

Hope that helps
 
Gartner says :Bigdata will be most advanced analytics products by 2015 !

Time to Become Big data architect by learning Hadoop(Developer, Administration,Analyst,QA),Cassandra,MongoDb,HBase,Datascience, Mahout, Splunk,R etc) from scratch to expert level

https://intellipaat.com/course-cat/big-data/?utm_source=coderanch%20&utm_medium=text&utm_campaign=coderanchdx1
 
subject: MapReduce Combiner without Shuffle ?