MapReduce Combiner without Shuffle ?

andre mantei
Greenhorn

Joined: Sep 12, 2013
Posts: 5
Hi everyone,

I have a question about the data flow in MapReduce jobs.

Data flow in a WordCount example (taken from a book):

The mapper produces key-value pairs like (car, 1),
possibly many of these pairs with the same key, e.g. several times (car, 1)

The shuffle phase produces key/list-of-values pairs like (car, [1, 1, 1])

The reducer sums the list and produces a key-value pair like (car, 3)
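
The three steps above can be sketched in plain Java, without Hadoop. This is only an illustration of the data flow under simplifying assumptions: the class WordCountFlow and its methods map, shuffle and reduce are made-up names, everything runs in one JVM, and a TreeMap stands in for the sort-and-group work of the real shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the WordCount data flow (not Hadoop code).
public class WordCountFlow {

    // Map step: emit one (word, 1) pair per word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: collect all values that share a key; keys end up sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of values for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = shuffle(map("car bike car car"));
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            // Prints "bike [1] -> 1" and then "car [1, 1, 1] -> 3".
            System.out.println(e.getKey() + " " + e.getValue() + " -> " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```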

When I want to use a combiner, I can use the reducer as a combiner (according to
the book I've been learning from).
But how is this possible? If I use the reducer as a combiner, there has to be a
shuffle phase before the combiner, right? Without a shuffle phase there is no
list of 1's, and the combiner could not sum the values for a specific key.

Clearly I am missing something. Can someone please explain it to me?

Greetings, Andre
Rajesh Nagaraju
Ranch Hand

Joined: Nov 27, 2003
Posts: 57
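The book most likely means that you can reuse the reducer's code (the same class) as the combiner.
The combiner is still a separate step from the reduce phase.

When a mapper starts emitting output, the records first go into a circular in-memory buffer. When the
buffer reaches a threshold (configurable), its contents are spilled to disk. Before each spill the buffered
records are sorted by key, so all values for a key within that spill sit next to each other. That local sort
is why the combiner needs no shuffle: it receives a key and a list of values just as a reducer does, only
limited to the output of one mapper.

The combiner is not the reducer, because it runs on the map side and works on the mapper's spills. The combiner
is not guaranteed to run: when the spill files are merged, it is applied only if there is at least a minimum
number of spill files (again configurable). It can also run more than once, depending on the number of spill
files, so the job must give the same result whether the combiner runs zero, one, or several times.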
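
To make that concrete, here is a rough plain-Java sketch of the idea, not Hadoop's actual implementation. The names CombinerSketch, spill and shuffleAndReduce are invented for the example, each buffer is a small in-memory list, and a TreeMap plays the role of the sort. The same sum method serves as combiner and as reducer, which works because addition is associative and commutative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of map-side combining (not Hadoop code).
public class CombinerSketch {

    // Summing logic shared by the combiner and the reducer.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Group records by key; the TreeMap stands in for the sort by key.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return grouped;
    }

    // One spill: sort the buffered map output locally, then optionally combine each key's values.
    static List<Map.Entry<String, Integer>> spill(List<Map.Entry<String, Integer>> buffer, boolean useCombiner) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : group(buffer).entrySet()) {
            if (useCombiner) {
                out.add(Map.entry(e.getKey(), sum(e.getValue()))); // one partial sum per key
            } else {
                for (int v : e.getValue()) {
                    out.add(Map.entry(e.getKey(), v));             // records pass through unchanged
                }
            }
        }
        return out;
    }

    // Shuffle plus reduce: group the records of all spills by key and sum them.
    static Map<String, Integer> shuffleAndReduce(List<List<Map.Entry<String, Integer>>> spills) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> s : spills) {
            all.addAll(s);
        }
        Map<String, Integer> result = new TreeMap<>();
        group(all).forEach((key, values) -> result.put(key, sum(values)));
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> buffer1 =
                List.of(Map.entry("car", 1), Map.entry("bike", 1), Map.entry("car", 1));
        List<Map.Entry<String, Integer>> buffer2 = List.of(Map.entry("car", 1));

        // With the combiner the first spill shrinks to (bike, 1), (car, 2).
        System.out.println(spill(buffer1, true));
        // The final counts are {bike=1, car=3} with or without the combiner.
        System.out.println(shuffleAndReduce(List.of(spill(buffer1, true), spill(buffer2, true))));
        System.out.println(shuffleAndReduce(List.of(spill(buffer1, false), spill(buffer2, false))));
    }
}
```

The combiner only reduces how many records leave the mapper. The reducer still sees every key and gets the same totals, which is why it does not matter how often the combiner runs.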

Hope that helps
 