aspose file tools*
The moose likes Distributed Java and the fly likes Sorting using Mapreduce Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of The Java EE 7 Tutorial Volume 1 or Volume 2 this week in the Java EE forum
or jQuery UI in Action in the JavaScript forum!
JavaRanch » Java Forums » Java » Distributed Java
Bookmark "Sorting using Mapreduce" Watch "Sorting using Mapreduce" New topic
Author

Sorting using Mapreduce

Madhumitha Baskaran
Ranch Hand

Joined: Feb 27, 2010
Posts: 66
Hi,

I want to implement distributed sorting using Map reduce in Java. From whatever I know, Merge sort will look better for distributed sorting. Are there are any other algorithms that I could use? Please guide.

Thanks,
Madhu
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
    5
Note that using Mapreduce will involve two separate phases and thus two algorithm choices.

Obviously the phase which collects and merges separate sort jobs will have to be a merge but the separate sort jobs will operate just like any other sort with the same considerations.

Bill
Madhumitha Baskaran
Ranch Hand

Joined: Feb 27, 2010
Posts: 66
Thanks.

I have some questions popping out.

I am not able to find what is the exact key over here. Out of Map function, if I need to find separate keys, it is not possible. Because if I have large amount of integers to be sorted, then key should be the same for everything for me to merge it together later at reduce phase.

Shall I do something like.. I will go through the input data set and picks few keys randomly. I will make all the integers lesser than a key to get associated with that key. I will sort each group of a key using Map function. Later I will just merge everything in the reduce phase depending on the sorting order of keys. Will this work out well?

I dont know how to use Mapreduce efficiently for large data set. Can you help me out.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
    5
I don't understand the function of "keys" in your example - sounds like an extra step for no gain.

I'm not an expert on Mapreduce but if this was my problem I would cut the data set into roughly equal portions, one for each sorting processor/thread/whatever, then merge by looking at the "top" element of each of the returned sorted arrays, take the lowest (or highest), remove it and repeat till all sources are exhausted.

Bill
 
Consider Paul's rocket mass heater.
 
subject: Sorting using Mapreduce