I want to implement distributed sorting using Map reduce in Java. From whatever I know, Merge sort will look better for distributed sorting. Are there are any other algorithms that I could use? Please guide.
I am not able to find what is the exact key over here. Out of Map function, if I need to find separate keys, it is not possible. Because if I have large amount of integers to be sorted, then key should be the same for everything for me to merge it together later at reduce phase.
Shall I do something like.. I will go through the input data set and picks few keys randomly. I will make all the integers lesser than a key to get associated with that key. I will sort each group of a key using Map function. Later I will just merge everything in the reduce phase depending on the sorting order of keys. Will this work out well?
I dont know how to use Mapreduce efficiently for large data set. Can you help me out.
Author and all-around good cowpoke
Joined: Mar 22, 2000
I don't understand the function of "keys" in your example - sounds like an extra step for no gain.
I'm not an expert on Mapreduce but if this was my problem I would cut the data set into roughly equal portions, one for each sorting processor/thread/whatever, then merge by looking at the "top" element of each of the returned sorted arrays, take the lowest (or highest), remove it and repeat till all sources are exhausted.