File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Beginning Java and the fly likes TGIF problem - median of 5 Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "TGIF problem - median of 5" Watch "TGIF problem - median of 5" New topic
Author

TGIF problem - median of 5

Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7554
    
  18

In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

Winston Gutkowski wrote:In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?


My first question too is why only groups of 5? Why not 3 or 7 or 9 or 11 or whatever? What is the reason for having 5 elements in a group?

Winston Gutkowski wrote:However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston



As far as I can remember from my school days, a median is the value that lies midway between a list of sorted values. So, each sub group needs to be sorted, its median found and then a continuous group of such medians is to be formed. Then the median of these medians is to be found. This would be the pivot element. And you are attempting to do that in the least possible time. Am I correct? What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later? In other words, what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?


~ Mansukh
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7554
    
  18

Mansukhdeep Thind wrote:And you are attempting to do that in the least possible time. Am I correct?

Yup, and since the base group is 5, it makes sense to start with that problem; and it's not as easy as it looks.

What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later?

Because partition() does a lot of work itself, so it makes sense to make it as effective as possible. If you're unlucky enough to choose your pivots badly, the same element can get swapped many times by Quicksort (hence the worst case time of O(n^2)). The "median of medians, grouped by 5" is also extremely fast, since at each stage of recursion you're reducing by a power of 5 rather than the usual 2.

Mansukhdeep Thind wrote:what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?

Well, according to the Wiki page, it improves worst-case time to O(n*log(n)) - although they also say that, in practice, it generally doesn't make much difference.

I'm also darn sure that 5 wasn't chosen by accident, and that it probably gives the best "bang for the buck" in terms of efficiency; but exactly why I honestly don't know.

Winston
Paul Mrozik
Ranch Hand

Joined: Feb 10, 2013
Posts: 117

Looks like having blocks of 3 would not guarantee O(n):

The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.

For blocks of 5, the time complexity is

T(n) = T(n/5) + T(7n/10) + O(n)

For blocks of 3, it comes out to be

T(n) = T(n/3) + T(2n/3) + O(n)

Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf


Source: http://stackoverflow.com/questions/3908073/optimal-median-of-medians-selection-3-element-blocks-vs-5-element-blocks

What is the most efficient way of determining the median of specifically a group of 5 values?


You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7554
    
  18

Paul Mrozik wrote:You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Possibly, but I suspect not. Quickselect, which finds the kth element of n, only needs to do a partial sort (the "partition" method we've been talking about), since you already know which partition the result should be in. The problem is that the first partition of 5 requires 4 comparisons and at least 2 swaps. I'm trying to work out if there's a better alternative than Quickselect for specifically 5 items.

Just a fun little problem that I've never really thought about before.

Winston
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: TGIF problem - median of 5
 
Similar Threads
Dream collection of books for this forum?
Project facet Java 5.0 is not supported by target runtime Apache Tomcat v5.5
Study group for the SCJP exam - March-April 2001
Dead buttons
Is Certification really worth it ?