This week's book giveaway is in the Mac OS forum. We're giving away four copies of a choice of "Take Control of Upgrading to Yosemite" or "Take Control of Automating Your Mac" and have Joe Kissell on-line! See this thread for details.

I was asked to find the 3rd largest number in an unsorted list of numbers. No sorting allowed. I was stumped. How to I think about these kind of problems? Of course, I can search on the net and probably cram the algorithms given. But I want to train my mind to think on such lines of creating generic algorithms. Any pointers? How will I start with this use case? Say I have a list as follows:

Now without sorting this list, I want to find the 3rd largest element. In other words, how to find the nth largest or smallest element in the list?

a) Let's say you have n elements. we will use pivoting technique to find the largest/smallest element.

b) Assume that the (n/2)th element is the pivot. If there are odd no of elements then take [(n+1) /2]th element.

c) Now we divide the rest of the elements(lying to the left and right side of this pivot element) into 2 groups depending whether they are greater or less than this pivot element.

d) We get 2 sets of elements now, say S1 and S2.

e) Here S1 is the set where all elements are <pivot and S2 has elements>pivot. If length of S2 is non-zero, then the largest element lies in S2. We repeat this process with S2. Eventually we will home in on the greatest element when we are left with only 1 element in S2.

Similarly, we can find the smallest element too. Am I correct?

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39044

23

posted

0

That doesn’t sound right. That sounds like a sorting algorithm. And you are still thinking at too high a level. Things like pivot sound like an implementation, not an algorithm. When you get down to saying things like “middle element”, you are more like the right level.

Campbell Ritchie wrote:That doesn’t sound right. That sounds like a sorting algorithm. And you are still thinking at too high a level. Things like pivot sound like an implementation, not an algorithm. When you get down to saying things like “middle element”, you are more like the right level.

Do you mean to say my solution is incorrect or my usage of words to describe it? Or both?

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39044

23

posted

0

You can try your algorithm, which I think is incorrect, but I am not certain.
I also think the words suggest your solving methodology (and I mean methodology not method) is also wrong. You appear (but I am not certain) to be thinking, “how do I program it?” rather than, “what do I do?”

Campbell Ritchie wrote:You can try your algorithm, which I think is incorrect, but I am not certain.
I also think the words suggest your solving methodology (and I mean methodology not method) is also wrong. You appear (but I am not certain) to be thinking, “how do I program it?” rather than, “what do I do?”

Well, this is what came to my mind and I wrote it. I did not take help from the net. Also, when you said that i am still thinking as to "how do I program it" rather than thinking as to "what should I do", how would you have approached this problem? Essentially, please tell me what is the correct approach if mine is wrong. This is the only way I can think of to home in on the greatest/smallest element i.e. by narrowing it down into sets. How else? Just hint at the approach and let me try again.

Mansukhdeep Thind wrote:Do you mean to say my solution is incorrect or my usage of words to describe it? Or both?

Solution. Your description seems quite reasonable.

You should also be able to explain WHY you think it's the best solution for the problem. So I await your explanation...

Oddly enough, you got me thinking about the problem too (your original problem; not the one that Campbell suggested - but he was quite right to do so). I think I've come up with my best, but I'm interested to see how yours ends up.

PS: Mine works in unamortized O(n) time and uses only standard Java classes. If you can come up with better, let me know.

Mansukhdeep Thind wrote:This is the only way I can think of to home in on the greatest/smallest element i.e. by narrowing it down into sets. How else? Just hint at the approach and let me try again.

OK. What has Campbell asked you to do? And what would be your approach if you didn't have this "nth" largest constraint? ie, how would you go about just finding the largest (or smallest) of some unsorted set?

He asked me to find the greatest element of a given unsorted list. You are claiming that my solution won't work. That it is wrong. Why is it wrong? You will eventually find the greatest of n elements by that approach, won't you? What exactly is wrong with my solution?

As Winston said, using pencil and paper to find a solution is the best way to solve the problem. With that in mind, my solution (without sorting) is to consider the problem as a stack of cards with numbers on them face down. I can look at a card and decide to keep it or discard it. So I could keep the first three. After that each card I pick up I would then make the decision to keep the card or discard based on whether one of the cards I am holding is a lower value then the one I am looking at. After looking at each card in the stack once, I am holding only the three highest cards. Then it is simply a matter of selecting the smallest of the three.

Larry Frissell wrote:With that in mind, my solution (without sorting) is to consider the problem as a stack of cards with numbers on them face down.

A very nice simile, Larry. I should've thought of it myself, being a bridge player.

@Mansukhdeep: Think about it Larry's way; and most importantly write down what you do after each card is turned over (and don't forget that a computer doesn't have 'eyes'). And don't write a line of Java code until you understand what he's suggested, and furthermore the model that underpins it. See if you can come up with a couple of your own.

Campbell Ritchie wrote:That doesn’t sound right. That sounds like a sorting algorithm. And you are still thinking at too high a level. Things like pivot sound like an implementation, not an algorithm. When you get down to saying things like “middle element”, you are more like the right level.

Things like pivot sound like an algorithm to me. In fact Mansukhdeep seems to be describing the algorithm that comes to my mind for doing this in O(n) time, as opposed to O(n * log(n)), which a full sort would require. And here I am assuming that k can be proportional to n, so any part of the algorithm that depends on k also depends on n. Many naive solutions of this might be O(n*k) which is O(n^2) as far as I'm concerned.

A lot seems to depend on what is meant by "without doing a sort". If the intent is to prevent the student from re-using standard sorting algorithms, or to prevent the O(n * log(n)) behavior which is inherent to most any good sort, then this approach is fine. If the intent is to prevent any modification of the input data at all, then it's not OK. Of course, this could easily be fixed by copying the input to another array, which is clearly O(n). If there's also a restriction on memory, this might be a problem.

Anyway, Mansukhdeep - it sounds to me like you are on the right track, as I understand the problem.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Larry Frissell wrote:As Winston said, using pencil and paper to find a solution is the best way to solve the problem. With that in mind, my solution (without sorting) is to consider the problem as a stack of cards with numbers on them face down. I can look at a card and decide to keep it or discard it. So I could keep the first three. After that each card I pick up I would then make the decision to keep the card or discard based on whether one of the cards I am holding is a lower value then the one I am looking at. After looking at each card in the stack once, I am holding only the three highest cards. Then it is simply a matter of selecting the smallest of the three.

This sounds reasonable when we're talking about keeping three. But, imagine we have a list of 1,000,000 records, and we're trying to find the 465,621st largest element. How do we go about retaining the top 465,621 elements? Probably in some sort of sorted list, tree, or heap, right? What sort of performance do we get from this each time we turn over another card and decide if we want to keep it or discard it? And what's the overall performance if we do that n times?

I would submit that the above algorithm will not give O(n) time, but O(n * log(n)) at best (assuming k is proportional to n). There is an algorithm that does it in O(n) time, and it sounds remarkably like what Mansukhdeep has described.

This is what I could understand from Larry's methodology. Let me know if I have grasped it correctly:

a) Lets say I want to pick the 562th largest element out of 2896 elements. So, I pick up 562 cards at random.

b) I will then pick up the 563rd card and compare its value with first 562 cards. Moment I find that this picked up card has a larger value than any one of existing cards in the list, I chuck that out and put this one in. Correct?

c) Repeating step (b) for (n-k) i.e (2896-562) times, I finally have with me the top 562 cards among the initial 2896 cards. Correct?

d) Now, how do I know which is the largest amongst the final 562? These 562 would still be distributed randomly in my opinion. So how do I find the largest? Larry says that it is a matter of selecting the largest of the k. Following on the same lines as described by Larry, I will pick 1 card at random from these top 562 cards. And repeating the steps compare it with rest 561. Finally , I will be left with the largest element in the filtered list of top 562 cards. In other words, I will be left with the 562th greatest card from the original group of 2896 cards.

This is how one would go about picking the kth largest/smallest element from a group of n randomly distributed elements in general. Have I missed out on anything?

Mike Simmons wrote:Things like pivot sound like an algorithm to me.

Yes. Specifically, a sorting algorithm on the lines of Quicksort; which might help you pick out the largest or smallest quickly, but I'm pretty sure it would need to be repeated or involve lots of comparisons between groups to find the nth (or kth) largest.

In fact Mansukhdeep seems to be describing the algorithm that comes to my mind for doing this in O(n) time, as opposed to O(n * log(n)), which a full sort would require. And here I am assuming that k can be proportional to n, so any part of the algorithm that depends on k also depends on n. Many naive solutions of this might be O(n*k) which is O(n^2) as far as I'm concerned.

Well I don't think that's right because
1. As k decreases it would approach O(n).
2. I doubt it ever needs to be more than O(n * n/2) since if k > n/2 you simply flip the algorithm to find the (n-k)th smallest.
However, I'm quite sure Fred will tell me if I'm wrong.

A lot seems to depend on what is meant by "without doing a sort".

I took it to mean exactly that, ie, you cannot perform a sort in the algorithm. I don't see any restriction on using ordered structures though, so I would assume that (as you mentioned) a heap or (my preference) a Skiplist would be perfectly acceptable for holding your 'array of k'.

You're quite right though (and I was completely wrong), my algorithm would be O(n log(k)); and I suspect you could add efficiency by keeping track of the lowest/highest of k too (ie, quick elimination).

If there's also a restriction on memory, this might be a problem.

True, although you could probably do it in place by partitioning the array into subsections of length n and n-k and swapping (although if that was the case I'd really want to do a sort on the k section first ).

Anyway, Mansukhdeep - it sounds to me like you are on the right track, as I understand the problem.

Hmmm. I remain to be convinced, although I quite agree that there are several possible solutions, particularly if the number of distinct values, v, is significantly smaller than n.

Mansukhdeep Thind wrote:b) I will then pick up the 563rd card and compare its value with first 562 cards. Moment I find that this picked up card has a larger value than any one of existing cards in the list, I chuck that out and put this one in. Correct?

Not quite. You specifically chuck out the smallest when you add your card. Basically, your 562 face-up cards act like a "highest values" queue.

Yes. Specifically, a sorting algorithm on the lines of Quicksort; which might help you pick out the largest or smallest quickly, but I'm pretty sure it would need to be repeated or involve lots of comparisons between groups to find the nth (or kth) largest.

The methodology that I am putting out for approval will surely require repeated comparisons right till the size of the subset shrinks to one. That is how I would be programming it. Let me see if I can implement it correctly. That would be the only way to confirm if my methodology is correct.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Mike Simmons wrote:

Winston Gutkowski wrote:

If there's also a restriction on memory, this might be a problem.

True, although you could probably do it in place by partitioning the array into subsections of length n and n-k and swapping (although if that was the case I'd really want to do a sort on the k section first ).

Partitioning in place was used in the original algorithm Mansukhdeep described. Here I was describing alternatives in case there was a prohibition on changing the input data, which would disallow partitioning in place.

OK, maybe it wasn't explicitly mentioned, and isn't present in the code shown so far. But we both agree it can be added, if modifying the input data is allowed.

Time out guys... I tried my hand at implementing the method that I mentioned above. Here is the code for it:

It is correctly returning the greatest number in an unordered list. Just have a look and point out anything that I may have missed. Or any errors?

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Well, looks OK so far. What if you are looking for the 2nd greatest? Or the 5th? I think as you go on, you will need to keep track of both the items greater than the pivot, and less than the pivot - and sometimes the element you want will be in the "greater" list, and sometimes in the "less" list.

I don't think it really matters that you pick a pivot from halfway through the list. What you'd like, of course, is a pivot that's halfway through the sorted list - but we have no way of knowing in advance what or where that will be. In fact it's probably advantageous to choose a pivot at random, because otherwise there will be certain inputs that give pathologically bad performance. If you pick randomly, then you will get good pivot points often enough that it will compensate for the occasional bad pivot points. If you don't pick randomly, then by knowing your algorithm, I can devise an input which will give very poor performance. This is the sort of thing that can make a website vulnerable to DOS attack.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Mike Simmons wrote:

Mike Simmons wrote:

Winston Gutkowski wrote:

If there's also a restriction on memory, this might be a problem.

True, although you could probably do it in place by partitioning the array into subsections of length n and n-k and swapping (although if that was the case I'd really want to do a sort on the k section first ).

Partitioning in place was used in the original algorithm Mansukhdeep described. Here I was describing alternatives in case there was a prohibition on changing the input data, which would disallow partitioning in place.

OK, maybe it wasn't explicitly mentioned, and isn't present in the code shown so far. But we both agree it can be added, if modifying the input data is allowed.

Dammit. I accidentally edited my previous post, when I meant to quote it to update one part. Now I'm missing my previous reply, which responded to many on Winston's other points. I don't suppose anyone else has that cached?

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Ah well, I'll do it again.

Winston Gutkowski wrote:

Mike Simmons wrote:Things like pivot sound like an algorithm to me.

Yes. Specifically, a sorting algorithm on the lines of Quicksort; which might help you pick out the largest or smallest quickly, but I'm pretty sure it would need to be repeated or involve lots of comparisons between groups to find the nth (or kth) largest.

Yes, but the nice thing is, each time you're operating on a smaller list than the previous list - on average, half as big as the previous one. And since 1/2 + 1/4 + 1/8 + 1/6 + 1/32 ... = 1, the overall performance is nicely bound at O(n).

Winston Gutkowski wrote:
Well I don't think that's right because
1. As k decreases it would approach O(n).

Can we assume that k may be chosen anywhere in the range 0 <= k < n, with roughly equal probability? Because that will definitely lead to O(n * log n) expected time. Sure, if you're in the shallow end of the pool, you can use other techniques - but as n is large, there's a larger and larger proportion of all possible numbers that are not in the shallow part of the pool.

Winston Gutkowski wrote:2. I doubt it ever needs to be more than O(n * n/2) since if k > n/2 you simply flip the algorithm to find the (n-k)th smallest.

Obviously. But O(n * n/2) is clearly O(n^2). You probably meant O(n * log(n/2)), which is O(n * log n). Still not as good as O(n).

Winston Gutkowski wrote:

A lot seems to depend on what is meant by "without doing a sort".

I took it to mean exactly that, ie, you cannot perform a sort in the algorithm. I don't see any restriction on using ordered structures though, so I would assume that (as you mentioned) a heap or (my preference) a Skiplist would be perfectly acceptable for holding your 'array of k'.

Well, Mansukhdeep's algorithm doesn't do a sort either, just repeated partitions. Each of these techniques is something that can be used as part of a sort, but we're not doing a full sort. Because a full sort would necessarily require at least O(n * log n), and there's no need for that.

Winston Gutkowski wrote:You're quite right though (and I was completely wrong), my algorithm would be O(n log(k)); and I suspect you could add efficiency by keeping track of the lowest/highest of k too (ie, quick elimination).

Yeah, we could probably have a threshold t, where if k < t or n-k < t, use Larry and Winston's algorithm, otherwise use the binary search through recursive partitions discussed above.

Mike Simmons wrote:Well, looks OK so far. What if you are looking for the 2nd greatest? Or the 5th? I think as you go on, you will need to keep track of both the items greater than the pivot, and less than the pivot - and sometimes the element you want will be in the "greater" list, and sometimes in the "less" list.

Then I would first apply Larry's methodology and isolate the k greatest/smallest elements in the random list. After that, I would apply my methodology on the filtered list to find the kth smallest / largest element. That should do it. Is there a faster way?

Mike Simmons wrote:I don't think it really matters that you pick a pivot from halfway through the list. What you'd like, of course, is a pivot that's halfway through the sorted list - but we have no way of knowing in advance what or where that will be. In fact it's probably advantageous to choose a pivot at random, because otherwise there will be certain inputs that give pathologically bad performance. If you pick randomly, then you will get good pivot points often enough that it will compensate for the occasional bad pivot points. If you don't pick randomly, then by knowing your algorithm, I can devise an input which will give very poor performance. This is the sort of thing that can make a website vulnerable to DOS attack.

What do you mean when you say "make ... vulnerable to a DOS attack"?

Mansukhdeep Thind wrote:Then I would first apply Larry's methodology and isolate the k greatest/smallest elements in the random list. After that, I would apply my methodology on the filtered list to find the kth smallest / largest element.

Then why not do it that way from the get go?

That should do it. Is there a faster way?

Not that I know of. As Mike says, for finding a specific kth largest element (as opposed to "the k largest"), your approach has expected linear time (but worse case is O(n^2)). Then you can just use that element to extract the k largest with a single final pass over the array, so the overall time will still be O(n).

However, it takes a lot more thought and is much less intuitive than Larry's and my method - so it's much easier to get wrong - and it may well perform worse on small sets and/or small values of k. It's also arguable that a partition is a partial sort, and so strictly against the rules of the problem as stated; but I think that would be splitting hairs.

What do you mean when you say "make ... vulnerable to a DOS attack"?

I think he's extrapolating a bit; but essentially: if you know the way that the pivot is chosen in advance, you can design a "nasty dataset" that causes Quicksort to behave in the worst way possible (O(n^2)), whereas if you choose the pivot at random, it's virtually impossible to design a dataset that intentionally causes bad performance; and choosing at random has very little effect on the overall performance of the algorithm itself.

Skiplists use randomization of pointer stack heights as a matter of course, which is why they are relatively immune to this sort of attack.

However, don't ask me about the Maths, because most of it goes over my head.

Mansukhdeep Thind wrote:Then I would first apply Larry's methodology and isolate the k greatest/smallest elements in the random list. After that, I would apply my methodology on the filtered list to find the kth smallest / largest element.

Then why not do it that way from the get go?

Because I did not know how to clear the first hurdle until a day ago. Now, I will surely implement the complete use case and get back to you guys.

Winston Gutkowski wrote:

Mansukhdeep Thind wrote:That should do it. Is there a faster way?

Not that I know of. As Mike says, for finding a specific kth largest element (as opposed to "the k largest"), your approach has expected linear time (but worse case is O(n^2)). Then you can just use that element to extract the k largest with a single final pass over the array, so the overall time will still be O(n).

However, it takes a lot more thought and is much less intuitive than Larry's and my method - so it's much easier to get wrong - and it may well perform worse on small sets and/or small values of k. It's also arguable that a partition is a partial sort, and so strictly against the rules of the problem as stated; but I think that would be splitting hairs.

Is your method same as Larry's? Or do you have a separate scheme to do this task?

Winston Gutkowski wrote:

Mansukhdeep Thind wrote:What do you mean when you say "make ... vulnerable to a DOS attack"?

I think he's extrapolating a bit; but essentially: if you know the way that the pivot is chosen in advance, you can design a "nasty dataset" that causes Quicksort to behave in the worst way possible (O(n^2)), whereas if you choose the pivot at random, it's virtually impossible to design a dataset that intentionally causes bad performance; and choosing at random has very little effect on the overall performance of the algorithm itself.

Skiplists use randomization of pointer stack heights as a matter of course, which is why they are relatively immune to this sort of attack.

What do you mean by Skiplists? Are they some sort of data structures like ArrayList, LinkedList etc? And what is meant by "randomization of pointer stack heights"?

Winston Gutkowski wrote:However, don't ask me about the Maths, because most of it goes over my head.

Winston

I also need to get a grip on all these Big O notation terminologies that you guys have so nonchalantly been talking about since the beginning of this problem. I want to understand how you guys come to say that "this method will take O(n) time while so and so's method will take O(n log n ) time. But worst case scenario will be O(n^2) time." I read on Wikipedia that this notation is called the Bachhman-Landau form. It is used to represent the growth pattern of a function i.e. how will it behave as size of input is manipulated. How do I determine the time taken by my method to solve a problem? When Mike said that my method will take O(n) time, how did he calculate it?

Putting it simply, the O notation indicates how many iterations it will take to perform e operation. For example with the method above of keeping a sorted list of size k, you will need to have a iteration that goes over every number in your input list, which will take n iterations. For every number, it will insert the number into the sorted list. Assuming you will use something like a binary search to insert the number into the stack, it will take on an average Log k iterations to insert numbers into the sorted stack, so, the total iterations will be n log k

Mansukhdeep Thind wrote:Is your method same as Larry's?

Pretty much. Larry just explained it better with his card anaolgy: Keep a fixed-length priority queue of the k highest items so far and keep pushing in values that are higher than the smallest item in the queue and drop off the smallest.

What do you mean by Skiplists? Are they some sort of data structures like ArrayList, LinkedList etc?

Yup, and in the words of the French knight in Monty Python and the Holy Grail they're "ver' nice". One of the simplest log(n) structures I know of to implement (way simpler than a Tree; especially if it needs to be balanced).

Basically, it's a stack of linked lists built on top of a basic linked list that provides "fast track" access to sections. For more information, you should really read this page or William Pugh's original paper (very straightforward, even for a non-academic like myself).

Unfortunately, there's very little support for them in the Java Collections API, except as "concurrent-use" datasets (one of their other strengths), so you'll either have to find an existing library implementation or roll your own.

And what is meant by "randomization of pointer stack heights"?

It's one of the basic premises that underpins the structure, ie, "fast paths" are generated randomly. Like Quicksort, this makes it far less vulnerable to sets of data that can cause the dataset to perform poorly, while having virtually no impact on performance.

I also need to get a grip on all these Big O notation terminologies that you guys have so nonchalantly been talking about since the beginning of this problem.

My advice: Don't sweat it too much. As you may have already gathered, Mike and I don't even agree about the basics of O(n*k), and we're both pretty experienced. The main thing is that we understand the difference between O(1) (constant time), O(n) (time proportional to the size of a set), O(log(n)) (time proportional to the log of the size of a set (usually base-2)), and "quadratic time" (O(n^2); and usually best avoided unless there's no other way ).

As Jayesh said, it's simply a guide as to how "scalable" an algorithm is, and there are many examples of O(log(n)) algorithms that are actually slower, in practise, than O(n) ones. As as an example: doing a sequential search of ≈20 items or less is likely to be as quick or quicker quicker than a binary search.

Mansukhdeep Thind wrote:Is your method same as Larry's?

Pretty much. Larry just explained it better with his card analogy: Keep a fixed-length priority queue of the k highest items so far and keep pushing in values that are higher than the smallest item in the queue and drop off the smallest.

OK

Winston Gutkowski wrote:

Mansukhdeep Thind wrote:What do you mean by Skiplists? Are they some sort of data structures like ArrayList, LinkedList etc?

Yup, and in the words of the French knight in Monty Python and the Holy Grail they're "ver' nice". One of the simplest log(n) structures I know of to implement (way simpler than a Tree; especially if it needs to be balanced).

Basically, it's a stack of linked lists built on top of a basic linked list that provides "fast track" access to sections. For more information, you should really read this page or William Pugh's original paper (very straightforward, even for a non-academic like myself).

Unfortunately, there's very little support for them in the Java Collections API, except as "concurrent-use" datasets (one of their other strengths), so you'll either have to find an existing library implementation or roll your own.

I shall surely devote some time this weekend understanding SkipList.

Winston Gutkowski wrote:

Mansukhdeep Thind wrote:And what is meant by "randomization of pointer stack heights"?

It's one of the basic premises that underpins the structure, ie, "fast paths" are generated randomly. Like Quick sort, this makes it far less vulnerable to sets of data that can cause the data set to perform poorly, while having virtually no impact on performance.

Hmm..

Winston Gutkowski wrote:

Mansukhdeep Thind wrote:I also need to get a grip on all these Big O notation terminologies that you guys have so nonchalantly been talking about since the beginning of this problem.

My advice: Don't sweat it too much. As you may have already gathered, Mike and I don't even agree about the basics of O(n*k), and we're both pretty experienced. The main thing is that we understand the difference between O(1) (constant time), O(n) (time proportional to the size of a set), O(log(n)) (time proportional to the log of the size of a set (usually base-2)), and "quadratic time" (O(n^2); and usually best avoided unless there's no other way ).

As Jayesh said, it's simply a guide as to how "scalable" an algorithm is, and there are many examples of O(log(n)) algorithms that are actually slower, in practise, than O(n) ones. As as an example: doing a sequential search of ≈20 items or less is likely to be as quick or quicker quicker than a binary search.

HIH

Winston

I would take your advice and not sweat it too much.

Next task is to complete this assignment by finding the kth largest element, which was how it all began in the first place. Will get back with some more code soon guys.

@ Winston : I did as you advised i.e. used a PriorityQueue to get the K largest elements out of N elements. But internally a PriorityQueue is sorting its elements. Correct? I do not want to use that. Can I isolate the K largest elements using some other data structure? Of course, I would be implementing Larry's methodology only. Is there some other way to implement it?

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3016

10

posted

0

Mansukhdeep Thind wrote:Of course, I would be implementing Larry's methodology only. Is there some other way to implement it?

Sure, there's the way you were talking about earlier. Apparently you were only thinking of how to use it to get the greatest element, which was a problem trivially easy to solve with other methods. But I (and Winston) already gave hints how your method could be extended to get the k th element, with better performance if k can be randomly anywhere in the range (not just near the ends). You can get more info under Quickselect if you're interested.

I am stuck guys. Finding the largest element was easy. But implementing the Kth largest is thawing my brain. Let's say for example I have the following unordered set of numbers:

As per Larry's methodology, I pick the first 3 elements:

and we have

Now this is the point at which I am trying to figure out what to do next. How to isolate the top K elements without a sort?

Mansukhdeep Thind wrote:Of course, I would be implementing Larry's methodology only. Is there some other way to implement it?

Sure, there's the way you were talking about earlier. Apparently you were only thinking of how to use it to get the greatest element, which was a problem trivially easy to solve with other methods. But I (and Winston) already gave hints how your method could be extended to get the k th element, with better performance if k can be randomly anywhere in the range (not just near the ends). You can get more info under Quickselect if you're interested.

I could not understand the stuff on the wiki page that you shared.

Larry Frissell
Ranch Hand

Joined: May 16, 2008
Posts: 82

2

posted

0

Now this is the point at which I am trying to figure out what to do next. How to isolate the top K elements without a sort?

What I did was to get the next number, ie 10. Then compared to topKList one at a time.
If topKList[0] was less then "10" I made a tempVariable = topKList[0] replaced the topKList[0] with 10.
Then continued through topKList comparing tempVariable to topKList[i] to ensure that tempVariable (the number that would be discard) was the smallest.

Yes, it would have been easier to sort topKList, but the problem stated was not to use sort.

Mansukhdeep Thind wrote:I am stuck guys. Finding the largest element was easy. But implementing the Kth largest is thawing my brain. Let's say for example I have the following unordered set of numbers:As per Larry's methodology, I pick the first 3 elements:and we have...

Right. You have 3 selected elements: 6, 42 and 8; and the next element: 10.

Now: is 10 larger than the smallest of your 3 elements (6)?
Yes: so you replace that element with 10, leaving you: 10, 42 and 8
(if you want to do it in place with an array, the easiest thing to do is simply swap the '6' and '10' elements around).

Then you take the next value (27) and repeat the process...

When you've finished, your three values will be the three highest in the list.

Winston Gutkowski wrote: Now: is 10 larger than the smallest of your 3 elements (6)?

But Winston, I do not know which is the smallest element in my topKList. These are random numbers. The computer can't see. And we are not allowed to sort. So, how do you know which is the smallest in my assumed topKlist? Do you mean to say that I have to repetitively apply the partition select algorithm to find the smallest in the topKList after each swap until the elements in the remainingList are exhausted?

Mansukhdeep Thind wrote:But Winston, I do not know which is the smallest element in my topKList. These are random numbers. The computer can't see. And we are not allowed to sort. So, how do you know which is the smallest in my assumed topKlist? Do you mean to say that I have to repetitively apply the partition select algorithm to find the smallest in the topKList after each swap until the elements in the remainingList are exhausted?

Yes. That's why the time complexity is O(k * (n-k)). But it's simple - as I said before, way simpler and more intuitive than Quickselect, and so less easy to get wrong. I reckon I could write it in about 20 lines, and make it perfectly understandable without a jot of documentation.

Winston

subject: Algorithm to find the kth largest element in unsorted list