This week's book giveaway is in the Jobs Discussion forum. We're giving away four copies of Java Interview Guide and have Anthony DePalma on-line! See this thread for details.

That is a contradiction in terms; randomness implies the possibility of duplicates. This sort of question comes up every now and again; you might find an answer in these old threads: 123. No 1 might be the most useful.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 46437

51

posted

0

Originally posted by myself: . . . No 1 might be the most useful.

The suggestion there is quite similar to Michael Dunn's.

[Campbell]: That is a contradiction in terms; randomness implies the possibility of duplicates.

Not at all. There are many types of random distributions. In this case, it's an even distribution across all numbers in a given range that have not previously been selected. That's fine - the fact that it's not an even distribution across all numbers in the range, period, does not mean it's not random. It just means it doesn't match one particular concept of a random distribution.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 46437

51

posted

0

Disagree; true randomness means the next result is independent of the previous results, so duplicates are possible.

This is of course a recap of disagreements in the previous threads.

Regarding no-duplicates: in that case, what you want is not a series of random numbers; what you want is a subset of the set of integers in a certain range. Therefore algorithms that produce the set and then shuffle it are a fine approach. Another way is to generate random numbers, storing them in a Set, until the Set is large enough; the Set, by its nature, will eliminate duplicates.

Good idea about a Set. But if you put random numbers into a Set, and extract them, which order do you get them back in?Try it for 10 numbers.

What would happen if we used TreeSet instead? . . .

Actually even HashSet doesn't seem to print in insertion order.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

Purely random numbers, with no other stated constraints, would not be influenced by past behavior, true. This point is often heavily emphasized in introductory probability courses, to dislodge certain types of fallacious thinking that students may arrive with. However I see "no duplicates" in this thread not as a contradiction, but as a constraint. We're generating random numbers, subject to the constraint that there can be no duplicates. Certainly there is random behavior involved. And I hope no one would object if we used the term "random shuffle", right? My guess is Ayyappan is not a native English speaker, and yet his initial post seemed clear enough to me, since the "no duplicates" rule was stated up front.

[Campbell]: But if you put random numbers into a Set, and extract them, which order do you get them back in?

You don't need to extract them, or pre-generate the set. You can just add new numbers to the set as you need to generate them. Each time you generate a new random number, test if it's already in the set. If it is, generate a new random number. Keep trying until you generate a random number that's not in the set.

This particular strategy works well initially, and more poorly as more members are added. It's a good choice if the total number of random numbers needed is small compared to the range of possible values. If you eventually need to generate all (or even most) of the possible values, then a shuffle is probably a better choice.

[Campbell]: Actually even HashSet doesn't seem to print in insertion order.

Right - it's not supposed to. A LinkedHashSet would do the job though. Not that you need to rely on insertion order behavior if you use the strategy I just described, but for the fill-the-set-then-remove-items strategy, it could be useful.

Incidentally, it's also possible to modify the shuffle() algorithm so that you don't shuffle the entire list at once. Instead you can just pick out one number at a time, randomly chosen from the remaining numbers, as needed. You still need to allocate an array big enough to store all the values, but as long as that's not more memory than you have available, I believe this strategy would be the fastest one possible, overall. [ October 18, 2008: Message edited by: Mike Simmons ]

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 46437

51

posted

0

Took me a bit of time to see what you meant about not testing the numbers before inserting them, but I can see, yes, that would work nicely, and obviate the need to iterate the set at all. The array might, as you say, work fastest. If the input is random (and java.util.Random is described as pseudo-random), is there any need to shuffle the array? You won't notice a difference, but the array will preserve insertion order.

I altered the little program I posted earlier to try a LinkedHashSet, and only that maintains insertion order with the for-each loop. Obviously a TreeSet would have printed from smallest to largest.

I think we all agree about what the original poster wanted; it's just what you call it.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

[Campbell]: If the input is random (and java.util.Random is described as pseudo-random), is there any need to shuffle the array? You won't notice a difference, but the array will preserve insertion order.

Hm, I didn't really follow this part. I understand that Random is pseudo-random; I haven't been bothering to make that distinction, but it's true. Aside from that though I don't think I understand the scenario you're talking about. Which is odd since it seems to be based off what I said, but I think somewhere along the way we might have gotten different mental pictures.

I would say that it's not necessary to shuffle the entire array, but you need to at least shuffle parts as you go. When someone requests the first element of the array, shuffle just that first element by selecting another element at random (possibly the same element) and swapping the first element with that element. If that other element is also the same element, there's really nothing to do. When someone asks for the second element (index 1), swap with another element of index >= 1. When someone asks for the third element (index 2), swap with another element of index >=2. And so on.

Here's an implementation of what I was thinking of.

[Campbell]: I think we all agree about what the original poster wanted; it's just what you call it.

Agreed.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 46437

51

posted

0

I now see what you mean about the array; presumably its original state is some sort of ordered numbers, so it does need shuffling.

I still think your other suggestion about a List and if it's already in the List discard it is simpler.