Win a copy of Mesos in Action this week in the Cloud/Virtualizaton forum!

# Generating random numbers with given properties

Tim West
Ranch Hand
Posts: 539
Hi all,
Not sure if this is an appropriate forum, but here goes...
I want to write a function that is something like this:
int getRandom (int idealNum, int max, int min)
the idea is that over time, calls to getRandom() should average idealNum, and be in the bounds (min, max). It would be fine to also pass in an extra parameter, if helpful, for example:
int getRandom (int idealNum, int averageSoFar, int max, int min)
The actual distribution of numbers isn't important.
Does anyone have a trick for this? It seems like it should be easy, but the bounds (in particular) make it more difficult.
On a side note, this function will be used for generating test data for load testing a database. I want to say stuff like "give each person a random number of kids between 0 and 5".
Cheers,

--Tim

Stefan Wagner
Ranch Hand
Posts: 1923
Usual randomizers will return numbers between LOW and HIGH and an average of (LOW+HIGH)/2
This is known as equal distributions, in contrast to normal distribution.
For instance the number of children in a culture may be 2.
And the minimum 0.
But the maximum will of course be greater than 4.
To describe a normal distribution, you don't only need the average, but the standard deviation too.
If you throw a dice, you will expect a equal distribution for 1, 2, ... 6.
But if you throw two dices, you get something like a normal distribution for the sum of those dices between 2, 3, ... 12, with an average of 7.
Perhaps this may help in some way.
If you only need few distributions, like age, you could of course try something like this:
(Pseudocode)

A generally bad idea is, to give a averageSoFar.
In real life, if you throw a coin, and get 3 times an 'eagle' - how probably will the next event be 'number'?
If it is a fair coin, it will be 50%.
There is no memory in the coin, telling it what happend before, and the events are independ from each other.
There is no need to correct the past.
If the next 1000 throws lead to 500 eagles and 500 numbers, the sum will be 503 to 500 which is pretty close to the ideal value.
Now we take an integer-coin. One side is '0' and one side is '1'.
For a fair coin we expect an average of '0.5'.
If you throw the coin 1 time, and get a '0'.

Your method will not know how often it will be called in the future, to 'correct' the randomness. If it will be called infinitivly, it can ignore the 'averageSoFar'-value. So we need an additional parameter:

if you now call it with commingCalls=1, shall it return 1?
And if you call it twice, shall it throw a 'AssertedCommingCallsViolatedException'?
Perhaps you will find a OpenSource randomizer- or statistic- library.
I don't know any.
But I'm sure you will not find something with 'avgSoFar'.

Tim West
Ranch Hand
Posts: 539
Stefan,
Thanks for the detailed reply - that was great!
I ended up implementing it in the way you said I shouldn't - with an average so far. You're right about the standard distribution - I know the statistics aren't *too* complicated, but I didn't want to spend too long on this one.
Out of interest, the code I used was this. It basically gives me what I want, though it isn't "Monte Carlo" random.

And some example generated numbers...ideal 2, min 0, max 4:

Thanks again,

--Tim

Jon Egan
Ranch Hand
Posts: 83
Tim,
Are you looking for something where idealNum is not necessarily in the middle of min and max? If not, a simple solution (no need for idealNum) would be:

If idealNum is not the average of min and max, your extra parameter averageSoFar could be used as follows:

so any time your average strayed too high, you'd get a number BELOW idealNum, and any time it went too low, you'd get a number ABOVE idealNum. Over many iterations you end up tending toward idealNum as your average.
Both of these assume that min < idealNum < max. I'd put in a validation of that, maybe throw an InvalidArgumentException if the arguments don't make sense.... except that you said this was just for generating test data...
Thanks for the diversion, hope this helps.
- Jon Egan

Jon Egan
Ranch Hand
Posts: 83
TOOOOOO funny This is what happens when I compose my post, get distracted, get a phone call, do other things, and come back and hit submit. DOH! :roll:
Turns out I had essentially the same algorithm.... my code, for your viewing pleasure:

Tim West
Ranch Hand
Posts: 539
Jon,
Thanks for the input...Yeah I was assuming min <= idealNum <= max...and yup I threw a RuntimeException...though InvalidArgumentException would be better.
I basically did what you did in your latter suggestion but complicated the solution for no good reason.
If avgSoFar < ideal, my 'solution' will allow for the possibility of this number also being < ideal, but it makes it no more than half as likely as the number being greater.
This gives no advantage (since distribution is irrelevant) and your code is considerably simpler - I thought about doing it that way but ruled it out for a reason I've since forgotten. Hmm...
Thanks again,

--Tim

Tim West
Ranch Hand
Posts: 539
Haha and this is what you do when you monitor forums to keenly - I posted my second reply between your first and second replies...
Hmm, sounds like we need exponentially increasing backoff times...I won't post again for a while ;-)

-Tim

Stefan Wagner
Ranch Hand
Posts: 1923
I don't have a solution, but I have new problems
I tested the code 'RandomRange' but modified it, to return ints as intended in the very first post.
Then I counted the occurences of the numbers.
I tested with numbers from 0 to 4 inclusive with an avg. of 2.
The result is well equal distributed - so what is the improvement?
Only the 'randomness' is pretty lost. You don't get two Zeros or Fours in sequence.
Ok - you may specify numbers from 0 to 9 with an avg of 2.
You will get an average of 2, but not randomness.
You get values below 2 in a nearly equal distribution and values above 2 in a nearly equal distribution.

Is this what you want?
I would put a big fat warning to the code, that it isn't really creating randomness. Avoid usage in systems, which really need randomness (like simulating, testing).
If you only look for [012] with an average of 1, you get long sequences of '2 0 2 0' and '1 1 1 '. You will never get '2 2' or '0 0'.
Try to create a random function, which does not know the avgSoFar.
This is the enemy of all randomness!

Brian Pipa
Ranch Hand
Posts: 299
Not to pick nits, but
die=a single, 6-sided cube (often used in gambling and or/gaming)
dice=plural of die
so, you roll the dice. If you only roll one, you roll the die.
http://www.yourdictionary.com/ahd/d/d0211000.html

Brian

Tim West
Ranch Hand
Posts: 539
Stefan,
You're absolutely right on all that. I had a warning, as you say...I beefed it up on your recommendation.
The numbers my code generates are also fairly dodgy. In the situation with range (1, 10), "ideal average" 2, the sequences goes something like
5, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1 etc...
That is, 'n' where n > 2 followed by n 1s.
It still is good enough for what I want - as long as the "average" is right, the distribution and randomness properties are essentially irrelevant. Of course, in 99% of cases this wouldn't be the case, so my code shouldn't be used.
I guess in my original post I was hoping someone else would have an easy solution that wasn't succeptible to these issues...I had a suspicion any solution I wrote would be adequate but only just. The problem just isn't important enough to warrant it.
Still, in general I guess it would be useful to have a library that generates random numbers according to some given properties / distribution (binomial, normal, others?!)... a pet project for some statistician out there maybe
Thanks for all your thoughts - it has been interesting!

--Tim

Stefan Wagner
Ranch Hand
Posts: 1923
Yes. To me too.
Alea jacta sunt. (The diece had been thrown? [Thanks to Papa Brien. you're welcome, till the day I die...] I used 'steak' - an offline translator which did only mention 'dice' and 'cube' for 'Wï¿½rfel' and didn't show '{pl}' as it normally does. But when I use http://dict.leo.org/ I see you're right.)
[ March 31, 2004: Message edited by: Stefan Wagner ]

Gjorgi Var
Ranch Hand
Posts: 85
Actually it's "Alea acta est." But what if you work with just Math.random and if/else statements (or "for" loops) for range of numbers 1-6.

This compiles, but when run, sometimes as a result I get... nothing! How's that? How would you randomly select 1-6 without anything else but this code (strings not allowed tambien)
Muchas gracias

Stefan Wagner
Ranch Hand
Posts: 1923
alea acta est, alea iacta est, alea jacta est?
However - singular form.
alea acta sunt, alea iacta sunt, alea jacta sunt.
Plural form.
I learned latin only from asterix, a french comic.
Since i learned java much deeper I will concentrate on this.
This compiles, but when run, sometimes as a result I get... nothing!

No. It doesn't compile.
Codeblocks don't make much sense, if you don't indent.
Indent with tabs in an editor, save, compile, test, and cut'n'paste.
You miss an opening brace in the very first line, and a semicolon at 'first-=limit'.
Since you only print, if not (first>limit), you often get no output.
And your comment says 'choosing from 1-10'.
It doesn't.
It's choosing from 0-9.
Actually it's you who has to stand up much more early, to tell me about errors.
And: I fear, you missed the whole intention of the thread.
veni, vidi, vici
(edited for typo)
[ April 01, 2004: Message edited by: Stefan Wagner ]

Gjorgi Var
Ranch Hand
Posts: 85
O gustas non disputanum- indentation or not...
About the code that I gave, it is true that I missed some important parts as I didn't use copy/paste operation but re-typed it on the run... I apologize for the confusion... As Stefan pointed out, the problem was in the if-else statement and the number by which Math.random was multiplied... ("6" instead of "10" adding 1 to the product).
However, I think you cannot put a semicolon after closing parentheses like Stefan pointed out:
"and a semicolon at 'first-=limit'."
[ April 01, 2004: Message edited by: Gjorgi Var ]

Stefan Wagner
Ranch Hand
Posts: 1923
I didn't mention 'after parantheses'.

And yes - about randomization, - but:
with special interest in an avg, not necessarly beeing in the center of [min - max].
and not with equal distribution, but normal distribution.

Stefan Wagner
Ranch Hand
Posts: 1923
So I found a solution for a real normaldistribution of whole numbers from 0 to MAX (inclusive).
For numbers [MIN MAX] | MIN != 0 it needs a fix.
And it doesn't work for asymetric distributions.
Here is wonderful output for 23 numbers from 0-22,
by 1 000 000 generated numbers:
Occurences of i = 0:1
Occurences of i = 1:4
Occurences of i = 2:47
Occurences of i = 3:377
Occurences of i = 4:1811
Occurences of i = 5:6295
Occurences of i = 6:18062
Occurences of i = 7:40995
Occurences of i = 8:75955
Occurences of i = 9:118417
Occurences of i = 10:154873
Occurences of i = 11:167258
Occurences of i = 12:154351
Occurences of i = 13:118470
Occurences of i = 14:75863
Occurences of i = 15:41055
Occurences of i = 16:17755
Occurences of i = 17:6283
Occurences of i = 18:1701
Occurences of i = 19:375
Occurences of i = 20:46
Occurences of i = 21:6
Occurences of i = 22:0

Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Stefan]:
Usual randomizers will return numbers between LOW and HIGH and an average of (LOW+HIGH)/2
This is known as equal distributions, in contrast to normal distribution.
For instance the number of children in a culture may be 2.
And the minimum 0.
But the maximum will of course be greater than 4.

Mmmm... this seems a poor example of a normal (Gaussian) distribution, given that it's inherently asymmetric (much like the original problem statement in this thread). A true normal distribution is symmetric about the mean, and extends to infinity in either direction. Also it would cover all real numbers, not just integers. In practice of course we're often dealing with a discrete approximation to a normal distribution, and we needn't usually worry about it extending all the way to positive and negative infinity since the probability becomes negligible (though still nonzero) at some point.
The example of the distribution of the number of children per household might better fit a Poisson distribution. At least, it would if the chance of children remained constant during a couple's lifetime, and there weren't any additional social or economic forces at work. But that's another topic...
[Stefan]: So I found a solution for a real normaldistribution of whole numbers from 0 to MAX (inclusive).
Actually this is a binomial distribution with p = .5 (making it symmetric). For large N (or rather, large MAX) it will approach a normal ditribution.
It's possible to take this same approch, but skew the probabilities so as to skew the mean:

This again is a binomial distribution, but not (necessarily) symmetric as p can be anything from 0 to 1. It should satisfy all Tim's requirements, with a nice bell-shaped distribution (as much as possible, depending how skewed you make it). The time to execute is proportional to (max - min), so for large ranges this may be too slow. In this case you can probably approximate using a normal distribution instead, something like:

[ April 04, 2004: Message edited by: Jim Yingst ]

Stefan Wagner
Ranch Hand
Posts: 1923
Thanks Jim.
Of course it was a big mistake to mix the question of equal- and normal-distribution with symetric and asymetric distribution, as I could have avoided it by more concentration.
My learning of statistics is too long ago for remembering the differences between gaussian and binominal distribution, so thanks for making this a bit more clearly and giving the links.
I will play around with the code a bit and keep it for the case I need it - you don't claim copyright on it?
Can I ask, whether you took it from a repertoire, or developed it just in time?
I thought about time of execution of my code too, and had the idea, for large intervals only to generate one random-number, and iterate over it's bits, deciding to increment or decrement my value.
But that would lead to less readable code and get a bit complex for widths > 64.
Thanks again for your valuable post.

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Can I ask, whether you took it from a repertoire, or developed it just in time?
I just wrote it. For the randomGaussianInt() method, I got the formula sd = n * p * (1 - p) from the page on binomial distributions which I cited; I didn't remember what the formula was. I didn't test either method though, so you might not want to trust them too much yet.

Stefan Wagner
Ranch Hand
Posts: 1923
Yes. I stopped trusting them too much yesterday, when I included them in my testprogram and got funny results for 10,000 runs.
Since the thread-opener stopped writing in this thread, and since I'm only interested in this thread by - well - interest, it's not needful to make it work, but a hint in this thread might help the millions of silent readers, googeling for 'random numbers' or reading this forum every day.

Tim West
Ranch Hand
Posts: 539
Unfortunately the project I'm working on has just reached the (very) pointy end and so I haven't played with the code that's been posted ... yet! The code I wrote very early on ended up sufficing; time constraints prevented me from improving it.
Still, the discussion has been useful and interesting to me, and hopefully others out there.
Cheers,

--Tim

Jim Yingst
Wanderer
Sheriff
Posts: 18671
Mmmm, I see I wrote "n = max - mean" when I meant "n = max - min", in both methods. Also there's no reason the mean needs to be an int. Here are corrected versions:

My results for nextGaussianInt() still seem a little off, maybe, but they're much better now. The nextBinomialInt() method is pretty solid now though.

Stefan Wagner
Ranch Hand
Posts: 1923
Yes - a little bit off.
Mean 10, min 0, max 20, 10000 executions:

I get very different Results, when only creating one java.util.Random - Object, and reusing it, calling 'nextGaussian' (There is a hint about 'nextGaussian' being called every second time in a different way. Each is generating two gaussians, and the second value is returned by the next call. Creating new Objects all the time gives you every time a 'first'-value. This might have leaded to the suspicious results above. The new values look much better, if you ignore the '0' and '20' values:

I thought this may be an effect of:

and changed it to:

calling the method recursively, and the output looks ok for '20:' and not that worse for '0:', but still wrong for '0:', since bigger than the value for '1' (tested for calls = 1000000).
(I kept everything except p and sd as integers).
Shall I post my testprogram?
[ April 08, 2004: Message edited by: Stefan Wagner ]