Determining Static Percentage Rates

 
Ranch Hand
Posts: 3271
Okay, so here's a math question which I suspect should be easy, but I can't seem to put my finger on it...

When a specific operation is performed, there are 5 possible outcomes. I believe each outcome occurs at a fixed percentage (i.e. outcome A happens 50% of the time, outcome B happens 30% of the time, etc.). The only real way I can determine those percentages is to keep performing the operation and recording the results. Based on that, I've started to get a feel for how often each outcome occurs. So far, I've done about 1000 samples and come up with these rough numbers:

A: ~55%
B: ~33%
C: ~12%
D: ~0.004%
E: ~0.001%

The kicker here is that outcomes D and E are *very* rare. In all of the samples I've done, outcome E has occurred only once and D has happened only 4 times.

What I'm wondering is whether or not there's a good way to determine how many samples must be taken to have a statistically sound estimate of what the actual percentages are. Obviously, the more samples I take, the better my estimate will be, but I have no idea what my error rate is. Any thoughts?

Thanks.
 
author and iconoclast
Posts: 24207
My wife -- who uses some rather advanced statistics in her day-to-day work -- is not sure there is a closed-form analytical solution for this. This may be one of those cases where Monte Carlo (random) simulation is needed to determine the significance.
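A minimal sketch of what such a simulation might look like, in Python. The "true" rate of 4 in 1000 is only an assumption, taken from the count reported for outcome D, and the function name `simulate_estimates` is made up for illustration. It repeats the 1000-sample experiment many times and looks at how much the measured rate moves from run to run:

```python
import random
import statistics

def simulate_estimates(p, n, trials, seed=0):
    """Return `trials` estimates of p, each from n simulated operations."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(trials):
        # Count how often the rare outcome shows up in n operations.
        hits = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(hits / n)
    return estimates

# Assumed true rate for the rare outcome (hypothetical: 4 in 1000).
p_true = 0.004
estimates = simulate_estimates(p_true, n=1000, trials=2000)

mean_estimate = statistics.mean(estimates)
spread = statistics.pstdev(estimates)  # run-to-run std dev of the estimate
print(f"mean estimate: {mean_estimate:.5f}")
print(f"spread of estimates: {spread:.5f}")
```

If the spread is a large fraction of the rate itself, 1000 samples is not enough to pin that outcome down.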
 
Wanderer
Posts: 18671
Hmm, I would think an acceptable approximation is

σ_p = sqrt(p*(1-p)/n)

Unfortunately we don't really know p here - measuring p is the point of the exercise. But it's usually acceptable to use the measured, approximate value of p in lieu of the real thing.
(What else could one do?)

So for case E, the standard deviation of the estimate (its standard error) would be sqrt((.00001)*(.99999)/1000) = .0001. That means, roughly, that each time you repeat the 1000-sample experiment, you can expect the measured value of p to vary by about 0.01%. (Bear in mind that it can vary by more than one standard deviation, but this gets increasingly unlikely the farther out you get.) Since this is ten times as large as the measurement itself (at least, for the current best estimate), you would presumably want to take considerably more samples.
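The arithmetic for case E can be checked with a few lines of Python. This is only a sketch of the approximation above; the function name `standard_error` is mine, and p = 0.00001 is the measured estimate, not the true value:

```python
import math

def standard_error(p, n):
    """Approximate standard deviation of a measured proportion p over n samples."""
    return math.sqrt(p * (1.0 - p) / n)

p_e = 0.00001   # measured rate for outcome E (0.001%), used in place of the true p
n = 1000        # samples taken so far

se = standard_error(p_e, n)
ratio = se / p_e  # how large the uncertainty is relative to the estimate

print(f"standard error: {se:.6f}")
print(f"error relative to estimate: {ratio:.1f}x")
```

The ratio comes out at about 10, which is the "ten times as large as the measurement" mentioned above.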

If you want the standard error to be, say, 0.0001% (a tenth of the presumed value), then solve for n:

n = p*(1-p)/σ^2 = (.00001)(.99999)/(.000001)^2 = 9999900

Yes, that's a lot of repetitions, unfortunately.
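For other targets, the same rearrangement can be wrapped in a small Python helper. Again just a sketch under the same approximation; `required_samples` is a name I made up, and the inputs are the presumed values from above:

```python
import math

def required_samples(p, target_se):
    """Samples needed so that sqrt(p*(1-p)/n) is no larger than target_se."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)

p_e = 0.00001          # presumed rate for outcome E (0.001%)
target_se = 0.000001   # desired standard error (0.0001%, a tenth of p_e)

n_needed = required_samples(p_e, target_se)
print(f"samples needed: about {n_needed:,}")  # on the order of 10 million

# A looser target needs far fewer samples: n scales with 1/target_se^2.
n_loose = required_samples(p_e, 10 * target_se)
print(f"with a 10x looser target: about {n_loose:,}")
```

Note that loosening the target by a factor of 10 cuts the required samples by a factor of 100.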

I may be misremembering some of the details on this, as it's been a long time - but it feels right to me, at least for a simple view of the distribution. There were probably some simplifying assumptions that went into the formula, which I don't remember right now. Hopefully it's close enough for a rough idea though.
[ December 18, 2006: Message edited by: Jim Yingst ]
 