programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
• Campbell Ritchie
• Jeanne Boyarsky
• Ron McLeod
• Paul Clapham
• Liutauras Vilda
Sheriffs:
• paul wheaton
• Rob Spoor
• Devaka Cooray
Saloon Keepers:
• Stephan van Hulst
• Tim Holloway
• Carey Brown
• Frits Walraven
• Tim Moores
Bartenders:
• Mikalai Zaikin

# Average algorithm

Ranch Hand
Posts: 209
1
• Number of slices to send:
Optional 'thank-you' note:
How to implement an algorithm that averages an array of N doubles, and is relatively free from bug? It should take care of overflows, and be reasonably fast.

Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.
[ October 22, 2007: Message edited by: Chu Tan ]

Wanderer
Posts: 18671
• Number of slices to send:
Optional 'thank-you' note:
With doubles? Do you realize how big the numbers have to be to get numeric overflow with doubles? I have a hard time imagining a problem where it's worth worrying about. If you had an array of ints or longs, that's one thing - those might actually overflow. But doubles really won't, not in any remotely real-world kind of problem.

I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100-element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10-element arrays (dividing each sum by 100).

In this way, you only do a large number of divisions in the extremely rare circumstance of near-overflow. Which will never happen anyway in real life - but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.

But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.

Justin Chu
Ranch Hand
Posts: 209
1
• Number of slices to send:
Optional 'thank-you' note:
Thanks, I figure that it wouldn't be an issue 99% of the time for finding mean.

How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?

I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?

Ranch Hand
Posts: 234
• Number of slices to send:
Optional 'thank-you' note:
There is the Apache Commons Math project, which has this type of calculations. You could take a peek at how they implement things.

 Did you see how Paul cut 87% off of his electric heat bill with 82 watts of micro heaters?