Precision and Sanity

Lynn Oliver
Greenhorn

Joined: Jun 10, 2011
Posts: 5
I need a sanity check. I'm writing an application in Python and comparing the results to an existing app in Java. Generally I'm getting good agreement between the programs (the average difference in results is about 0.05%), but in an effort to understand the differences I went to a higher-precision data type in Python. As a result, in a boundary case I end up with the exact correct result in Python, but the Java app is off by almost 30%. Is this reasonable?

This is a root-sum square computation: the program takes 1024 signed integers, converts each integer to float, computes the sum of the squares, takes the square root of the result and inverts it. That value is used to scale all of the original data, which is then converted back to signed integers. The data is in .wav file PCM format and is interpreted as being between -1 and 1, so there is a scale factor applied when converting to and from floating point. The Java program is using the double data type (64-bit floating-point format).

The boundary case is simply setting all of the original integers equal to the smallest positive value, 0x00000001.
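As a rough illustration of the computation described above (this is not the code from either app), here is a minimal Python sketch. The scale factor `FULL_SCALE = 2**31` and the function names `rss_normalize` and `to_pcm` are assumptions made for the example; the real app's scale factor is not shown in this thread, so the integer results may not match it.

```python
import math

# Assumed scale for 32-bit PCM samples; the actual app's factor is not shown.
FULL_SCALE = 2 ** 31


def rss_normalize(samples, full_scale=FULL_SCALE):
    """Scale integer samples so that their root sum of squares is 1.0."""
    floats = [s / full_scale for s in samples]       # interpret as values in [-1, 1)
    rss = math.sqrt(sum(x * x for x in floats))      # root sum of squares
    if rss == 0.0:
        return floats                                # all-zero block: nothing to scale
    inverse = 1.0 / rss
    return [x * inverse for x in floats]


def to_pcm(floats, full_scale=FULL_SCALE):
    """Convert back to integers, rounding to nearest and clamping to 32 bits."""
    return [max(-full_scale, min(full_scale - 1, round(x * full_scale)))
            for x in floats]


# Boundary case from the post: every sample is the smallest positive value.
boundary = [1] * 1024
scaled = rss_normalize(boundary)
check = math.sqrt(sum(x * x for x in scaled))        # RSS of the scaled block
print(scaled[0], check)
```

In this boundary case every intermediate value is a power of two, so the double arithmetic is exact and the scaled block has a root sum of squares of exactly 1.0.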

Here is the key fragment of code from the Java app:


Thanks for your help!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19654

Check out #20 of our Beginner's FAQ. It's a problem all programming languages have. In Java, the usual solution is to use BigDecimal instead of double. Unfortunately, before Java 9 (which added BigDecimal.sqrt(MathContext)) there is no sqrt method for BigDecimal, so you'd either have to create one yourself or use double as an intermediate (thereby losing precision...).


James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Lynn Oliver wrote:
This is a root-sum square computation: the program takes 1024 signed integers, converts each integer to float, computes the sum of the squares, takes the square root of the result and inverts it. That value is used to scale all of the original data, which is then converted back to signed integers.


Since I don't have the full context in which you are performing this calculation, I could be wrong, but I'm not sure this makes sense. I would expect one to scale using the RMS (root mean square) and not the RSS (root sum square).
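To make the distinction concrete, here is a small Python sketch of the two quantities. The function names `rss` and `rms` are just illustrative. For a block of N values the two differ by a factor of sqrt(N), which is 32 for the 1024-sample blocks discussed here.

```python
import math


def rss(values):
    """Root sum of squares: sqrt(x1^2 + x2^2 + ... + xN^2)."""
    return math.sqrt(sum(v * v for v in values))


def rms(values):
    """Root mean square: sqrt((x1^2 + ... + xN^2) / N)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


block = [1.0] * 1024
rss_value = rss(block)
rms_value = rms(block)
print(rss_value, rms_value, rss_value / rms_value)
```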


Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

There will always be some imprecision when working with floating-point numbers, because there are infinitely many real numbers and only 2^64 possible bit patterns for a double. The imprecision is tiny (the relative rounding error of a double is of the order of 2^-53), but you need to be careful when comparing floating-point values with integers or with each other using ==. For example, the expression 2 - 1.1 == 0.9 is false since the lhs evaluates to 0.8999999..., even though 2 - 1.2 == 0.8 happens to be true. Because a cast to int truncates toward zero rather than rounding, something like 1.99999999 will be converted to 1, and (int)((2 - 1.1) * 100) == 89. Hence converting from floating-point numbers to integers in general isn't a very good idea (and floating point is especially to be avoided when dealing with money, where you should do all your arithmetic in pennies using the long datatype).
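These effects are easy to reproduce. The sketch below uses Python, whose float is the same IEEE 754 double as Java's double; `int()` truncates toward zero just like a Java `(int)` cast. The variable names are only for illustration.

```python
import sys

difference = 2 - 1.1
is_equal_09 = (2 - 1.1 == 0.9)           # False: lhs is 0.8999999999999999
is_equal_08 = (2 - 1.2 == 0.8)           # True: happens to be exact
truncated = int((2 - 1.1) * 100)         # 89, not 90
almost_two = int(1.99999999)             # 1: truncation, not rounding
epsilon = sys.float_info.epsilon         # gap between 1.0 and the next double

print(difference, is_equal_09, is_equal_08)
print(truncated, almost_two)
print(epsilon == 2.0 ** -52)
```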

In your case, what is your expected result and why do you think the results you're getting are wrong? If I'm scaling a minimum value, in general I'd expect something like the minimum value back, however much I'm scaling it by. If it's out by 1, I wouldn't be worried because we're dealing with minimum values, so the actual difference is insignificant. If you're talking about a sound recording, think about the inaccuracy for minimum values that would have been introduced when it was encoded originally: you're trying to be accurate about some data that wasn't accurate to start with. So using BigDecimals to try to increase accuracy would be pointless.

Python might be using something like a BigDecimal automatically, but it will be, in comparison with Java, very slow.

To put it another way, obviously you'd expect some inaccuracy to enter when you convert from a double to an integer. If you have a value of 2.6, what integer are you expecting? Whether it's 3 or 2, you're going to be "inaccurate" by a large percentage. But this "inaccuracy" will only be a large percentage when dealing with values close to the minimum (i.e. 1).

Incidentally, if you want to round to the nearest integer, rather than rounding down, all you have to do is to add 0.5 before converting to an integer. (Or subtract 0.5 for negative numbers.) This would probably be more appropriate for a sound recording than always rounding down.
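A minimal sketch of that trick in Python, assuming a helper name `round_half_away` made up for this example. Note that in Python 3 the built-in `round` rounds ties to the nearest even integer, so it is not quite the same thing, and values extremely close to a .5 boundary can still be affected by rounding in the addition itself.

```python
def round_half_away(x):
    """Round to the nearest integer by adding or subtracting 0.5, then truncating."""
    if x >= 0:
        return int(x + 0.5)
    return int(x - 0.5)


for value in (2.6, 2.4, -2.6, 1.99999999, 2.5):
    # Compare plain truncation, the add-0.5 trick, and Python's built-in round.
    print(value, int(value), round_half_away(value), round(value))
```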
Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

In fact, I think the whole thing I said in my first paragraph above, and what you'll find in the FAQ, is a red herring. It's more likely just due to the way you're rounding (up or down).
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3599

You may also want to consider making your variable RSS (note that this variable, as well as Factor, does not conform to the naming conventions) a long instead of a double. You can lose information every time you add a value to a double. Not that it will make a big difference, but using a double shouldn't be necessary (unless of course your integer values are very large, in which case a long might overflow; if so, however, a double will be very lossy as well).

Instead of multiplying each integer with the inverse of the RSS, you can also divide each integer by the RSS.

In general, when working with floating points, avoid unnecessary operations to avoid loss of information.
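Both suggestions can be sketched in Python. Python integers do not overflow, so the overflow caveat for a Java long does not arise here; the sample values and variable names are hypothetical.

```python
import math

samples = [3, -4, 12]                          # hypothetical integer samples

# Accumulate the sum of squares exactly in integers, then round once in sqrt.
sum_of_squares = sum(s * s for s in samples)   # 9 + 16 + 144 = 169
rss = math.sqrt(sum_of_squares)

# Dividing rounds once per sample; multiplying by 1/rss rounds twice.
by_division = [s / rss for s in samples]
by_inverse = [s * (1.0 / rss) for s in samples]

# A double silently drops an addend that is too small relative to the total.
big = 2.0 ** 53
float_lost_one = (big + 1.0 == big)            # True: the +1 is lost
int_kept_one = (2 ** 53 + 1 != 2 ** 53)        # True: integers are exact

# One extra operation is enough to change a result.
direct = 49 / 49.0                             # 1.0
via_inverse = 49 * (1.0 / 49.0)                # 0.9999999999999999

print(rss, by_division, by_inverse)
print(float_lost_one, int_kept_one, direct, via_inverse)
```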
Lynn Oliver
Greenhorn

Joined: Jun 10, 2011
Posts: 5
Let me walk through the computation step by step:

The Java app came up with 0x02D413CD or 47453133. With the test data set equal to 1, the exact result is the value of factor, or 0x40000000.

Unfortunately, I cannot modify the Java app, as I do not have the source for it. So there is no way to improve the algorithm.

The exercise was to determine if the Python version is at least as accurate as the Java app. The observed differences using real data (average difference 0.05%, max difference 0.10%) were small enough to be insignificant in terms of the audio data, but I wanted to get a feel for where they are coming from. Using the decimal library in Python with 90 significant digits is massive overkill, but I plan to use those results as a baseline for comparison with both the Java app and the Python app.

I know that people are more familiar with RMS computations, but in this application root sum square is correct.
Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

>>> z = int(round(x))
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

You have posted just fragments of Java and fragments of Python, but nothing you have posted allows us to help you. My simple tests comparing Python and Java do not indicate anything like the differences that you are reporting, but of course I don't have access to your test code. An SSCCE (equivalent Java and Python versions) would allow us to provide more than general comments.
 