# float variable value

pradeepta chopra

Ranch Hand

Posts: 137

posted 7 years ago

Well, most decimal fractions cannot be represented exactly in the binary number system that float uses. So if your value has more significant digits than a float can hold, you can get unexpected values.
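A minimal sketch of what that means in practice. The class and method names here are made up for illustration; the only library call relied on is the `BigDecimal(double)` constructor, which shows the exact binary value that was stored.

```java
import java.math.BigDecimal;

final class FloatExactValue {
    // Returns the exact decimal expansion of the binary value held in the float.
    // The float is widened to double (which is lossless) before conversion.
    static String exactValue(float f) {
        return new BigDecimal(f).toString();
    }

    public static void main(String[] args) {
        // 0.1 has no finite binary expansion, so the stored value is only close to it.
        System.out.println(exactValue(0.1f)); // prints 0.100000001490116119384765625
        // 0.5 is a power of two, so it is stored exactly.
        System.out.println(exactValue(0.5f)); // prints 0.5
    }
}
```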

SCJP 6 | SCWCD 5 | Javaranch SCJP FAQ | SCWCD Links

shiva shankar

Ranch Hand

Posts: 85

Campbell Ritchie

Sheriff

Posts: 47286

52

posted 7 years ago

Sorry for not replying earlier; have been busy.

You will have to find a book about computer hardware to find out how IEEE754 numbers are stored. They are quite complicated to explain, but once you have got the hang of them they are quite easy to understand.

There is 1 bit which determines sign (1 = -, 0 = +).

There are 8 bits which determine a (binary) exponent, from 0000_0001 = 1 to 1111_1110 = 254, then reduced by the "bias" which is 127 (0111_1111), so the range of exponents comes to -126 to (+)127.

If you have an exponent of 0000_0000, that will still be taken as -126, but the number is made up only from whatever bits are set in the "fractional" part, with no implied leading 1; this is called a denormalised number.

If you have an exponent of 1111_1111 and all the remaining bits of the number are 0, that is taken to mean infinity; an exponent of 1111_1111 and any 1 anywhere to the right of it is "NaN."
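The layout described so far can be sketched in Java. This is only an illustration: `FloatBits` and its methods are hypothetical names, and the "zero" case (exponent and remaining bits all 0) is added here for completeness. `Float.floatToRawIntBits` is the standard call that exposes the 32 stored bits.

```java
final class FloatBits {
    // Bit 31: sign (1 = negative, 0 = positive).
    static int sign(float f) {
        return Float.floatToRawIntBits(f) >>> 31;
    }

    // Bits 30..23: the stored (biased) exponent, 0 to 255.
    static int exponent(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0: the 23-bit "fractional part".
    static int fraction(float f) {
        return Float.floatToRawIntBits(f) & 0x7FFFFF;
    }

    // Classifies a float from its stored exponent and fraction bits.
    static String classify(float f) {
        int e = exponent(f);
        int frac = fraction(f);
        if (e == 0xFF) {
            return frac == 0 ? "infinity" : "NaN";
        }
        if (e == 0) {
            return frac == 0 ? "zero" : "denormalised";
        }
        return "normal, exponent " + (e - 127); // remove the bias of 127
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -2.0f, Float.MIN_VALUE, Float.POSITIVE_INFINITY, Float.NaN};
        for (float f : samples) {
            System.out.println(f + " -> sign " + sign(f) + ", " + classify(f));
        }
    }
}
```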

The bit about precision only applies in the "normal range" of the numbers; go to the API documentation for Double or Float and you find a field called "MIN_NORMAL" and clicking "constant field values" allows you to see its value. The normal range is from that value up to the largest value you can store; smaller values than MIN_NORMAL are "denormalised" and have reduced precision.
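A rough way to see the reduced precision below MIN_NORMAL, assuming you are happy to measure precision as the gap to the next float divided by the value itself. `NormalRange` and its method names are hypothetical; `Float.MIN_NORMAL`, `Float.MIN_VALUE` and `Math.ulp(float)` are standard.

```java
final class NormalRange {
    // True for non-zero values smaller in magnitude than Float.MIN_NORMAL.
    static boolean isDenormalised(float f) {
        float a = Math.abs(f);
        return a > 0.0f && a < Float.MIN_NORMAL;
    }

    // Gap to the next representable float, relative to the value itself.
    static float relativeSpacing(float f) {
        return Math.ulp(f) / Math.abs(f);
    }

    public static void main(String[] args) {
        System.out.println(Float.MIN_NORMAL + " denormalised? " + isDenormalised(Float.MIN_NORMAL));
        System.out.println(Float.MIN_VALUE + " denormalised? " + isDenormalised(Float.MIN_VALUE));
        // In the normal range the relative gap stays small (2^-23 at 1.0f).
        System.out.println(relativeSpacing(1.0f));
        // At the smallest denormalised value the gap is as large as the value itself.
        System.out.println(relativeSpacing(Float.MIN_VALUE));
    }
}
```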

The remaining 23 bits are a "fractional part" and when the processor uses that for a number in the normal range, it always puts "1." before it, so as to produce a result >= 1 and < 2 (binary 10). So you now have your numbers in 24 bits' precision. 24 bits' precision in binary is (according to the Wikipedia link given earlier) 7.225 decimal digits (approx). [Looking back at a previous post, that 7.225 would appear to be correct.] You can get that by multiplying 24 by the base-10 log of 2 (0.3010...).
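Both calculations can be checked with a short sketch. `FloatPrecision` and its methods are hypothetical names, and `rebuild` only handles numbers in the normal range (it makes no attempt at denormalised values, infinity or NaN).

```java
final class FloatPrecision {
    // Decimal digits carried by a given number of binary digits: bits * log10(2).
    static double decimalDigits(int bits) {
        return bits * Math.log10(2.0);
    }

    // Rebuilds the value of a NORMAL float from its three stored fields.
    static double rebuild(float f) {
        int bits = Float.floatToRawIntBits(f);
        int e = (bits >>> 23) & 0xFF;          // stored exponent, 1 to 254 for normal numbers
        int frac = bits & 0x7FFFFF;            // 23 stored fraction bits
        double significand = 1.0 + frac / (double) (1 << 23); // put "1." in front
        double value = significand * Math.pow(2.0, e - 127);  // apply the unbiased exponent
        return (bits >>> 31) == 0 ? value : -value;
    }

    public static void main(String[] args) {
        System.out.println(decimalDigits(24)); // roughly 7.225
        System.out.println(rebuild(6.5f));     // prints 6.5
    }
}
```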

Now, the number you are expecting had 8 decimal digits in. Count them: 4.1234565. If you can only achieve 7¼ digits at best, the 8th digit is bound to be imprecise. And you can't be confident about the 7th digit.
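To see this for the number in question, here is a small sketch (the class and method names are hypothetical). It compares the decimal text with the exact value that ends up in the float, and prints the gap between neighbouring floats near 4.12, which works out at 2^-21, roughly 0.00000048. That gap is larger than one unit in the 7th decimal place.

```java
import java.math.BigDecimal;

final class EighthDigit {
    // Exact decimal expansion of the value actually stored in the float.
    static BigDecimal stored(float f) {
        return new BigDecimal(f);
    }

    // True only if the decimal text survives conversion to float unchanged.
    static boolean isExact(String decimal) {
        float f = Float.parseFloat(decimal);
        return new BigDecimal(f).compareTo(new BigDecimal(decimal)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(stored(4.1234565f));     // the nearest float, not 4.1234565 itself
        System.out.println(isExact("4.1234565"));   // prints false
        System.out.println(Math.ulp(4.1234565f));   // gap between adjacent floats here
    }
}
```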

The moral of the story is: if you use floating-point arithmetic you will get imprecise results. Never use floating-point arithmetic for money, or counters in "for" loops, because you will suffer nasty errors every now and again.
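A small illustration of the loop-counter problem, with hypothetical names. Adding 0.1f ten times does not give exactly 1.0f, so a loop that tests for equality with 1.0f would never see it. Counting with an int and deriving the float from the counter avoids the accumulated error.

```java
final class LoopCounterDemo {
    // Adds 0.1f to itself n times, rounding after every addition.
    static float sumTenths(int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    // Safer: the int counter is exact, and the float is derived from it each time.
    static float tenths(int i) {
        return i / 10.0f;
    }

    public static void main(String[] args) {
        System.out.println(sumTenths(10));         // prints 1.0000001
        System.out.println(sumTenths(10) == 1.0f); // prints false
        System.out.println(tenths(10) == 1.0f);    // prints true
    }
}
```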

[ September 05, 2008: Message edited by: Campbell Ritchie ]

pradeepta chopra

Ranch Hand

Posts: 137

posted 7 years ago

Well, jokes apart, what I have understood is this:

8 bits are used to represent the exponent, which is stored with a bias of 127 rather than with a sign bit of its own; that is why we have got the range of -126 to 127.

24 bits represent the fractional part (23 stored bits plus the implied leading 1), and that makes 7.225 digits of precision. Because of this 7.225-digit limit, our last digit becomes imprecise when rounding off.

and the moral is never to use the floats in accurate measurements

thanks

Campbell Ritchie

Sheriff

Posts: 47286

52

posted 7 years ago

Originally posted by pradeepta chopra:

. . . never to use the floats in accurate measurements

thanks

You mean precise, not accurate.

Don't use doubles for precise measurements either. If you require precision (e.g. for money), either use ints and denominate the amount in pence/cents, or use BigDecimal.
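Both options might look something like this. It is only a sketch: `MoneyDemo` and its methods are hypothetical, and a long is used for the pence/cents count here (rather than an int) simply to leave more headroom for large totals.

```java
import java.math.BigDecimal;

final class MoneyDemo {
    // Option 1: keep the amount as a whole number of pence/cents.
    static long totalInCents(long priceInCents, int quantity) {
        return priceInCents * quantity;
    }

    // Option 2: BigDecimal, built from a String so no binary rounding creeps in.
    static BigDecimal total(String price, int quantity) {
        return new BigDecimal(price).multiply(BigDecimal.valueOf(quantity));
    }

    public static void main(String[] args) {
        System.out.println(totalInCents(10, 3)); // prints 30
        System.out.println(total("0.10", 3));    // prints 0.30
    }
}
```

Note that the BigDecimal is constructed from a String; passing a double or float literal to the constructor would carry the binary rounding error along with it.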
