assigning large int values to float

 
Ranch Hand
Posts: 250

Output-
true
false

When i is assigned to f, some information is lost because values of type float are not precise to nine significant digits.


If the above statement is correct, line2 printing false is fine. But why does line1 print true?
And why does the code below print true even though Integer.MAX_VALUE > 123456789?

Output-
true
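
The snippets for this post are not shown above. A minimal sketch of what is being asked, assuming the variable names i, f and j that the replies use and the value 123456789 mentioned later in the thread (the class and method names are made up for illustration):

```java
// Hypothetical reconstruction; every name except i, f and j is an assumption.
class IntToFloatQuestion {
    // Compares the int with its float copy; i is widened to float before ==.
    static boolean sameAsFloat(int i) {
        float f = i;
        return i == f;      // "line1"
    }

    // Compares the int with the value that comes back after a float round trip.
    static boolean sameAfterRoundTrip(int i) {
        float f = i;
        int j = (int) f;
        return i == j;      // "line2"
    }

    public static void main(String[] args) {
        System.out.println(sameAsFloat(123456789));                 // true
        System.out.println(sameAfterRoundTrip(123456789));          // false
        System.out.println(sameAfterRoundTrip(Integer.MAX_VALUE));  // true
    }
}
```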
 
Ranch Hand
Posts: 125

This could be because of the margin of error in float and double values, which do not always hold the exact value.
Since float and int are both 32 bits, in the second case all the bits are used for the maximum int value, so for
float f=i;
float doesn't have to add any precision bits or deal with a precision error. So the downcast value in j will be the same as the value in i, and j==i is true.

In the first case, since not all 32 bits are used for the value in int i, the conversion to float (also 32 bits), i.e. float f=i, will add some precision bits or deal with a precision error when the value of i is stored in f. So after downcasting, the value will not be the same as the value in i.

Though I'm not sure, this appears to be the correct reason. Waiting for more responses.
 
Astha Sharma
Ranch Hand
Posts: 250

If this is the reason, using double instead of float in the second snippet should print "true false", since double has 64 bits.

But this code prints
true
true
 
Sidharth Khattri
Ranch Hand
Posts: 125

Then, there could be some other reason. Anyone with an answer?
 
author
Posts: 23951

Sidharth Khattri wrote:

Astha Sharma wrote:
If this is the reason, using double in the second code should print "true false", since double has 64 bits.

But this code prints
true
true



Then, there could be some other reason. Anyone with an answer?




No. You are correct. It is Astha's counter-argument that is flawed. The double variable may be 64 bits wide, but it is holding a 32-bit number. So, as long as no precision was lost during the assignment from int to double, the value has no trouble being assigned back from double to int.

As for the assignment from int to double: a double-precision floating-point number uses 52 bits for the mantissa and 11 bits for the exponent (I think), so it can hold any 32-bit whole number without loss of precision.
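
A small sketch to check this, sending an int through a double and back. The class and method names are hypothetical; the three test values are just convenient picks:

```java
// Hypothetical helper; not code from the original thread.
class IntToDoubleRoundTrip {
    // True when the int equals its double copy and also survives the cast back.
    static boolean survives(int i) {
        double d = i;       // widening conversion
        int j = (int) d;    // narrowing back to int
        return i == d && i == j;
    }

    public static void main(String[] args) {
        System.out.println(survives(123456789));          // true
        System.out.println(survives(Integer.MAX_VALUE));  // true
        System.out.println(survives(Integer.MIN_VALUE));  // true
    }
}
```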

Henry
 
Ranch Hand
Posts: 59
You have:

The first line has an implicit conversion so both sides are the same type, so it's more like:

Both sides have a precision loss, but by the same amount.
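
A sketch of that comparison, assuming the names i and f from the question (the class and method names are hypothetical). The explicit cast in the second method spells out what the first one does implicitly:

```java
// Hypothetical illustration of binary numeric promotion in int == float.
class ImplicitWidening {
    // The int operand is promoted to float before the comparison.
    static boolean implicitCompare(int i, float f) {
        return i == f;
    }

    // Same comparison with the promotion written out.
    static boolean explicitCompare(int i, float f) {
        return (float) i == f;
    }

    public static void main(String[] args) {
        int i = 123456789;
        float f = i;
        System.out.println(implicitCompare(i, f));  // true
        System.out.println(explicitCompare(i, f));  // true
    }
}
```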

For the second line, look at the value of (int)(float)123456789.

123456789 is 111010110111100110100010101 in binary (27 bits).
A float stores only 24 significant bits, so the value is rounded to 111010110111100110100011000, which is 123456792 and therefore not equal to 123456789.
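
The rounding can be inspected directly. A minimal sketch using Integer.toBinaryString (the class and method names are hypothetical):

```java
// Hypothetical helper for looking at the bits before and after the float trip.
class FloatRoundingBits {
    // Converts to float and back, keeping whatever rounding float introduced.
    static int throughFloat(int i) {
        return (int) (float) i;
    }

    public static void main(String[] args) {
        int i = 123456789;
        int j = throughFloat(i);
        System.out.println(Integer.toBinaryString(i));  // 111010110111100110100010101
        System.out.println(Integer.toBinaryString(j));  // 111010110111100110100011000
        System.out.println(j);                          // 123456792
    }
}
```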


Here Integer.MAX_VALUE is rounded to 10000000000000000000000000000000 in binary (2^31), which is one larger than Integer.MAX_VALUE. Any float value larger than Integer.MAX_VALUE becomes Integer.MAX_VALUE when converted to an int.
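
A sketch of that saturation. The same float is also cast to long so that the rounded-up value is visible (class and method names are hypothetical):

```java
// Hypothetical illustration of float-to-int narrowing at the top of the int range.
class FloatToIntSaturation {
    // A long has room for 2^31, so this shows what the float really holds.
    static long asLong(int i) {
        return (long) (float) i;
    }

    // An int cannot hold 2^31, so the result is clamped to Integer.MAX_VALUE.
    static int asInt(int i) {
        return (int) (float) i;
    }

    public static void main(String[] args) {
        System.out.println(asLong(Integer.MAX_VALUE));  // 2147483648
        System.out.println(asInt(Integer.MAX_VALUE));   // 2147483647
    }
}
```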

There are other numbers larger than 123456789 that can be represented exactly as floats because they are round numbers in binary. For example:
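
The values below are illustrative picks, not necessarily the ones from the original post: a power of two, the multiple of 8 just above 123456789, and 2^31 - 128. The check goes through long so that the clamping at Integer.MAX_VALUE cannot hide a rounding (class and method names are hypothetical):

```java
// Hypothetical check for ints that a float can represent exactly.
class ExactlyRepresentable {
    // Compares as long so a float that rounded up to 2^31 is not mistaken for a match.
    static boolean isExactInFloat(int i) {
        return (long) (float) i == i;
    }

    public static void main(String[] args) {
        System.out.println(isExactInFloat(1 << 30));            // true  (1073741824)
        System.out.println(isExactInFloat(123456792));          // true
        System.out.println(isExactInFloat(2147483520));         // true  (2^31 - 128)
        System.out.println(isExactInFloat(123456789));          // false
        System.out.println(isExactInFloat(Integer.MAX_VALUE));  // false
    }
}
```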
 
Astha Sharma
Ranch Hand
Posts: 250
Now I get it. Thanks for the explanation, Sresh.
 