I'm trying to understand how a Byte/Short/Integer/Long is represented in binary.

Normally 1111 1111 1111 1111 1111 1111 1111 1100 stands for 4294967292, but in Java it's -4. Because of something called two's complement, right?
How can I calculate this? Do I have to do 1111 1111 1111 1111 1111 1111 1111 1100 - binary of 1 and then invert the bits? Can someone explain to me how I can add/subtract bits?

Supposedly every number, positive or negative, is signed by the leftmost bit, and the rest of the bits represent the value using two's complement notation. So why is int i = 4 not the two's complement of 0000 0100? If 0000 0100 is already the 2s complement, then the "normal" value would be 0000 0100 - 0000 0001 = 0000 0011 --> invert it = 1111 1100, but that is 252. I'm totally confused.

How is '4' really represented in Java as binary? 0000 0100 or the 2s complement of this?

In two's complement, the negative of a value is calculated by first inverting all the bits, and then adding 1.

So for a byte value 4 (0000 0100), the negative is the one's complement (1111 1011) and then add 1 (1111 1100): -4.
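In Java the same two steps can be tried directly. This is only a sketch: `NegateDemo` and `negate` are names made up for illustration, not library methods, and `~` is Java's bitwise NOT operator.

```java
class NegateDemo {
    // Hypothetical helper: negate a value by inverting all bits, then adding 1.
    static int negate(int value) {
        return ~value + 1;
    }

    public static void main(String[] args) {
        byte four = 4;                          // 0000 0100
        byte minusFour = (byte) negate(four);   // 1111 1011 + 1 = 1111 1100
        System.out.println(minusFour);          // -4
        // Mask with 0xFF so that only the 8 bits of the byte are printed.
        System.out.println(Integer.toBinaryString(minusFour & 0xFF)); // 11111100
    }
}
```

Applying `negate` twice gives back the original value, which is why the same rule works in both directions.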

Subtraction is actually performed by *adding* the two's complement of the subtrahend to the minuend. For instance, let's subtract 5 from 14:
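Worked through in 8 bits, that subtraction might look like the sketch below. The class and method names are made up for illustration, and the `& 0xFF` mask is just one assumed way of keeping every result to a single byte.

```java
class SubtractDemo {
    // Hypothetical helper: computes a - b by adding the two's complement of b,
    // keeping only the low 8 bits of every intermediate result.
    static int subtract8(int a, int b) {
        int complement = (~b + 1) & 0xFF; // invert the bits of b, add 1, keep 8 bits
        return (a + complement) & 0xFF;   // add, then discard the carry out of bit 7
    }

    public static void main(String[] args) {
        //     0000 1110   (14)
        // +   1111 1011   (two's complement of 5, i.e. -5)
        // = 1 0000 1001   (the leading carry falls off the end of the byte)
        // =   0000 1001   (9)
        System.out.println(subtract8(14, 5));        // 9
        // A negative result comes back as an 8-bit pattern; casting it to
        // byte reinterprets that pattern as a signed value.
        System.out.println((byte) subtract8(5, 14)); // -9
    }
}
```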


Ok, now it's clear how I calculate the 1s and 2s complement. But how does Java store the byte 'b' internally:

as 0000 0100 or the 2s complement of this that would be 1111 1100? But 1111 1100 is negative. Or is the 2s complement calculated without the leftmost bit?

The output of the sysout is 100, and that's not the 2s complement.

When we say "Java uses two's complement binary representation", it doesn't mean Java always takes the two's complement of every number. It means positive numbers are stored as usual in the binary system, and negative numbers are stored as the two's complement of their positive counterpart.

4 is always stored as 0000 0100, and -4 is always stored as 1111 1100.

To be clear, my posts are in short-hand. An int -4 is actually stored as 11111111 11111111 11111111 11111100, but that's a bit cumbersome to write out. (byte) -4, however, is stored as 1111 1100.
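You can check both widths yourself with `Integer.toBinaryString`. A small sketch, where `StorageDemo` and `byteBits` are illustrative names only; the mask is needed because a byte is widened to a 32-bit int (with its sign bit copied leftwards) before it is printed.

```java
class StorageDemo {
    // Hypothetical helper: the 8 stored bits of a byte, without sign extension.
    static String byteBits(byte b) {
        return Integer.toBinaryString(b & 0xFF);
    }

    public static void main(String[] args) {
        int i = -4;
        byte b = -4;
        System.out.println(Integer.toBinaryString(i)); // 11111111111111111111111111111100
        System.out.println(byteBits(b));               // 11111100
        // Without the mask, the byte is sign-extended to 32 bits first,
        // so the output is the same as for the int.
        System.out.println(Integer.toBinaryString(b)); // 11111111111111111111111111111100
    }
}
```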

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 43368

Welcome to the Ranch

Did you see that I pointed out in those other threads that in two’s complement exactly half the numbers available are negative? 0000_0100 is one of those which are not negative. When you call the binary string method, it misses out the leading 0s, returning "100".
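Both points can be seen in a few lines. This is a rough sketch with made-up names (`LeadingZeros`, `padded`, `countNegative`); padding via `String.format` is just one assumed way to put the leading zeros back.

```java
class LeadingZeros {
    // Hypothetical helper: the bits of a byte, padded to a full 8 digits.
    static String padded(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Count how many of the 256 byte values have the leftmost bit set.
    static int countNegative() {
        int count = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            if ((v & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(4)); // 100
        System.out.println(padded((byte) 4));          // 00000100
        System.out.println(countNegative());           // 128
    }
}
```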

Stephan van Hulst wrote:When we say "Java uses two's complement binary representation", it doesn't mean Java always takes the two's complement of every number. It means positive numbers are stored as usual in the binary system, and negative numbers are stored as the two's complement of their positive counterpart.

4 is always stored as 0000 0100, and -4 is always stored as 1111 1100.

Aha, that's very interesting. My SCJP book says:

All six number types in Java are made up of a certain number of 8-bit bytes, and are signed, meaning they can be negative or positive. The leftmost bit is used to represent the sign, where a 1 means negative and 0 means positive. The rest of the bits represent the value, using two's complement notation.

There's no hint that only negative values are represented using two's complement notation.

Why isn't a byte -4 stored as 1000 0100? Why is it easier(?) to build the 2s complement rather than just flipping the leftmost bit of a positive binary number to get its negative?

@CR:
Yep, I've read the one thread about casting, it helped a little

The book is wrong, or at least misleading. The entire number is written in two's complement representation, not just the remaining bits.

Note that a consequence of two's complement is that *all* negative numbers have the sign bit set, and *all* non-negative numbers (zero included) have the sign bit cleared. So regardless of whether you simply flip the first bit, or use two's complement, you will always be able to tell the sign of the value by just looking at the first bit.

An advantage of storing values this way, is that there is only one representation for 0. If you flip only the sign bit for negative values, there are two representations for 0:
0000 0000 and 1000 0000. 0 and -0 respectively, which really are the same value. In two's complement notation, you can represent one more value instead: -128 (for byte values).
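To see how Java reads those two patterns, here is a small sketch. `SignDemo` and `fromBits` are hypothetical names; the helper parses 8 binary digits as an unsigned number and lets the cast to byte reinterpret them.

```java
class SignDemo {
    // Hypothetical helper: interpret a string of 8 binary digits as a Java byte.
    static byte fromBits(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        System.out.println(fromBits("00000000")); // 0
        System.out.println(fromBits("10000000")); // -128, not a second zero
        System.out.println(fromBits("11111111")); // -1
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE); // -128 to 127
    }
}
```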

A second, and even bigger advantage is that you can subtract values in the way I showed you earlier, by just adding the complement of the subtrahend to the minuend. So you don't have to create a completely different operation for subtraction, you can just reuse the addition. This is what most processors do internally as well.

Another mistake in the book. There are not six primitive number types, but seven. The seventh is called char. Unfortunately that link is not behaving well at the moment.

Stephan van Hulst wrote:. . . If you flip only the sign bit for negative values, there are two representations for 0 . . .

There are 32 '1's; the first should be the sign bit, shouldn't it?
Ah nice, valueOf() looks for a '-' in the String, so it doesn't look at the sign bit as described in the method's Javadoc. That can't be; I must be doing something wrong.

Sam Samson wrote:
Ah nice, valueOf() looks for a '-' in the String, so it doesn't look at the sign bit as described in the method's Javadoc. That can't be; I must be doing something wrong.

May we ask... where in the Javadoc does it describe this method as processing the sign bit?

You need to look back at valueOf(), and see what it can parse. It refers back to parseInt. Now you can try a literal instead. You see, all integer literals are unsigned. So you can write those 32 1s as a binary literal; they are taken directly into memory, and when you print the result, you get -1.
Similarly, hex literals are unsigned, so you can write 0xffffffff and also get -1.
Similarly, decimal literals are unsigned, but they are restricted to the range 0 to 2147483648 (and 2147483648 itself is only allowed as the operand of unary minus). If you try to parse your 32 1s, you get 4294967295, which is outside the permissible range for an int. What you can try is parsing them into a wider type and casting the result down to int. Obviously, without the (int) cast you get 4294967295.
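Something along these lines shows all of those cases side by side. It is only a sketch: `ParseDemo` and `viaLong` are names I made up, and going through `Long.parseLong` is one assumed way of getting the 32 bits in before the cast.

```java
class ParseDemo {
    // 32 ones, built from four groups of eight for readability.
    static final String ONES = "11111111" + "11111111" + "11111111" + "11111111";

    // Hypothetical helper: parse the digits as a long, then keep the low 32 bits.
    static int viaLong(String bits) {
        return (int) Long.parseLong(bits, 2);
    }

    public static void main(String[] args) {
        System.out.println(0b11111111_11111111_11111111_11111111); // -1
        System.out.println(0xffffffff);                            // -1
        System.out.println(Long.parseLong(ONES, 2));               // 4294967295
        System.out.println(viaLong(ONES));                         // -1
        try {
            // 4294967295 does not fit in an int, so parseInt refuses it.
            Integer.parseInt(ONES, 2);
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejects it");
        }
    }
}
```

Binary literals and underscores in literals need Java 7 or later.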