
toBinaryString

abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635


The output:

شمس
6
Binary is 1010
11111111111111111111111111011000
11111111111111111111111110110100
11111111111111111111111111011001
11111111111111111111111110000101
11111111111111111111111111011000
11111111111111111111111110110011


1. As you can see, there are many 1s in the output, because int is 4 bytes and UTF-8 is 2 bytes. But at this line:

System.out.println("Binary is " +
Integer.toBinaryString(10));

we don't see many 1s in the output. Why?

2. How can I change this program so that the output becomes this:

11011000 10110100 ش
11011001 10000101 م


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

abalfazl hossein wrote: 1. As you can see, there are many 1s in the output, because int is 4 bytes and UTF-8 is 2 bytes. But at this line:

System.out.println("Binary is " +
Integer.toBinaryString(10));

we don't see many 1s in the output. Why?

Simply put: because a char (which is what you're dealing with) is NOT an int. And it's NOT a byte either.
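
To make the difference concrete, here is a minimal sketch (not the original poster's program, which isn't shown in this thread). Integer.toBinaryString takes an int, so a byte argument is widened first, and a negative byte is sign-extended to 32 bits. The class and method names are made up for illustration.

```java
class SignExtensionDemo {
    // Binary digits of a byte after Java widens it to int (sign-extended if negative).
    static String widened(byte b) {
        return Integer.toBinaryString(b);
    }

    // Binary digits of the low 8 bits only, treating the byte as unsigned.
    static String unsigned(byte b) {
        return Integer.toBinaryString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xD8; // as a signed Java byte this is -40
        System.out.println(Integer.toBinaryString(10)); // 1010 (positive, leading zeros dropped)
        System.out.println(widened(b));  // 24 ones followed by 11011000
        System.out.println(unsigned(b)); // 11011000
    }
}
```

The value 10 is positive, so its leading zero bits are simply not printed. A negative byte becomes a negative int, and its leading bits are all ones.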

2. How can I change this program so that the output becomes this:

Not knowing enough about Arabic characters, and whether or not they involve surrogate pairs, I wouldn't know.

However, my suspicion is that the majority probably don't (I believe surrogate pairs are used by languages with huge character sets, like Chinese), so my general advice would be: stop using ints.
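
If in doubt, the surrogate question can be checked directly. The sketch below assumes the word from the first post, written with Unicode escapes so that the source file's encoding doesn't matter. The class and method names are hypothetical.

```java
class SurrogateCheck {
    // True if any UTF-16 code unit in the string is half of a surrogate pair.
    static boolean hasSurrogates(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String word = "\u0634\u0645\u0633"; // the three letters from the first post
        System.out.println(hasSurrogates(word)); // false
        // Equal counts mean one char per code point, i.e. no surrogate pairs.
        System.out.println(word.length() + " chars, "
                + word.codePointCount(0, word.length()) + " code points");
    }
}
```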

Winston

abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
I want to show this:
11011000 10110100
11011001 10000101


instead of this:

11111111111111111111111111011000
11111111111111111111111110110100
11111111111111111111111111011001
11111111111111111111111110000101
11111111111111111111111111011000
11111111111111111111111110110011
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39408
    
abalfazl hossein wrote: . . .Because int is 4 bytes, and UTF 8 is 2 bytes. . . .
Where on earth did you get that from?
abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
Campbell Ritchie wrote:
abalfazl hossein wrote: . . .Because int is 4 bytes, and UTF 8 is 2 bytes. . . .
Where on earth did you get that from?



UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)


http://czyborra.com/utf/



http://javacamp.org/javaI/primitiveTypes.html

int: 32 bits, or 4 bytes


Could someone answer my last question?
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

abalfazl hossein wrote: Could someone answer my last question?

I already have: DON'T use ints.

Winston
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    

abalfazl hossein wrote:
Campbell Ritchie wrote:
abalfazl hossein wrote: . . .Because int is 4 bytes, and UTF 8 is 2 bytes. . . .
Where on earth did you get that from?


UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)


http://czyborra.com/utf/


Website reference aside, I always thought that UTF 8 is one byte. After all, isn't the "8" in UTF-8 for 8-bits which is one byte?

And based on your output, correct or incorrect, it looks like the Java implementation agrees with me.

Henry


Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

Henry Wong wrote:Website reference aside, I always thought that UTF 8 is one byte. After all, isn't the "8" in UTF-8 for 8-bits which is one byte?

I think you meant "minimum 1 byte", didn't you?

Winston
abalfazl hossein
Ranch Hand

Joined: Sep 06, 2007
Posts: 635
Winston Gutkowski wrote:
abalfazl hossein wrote: Could someone answer my last question?

I already have: DON'T use ints.

Winston


Then what must I use? char?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    

abalfazl hossein wrote:
Winston Gutkowski wrote:
abalfazl hossein wrote: Could someone answer my last question?

I already have: DON'T use ints.

Winston


Then what must I use? char?


You have to understand that there are two issues being addressed here. The first is that the UTF-8 getBytes() method is returning 6 bytes, even though you have only 3 characters. You need to find a workaround for that.

The second, which this track is trying to address, is that the printout is sign-extended (for bytes which are negative). Well, IMO, this second track may be moot, depending on how you fix the first issue.
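
For what it's worth, both issues can be handled together. The following is only a sketch of one possible approach, not the original program: it encodes one char at a time, masks each byte with 0xFF to drop the sign extension, and pads to eight digits. It assumes the text contains no surrogate pairs (a lone surrogate would not encode properly). The names Utf8Bits, toBits and describe are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bits {
    // One byte as exactly eight binary digits, ignoring the sign.
    static String toBits(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);    // mask off the sign extension
        return "00000000".substring(bits.length()) + bits; // left-pad with zeros to 8 digits
    }

    // The UTF-8 bytes of one char, space-separated, followed by the char itself.
    static String describe(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            sb.append(toBits(b)).append(' ');
        }
        return sb.append(c).toString();
    }

    public static void main(String[] args) {
        // The word from the first post, as Unicode escapes.
        for (char c : "\u0634\u0645\u0633".toCharArray()) {
            System.out.println(describe(c));
        }
    }
}
```

On a console that can display the letters, the first two lines printed should match the output asked for in the first post.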

Henry
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39408
    
abalfazl hossein wrote: . . . UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)

http://czyborra.com/utf/
That does not mean that UTF‑8 is a two‑byte encoding. It means that Greek writing takes two bytes per letter; if I write in English however, UTF‑8 takes one byte per letter.
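
A quick way to see the variable width for yourself (a small sketch; the class and method names are made up, and the sample characters are my own choice rather than anything from the linked page):

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1: Latin capital A
        System.out.println(utf8Length("\u03B1"));       // 2: Greek small alpha
        System.out.println(utf8Length("\u0634"));       // 2: Arabic letter sheen
        System.out.println(utf8Length("\u20AC"));       // 3: euro sign
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: an emoji outside the BMP
    }
}
```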
http://javacamp.org/javaI/primitiveTypes.html
. . .
Although that website is good about the values of chars, it is quite incorrect about the bits occupied by booleans.
 