
toBinaryString

 
abalfazl hossein
Ranch Hand
Posts: 635


The output:

شمس
6
Binary is 1010
11111111111111111111111111011000
11111111111111111111111110110100
11111111111111111111111111011001
11111111111111111111111110000101
11111111111111111111111111011000
11111111111111111111111110110011
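The program that produced this output isn't included in the post. As a rough guess only, output of this shape would come from something like the sketch below. It assumes the string was encoded with getBytes and UTF-8, and that each byte was passed straight to Integer.toBinaryString. The class name ToBinaryDemo and the helper byteLines are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction: the original program was not posted.
class ToBinaryDemo {
    // Converts each UTF-8 byte with Integer.toBinaryString, as the output suggests.
    static String[] byteLines(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String[] lines = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // The byte is widened to int here, so negative bytes are sign-extended.
            lines[i] = Integer.toBinaryString(bytes[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The word from the post: ARABIC LETTERS SHEEN, MEEM, SEEN
        String text = "\u0634\u0645\u0633";
        System.out.println(text);
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("Binary is " + Integer.toBinaryString(10));
        for (String line : byteLines(text)) {
            System.out.println(line);
        }
    }
}
```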


1- As you can see, there are a lot of 1s in the output, because an int is 4 bytes and UTF-8 is 2 bytes. But with this line:

System.out.println("Binary is " +
Integer.toBinaryString(10));

We don't see all those 1s in the output. Why?
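A minimal sketch of what is going on, assuming each printed value was a byte passed to Integer.toBinaryString(int). Java's byte is signed, so 0xD8 is stored as -40. Widening it to int copies the sign bit into the upper 24 bits, and toBinaryString prints all 32 bits of a negative int. A positive value such as 10 has only zeros above its highest 1 bit, and toBinaryString leaves out leading zeros, so it prints just 1010. The class name SignExtension is hypothetical.

```java
class SignExtension {
    // Passing a byte to toBinaryString widens it to int; negative bytes are sign-extended.
    static String widened(byte b) {
        return Integer.toBinaryString(b);
    }

    public static void main(String[] args) {
        byte first = (byte) 0xD8;                        // first UTF-8 byte of U+0634, stored as -40
        System.out.println(first);                       // -40
        System.out.println(widened(first));              // 32 digits: 24 ones, then 11011000
        System.out.println(Integer.toBinaryString(10));  // 1010 (leading zeros are omitted)
    }
}
```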

2- How can I change this program so that the output looks like this:

11011000 10110100 ش
11011001 10000101 م
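One possible way to get that layout, sketched under the assumption that the text is encoded as UTF-8 one character at a time: mask each byte with 0xFF so it is read as an unsigned value (0 to 255), left-pad its binary form to 8 digits, and print the character after its bytes. The names Utf8Bits, bits and describe are hypothetical, and whether the Arabic letters display correctly depends on the console's encoding.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bits {
    // Formats one byte as exactly 8 binary digits, treating it as unsigned.
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);       // 0..255, so no sign extension
        return String.format("%8s", s).replace(' ', '0');  // left-pad to 8 digits
    }

    // One line per character: its UTF-8 bytes in binary, then the character itself.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            for (byte b : ch.getBytes(StandardCharsets.UTF_8)) {
                out.append(bits(b)).append(' ');
            }
            out.append(ch).append('\n');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // ARABIC LETTERS SHEEN, MEEM, SEEN (the word from the post)
        System.out.print(describe("\u0634\u0645\u0633"));
    }
}
```

For the three letters in the post this should print 11011000 10110100, 11011001 10000101 and 11011000 10110011, each followed by its letter. Iterating over code points rather than chars also keeps a surrogate pair together as one character.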


 
Winston Gutkowski
Bartender
Posts: 9497
abalfazl hossein wrote:
1- As you can see, there are a lot of 1s in the output, because an int is 4 bytes and UTF-8 is 2 bytes. But with this line:

System.out.println("Binary is " +
Integer.toBinaryString(10));

We don't see all those 1s in the output. Why?

Simply put: because a char (which is what you're dealing with) is NOT an int. And it's NOT a byte either.
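To illustrate the char/byte difference: a Java char is an unsigned 16-bit UTF-16 code unit, so ARABIC LETTER SHEEN is the single value 0x0634 (1588), not the two UTF-8 bytes 0xD8 0xB4. Because char is unsigned, widening it to int never sign-extends. A sketch, with the class name CharBits made up for the example:

```java
class CharBits {
    // A char widens to int with zero-extension, so no leading ones appear.
    static String charBits(char c) {
        return Integer.toBinaryString(c);
    }

    public static void main(String[] args) {
        char sheen = '\u0634';                 // ARABIC LETTER SHEEN
        System.out.println(Character.SIZE);    // 16 bits per char
        System.out.println((int) sheen);       // 1588, i.e. 0x0634
        System.out.println(charBits(sheen));   // 11000110100
    }
}
```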

2- How can I change this program so that the output looks like this:

Not knowing enough about Arabic characters, and whether or not they involve surrogate pairs, I wouldn't know.

However, my suspicion is that the majority probably don't (I believe they're used by languages with huge character sets, like Chinese), so my general advice would be: stop using ints.
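For what it's worth, surrogate pairs are only needed for code points above U+FFFF. The basic Arabic letters sit in the U+0600 to U+06FF block, so each fits in a single char. A quick check, sketched with a hypothetical helper needsSurrogates; the comparison string uses U+20000, a rarely used CJK ideograph from Extension B:

```java
class SurrogateCheck {
    // True if any code point lies above U+FFFF and therefore needs a surrogate pair in UTF-16.
    static boolean needsSurrogates(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String arabic = "\u0634\u0645\u0633";                     // SHEEN, MEEM, SEEN
        String rareCjk = new String(Character.toChars(0x20000));  // CJK Extension B ideograph
        System.out.println(arabic.length() + " " + needsSurrogates(arabic));    // 3 false
        System.out.println(rareCjk.length() + " " + needsSurrogates(rareCjk));  // 2 true
    }
}
```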

Winston
 
abalfazl hossein
Ranch Hand
Posts: 635
I want to show this:
11011000 10110100
11011001 10000101


instead of this:

11111111111111111111111111011000
11111111111111111111111110110100
11111111111111111111111111011001
11111111111111111111111110000101
11111111111111111111111111011000
11111111111111111111111110110011
 
Campbell Ritchie
Sheriff
Posts: 47300
abalfazl hossein wrote: . . . because an int is 4 bytes and UTF-8 is 2 bytes. . . .
Where on earth did you get that from?
 
abalfazl hossein
Ranch Hand
Posts: 635
Campbell Ritchie wrote:
abalfazl hossein wrote: . . . because an int is 4 bytes and UTF-8 is 2 bytes. . . .
Where on earth did you get that from?



UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)


http://czyborra.com/utf/



http://javacamp.org/javaI/primitiveTypes.html

32 bit or 4 bytes


Could someone answer my last question?
 
Winston Gutkowski
Bartender
Posts: 9497
abalfazl hossein wrote:
Could someone answer my last question?

I already have: DON'T use ints.

Winston
 
Henry Wong
author
Marshal
Posts: 20836
abalfazl hossein wrote:
Campbell Ritchie wrote:
abalfazl hossein wrote: . . . because an int is 4 bytes and UTF-8 is 2 bytes. . . .
Where on earth did you get that from?


UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)


http://czyborra.com/utf/


Website reference aside, I always thought that UTF-8 is one byte. After all, isn't the "8" in UTF-8 for 8 bits, which is one byte?

And based on your output, correct or incorrect, it looks like the Java implementation agrees with me.

Henry
 
Winston Gutkowski
Bartender
Posts: 9497
Henry Wong wrote:
Website reference aside, I always thought that UTF-8 is one byte. After all, isn't the "8" in UTF-8 for 8 bits, which is one byte?

I think you meant "minimum 1 byte", didn't you?
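A sketch showing that UTF-8 is a variable-length encoding of 1 to 4 bytes per code point. The sample code points (Latin A, Greek alpha, Arabic sheen, the CJK character U+4E2D and the emoji U+1F600) were chosen for illustration, and the class name Utf8Lengths is hypothetical:

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes UTF-8 uses for a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0x3B1, 0x634, 0x4E2D, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

This should report 1, 2, 2, 3 and 4 bytes respectively, so "8" refers to the 8-bit code unit, not a fixed size per character.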

Winston
 
abalfazl hossein
Ranch Hand
Posts: 635
Winston Gutkowski wrote:
abalfazl hossein wrote:
Could someone answer my last question?

I already have: DON'T use ints.

Winston


Then what must I use? char?
 
Henry Wong
author
Marshal
Posts: 20836
abalfazl hossein wrote:
Winston Gutkowski wrote:
abalfazl hossein wrote:
Could someone answer my last question?

I already have: DON'T use ints.

Winston


Then what must I use? char?


You have to understand that there are two issues being addressed here. The first is that getBytes() with UTF-8 is returning 6 bytes, even though you have only 3 characters. You need to find a workaround for that.
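Here is that mismatch in miniature, assuming the bytes came from getBytes with UTF-8: the String holds 3 chars but encodes to 6 bytes, because each of these Arabic letters takes two bytes in UTF-8. Decoding the same bytes with the same charset gives the original 3 characters back. The class name ByteCount and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteCount {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode, then decode with the same charset; the text should survive unchanged.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u0634\u0645\u0633";                // SHEEN, MEEM, SEEN
        System.out.println(text.length());                 // 3
        System.out.println(utf8ByteCount(text));           // 6
        System.out.println(roundTrip(text).equals(text));  // true
    }
}
```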

The second, which this track is trying to address, is that the printout sign-extends the negative bytes. Well, IMO, this second track may be moot, depending on how you fix the first issue.
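For the sign-extension part, a sketch of two standard ways to read a byte as unsigned before printing it: masking with 0xFF, or Byte.toUnsignedInt (Java 8 and later). Both give a value from 0 to 255, so toBinaryString prints at most 8 digits. The class name UnsignedByte is hypothetical.

```java
class UnsignedByte {
    // Keeps only the low 8 bits, so the result is 0..255.
    static int masked(byte b) {
        return b & 0xFF;
    }

    // Library equivalent of the mask (Java 8+).
    static int viaLibrary(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xD8;                                       // -40 as a signed byte
        System.out.println(masked(b) + " " + viaLibrary(b));        // 216 216
        System.out.println(Integer.toBinaryString(viaLibrary(b))); // 11011000
    }
}
```

Values below 128 come out with fewer than 8 digits, so they still need left-padding with zeros if a fixed width is wanted.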

Henry
 
Campbell Ritchie
Sheriff
Posts: 47300
abalfazl hossein wrote: . . . UTF-8 consumes two bytes for all non-Latin (Greek, Cyrillic, Arabic, etc.)

http://czyborra.com/utf/
That does not mean that UTF-8 is a two-byte encoding. It means that Greek writing takes two bytes per letter; if I write in English, however, UTF-8 takes one byte per letter.
http://javacamp.org/javaI/primitiveTypes.html
. . .
Although that website is good about the values of chars, it is quite incorrect about the bits occupied by booleans.