Originally posted by alfred: It's confusing that there are two types of characters. When will a character take 2 bytes, and when 1 byte?
A Java char will always be two bytes in size. It will also always be unsigned.
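For anyone who wants to check this on their own machine, here is a minimal sketch. The class and method names are made up for illustration; `Character.SIZE`, `Byte.SIZE`, `Character.MAX_VALUE` and `Character.MIN_VALUE` are standard constants.

```java
class CharSizeDemo {
    // Size of one char in bytes, derived from its bit width (Character.SIZE is 16).
    static int bytesPerChar() {
        return Character.SIZE / Byte.SIZE;
    }

    // Widening a char to int never gives a negative number, because char is unsigned.
    static int maxCharAsInt() {
        return (int) Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("bytes per char: " + bytesPerChar());
        System.out.println("largest char value: " + maxCharAsInt());
        System.out.println("smallest char value: " + (int) Character.MIN_VALUE);
    }
}
```

This should report 2 bytes per char and a value range of 0 to 65535, with no negative values.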
Stan James
(instanceof Sidekick)
Ranch Hand
Joined: Jan 29, 2003
Posts: 8791
Can anyone describe the interfaces between Unicode & ASCII? For example, I can read an ASCII file into Java Unicode strings, and write the strings back to an ASCII file. Are the readers and writers doing the conversion? What if I wanted a file in Unicode?
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
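On the Reader/Writer question above, a rough sketch of where the conversion happens. It assumes in-memory streams instead of real files to stay self-contained, and the class and method names are hypothetical. The charset passed to `OutputStreamWriter` or `InputStreamReader` is what decides how chars become bytes and back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // The Writer turns chars into bytes according to the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // The Reader turns bytes back into chars according to the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "123456";
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
        System.out.println("US-ASCII bytes: " + ascii.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);
        System.out.println("round trip ok: " + text.equals(decode(utf16, StandardCharsets.UTF_16BE)));
    }
}
```

So a "file in Unicode" is a matter of choosing a Unicode encoding such as UTF-16BE or UTF-8 when creating the Writer. The String in memory is the same either way.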
>A Java char will always be two bytes in size. It will also always be unsigned
How can I believe that? My program contradicts it.
output: ======== 6
You claimed "A Java char will always be two bytes in size", so that means I should get 6*2=12 bytes (because there are 6 chars in the string and each char takes 2 bytes).
Please change your profile so that your publicly displayed name complies with the JavaRanch Naming Policy. Thanks for your cooperation. [ May 11, 2005: Message edited by: Junilu Lacar ]
Joel McNary
Bartender
Joined: Aug 20, 2001
Posts: 1815
Try this code.
It would seem that the .getBytes() method of String does not always keep both bytes of each char. (The Javadoc says: "Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.") Note that even when a Unicode character with a non-zero high-order byte is used, .getBytes() returned only the low-order byte.
That said, you generally don't have to worry about the size of a char; Readers and Writers will handle this for you -- note that the Reader that I used properly returned the character as a two-byte character.
Piscis Babelis est parvus, flavus, et hiridicus, et est probabiliter insolitissima raritas in toto mundo.
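To make the dependence on the charset visible, here is a small sketch that passes the charset explicitly instead of relying on the platform default. The class and helper names are invented for the example, and the euro sign is just one convenient character with a non-zero high-order byte. `String.getBytes(Charset)` and `StandardCharsets` are standard API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Number of bytes produced when the text is encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN: a single char, high-order byte 0x20
        System.out.println("chars:    " + euro.length());
        System.out.println("US-ASCII: " + encodedLength(euro, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + encodedLength(euro, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + encodedLength(euro, StandardCharsets.UTF_16BE));
        // US-ASCII cannot represent the euro sign, so the encoder substitutes '?' (byte 63).
        System.out.println("US-ASCII byte: " + euro.getBytes(StandardCharsets.US_ASCII)[0]);
    }
}
```

The same one-char string comes out as 1, 3 or 2 bytes depending on the charset, and in US-ASCII the character is replaced rather than truncated. What the no-argument getBytes() gives you depends on whichever default your platform uses.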
Suppose my friend gave me a string and asked me how many bytes it will take.
What should my answer be?
Side note: do you think my logic was wrong? It was simple mathematics.
Or do you mean that what the getBytes() method returns is basically wrong, because this method hides the actual result, which happens to be 12, and there is no definitive method by which you can calculate the number of bytes, and the true number of bytes is 12? The link you gave was more complex than my question.
Can anybody tell me what's going on?
Jeremy French
Greenhorn
Joined: May 11, 2005
Posts: 13
The number of bytes will always be 2 * the number of characters.
The relationship between ASCII and Unicode is that Unicode allows for a much larger number of possible characters. However, I believe the original numbers still carry over, so what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)
There are 10 kinds of people in the world. Those that read binary and those that don't.
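The "leading zeroes" idea can be seen directly. A minimal sketch, with hypothetical class and method names, that takes ASCII character 126 (the tilde) and looks at its numeric value and its two bytes in big-endian UTF-16:

```java
import java.nio.charset.StandardCharsets;

class AsciiInUnicode {
    // The two bytes of a single char when written as big-endian UTF-16.
    static byte[] utf16Bytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        char tilde = '~';
        byte[] bytes = utf16Bytes(tilde);
        System.out.println("numeric value: " + (int) tilde);
        System.out.println("high byte: " + bytes[0] + ", low byte: " + bytes[1]);
    }
}
```

The numeric value stays 126, and the 16-bit form is a zero byte followed by the original ASCII byte.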
Jeremy French
Greenhorn
Joined: May 11, 2005
Posts: 13
As to your logic, you simply misunderstood what getBytes() does. It doesn't return the number of bytes the string is using; rather, it converts the string into a byte array in some encoding, which, confusingly, can use fewer bytes.
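Applied to the "123456" example from earlier in the thread, the two numbers can be put side by side. This is only a sketch with made-up names. It counts char data only (2 bytes per char), not whatever bookkeeping the JVM adds around a String object.

```java
import java.nio.charset.StandardCharsets;

class StringSizeDemo {
    // Bytes taken by the chars themselves: 2 bytes per char.
    static int charDataBytes(String s) {
        return s.length() * (Character.SIZE / Byte.SIZE);
    }

    // Bytes produced by encoding the string as US-ASCII, a different representation.
    static int asciiEncodedBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String s = "123456";
        System.out.println("chars: " + s.length());
        System.out.println("char data bytes: " + charDataBytes(s));
        System.out.println("ASCII-encoded bytes: " + asciiEncodedBytes(s));
    }
}
```

Both 12 and 6 are right answers, just to different questions: the first counts the chars, the second counts the bytes of one particular encoding of them.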
alfred jones
Ranch Hand
Joined: Apr 19, 2005
Posts: 279
>As to your logic, you simply incorrectly assumed the function of getBytes. It doesn't return the number of bytes the string is using, rather it converts the string to something else entirely, which confusingly uses fewer bytes.
That's a nice answer.
>The number of bytes will always be 2 * the number of characters.
>The relationship between ASCII and Unicode is that Unicode creates a much higher number of possible characters.
Yes, some unfamiliar-looking characters (Japanese, Arabic?).
>However, the original numbers still carry over I believe, as such, what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)
OK, so tell me: in this example, the string "123456",
take out the first char, i.e. "1". What do you call it, an ASCII char or a Unicode char? I assume you will call it a Unicode char with imaginary leading zeros as padding. Right?
Now here is the crucial point: if you say "1" is an ASCII char, then you will get 6, because an ASCII char takes 1 byte. But if you say "1" is a Unicode char with your imaginary leading zeros (and also because this is the Java language, and Java chars are Unicode), then the string will take 2*6=12 bytes.
So which one should I consider the char "1" to be: a Unicode char or an ASCII char?
The whole thing depends on that decision.
I assume you will call it a Unicode char, so the string actually takes 12 bytes, but we cannot show that programmatically because there is no such method.
But there is a getBytes() method in the docs; if I use it, it will treat "1" as an ASCII char and give me a result accordingly.
Am I right?
Jeremy French
Greenhorn
Joined: May 11, 2005
Posts: 13
No.
getBytes() does not return the number of bytes in a string. It sounds like it does. It doesn't. getBytes() returns an array of bytes (not characters), which may or may not coincide with the bytes the characters would have if they were represented in ASCII.
All characters in Java are 2 bytes. Absolutely. All the time. Ignore getBytes(). It's just confusing the issue for you. All characters in Java are 2 bytes.
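One caveat worth checking for yourself: every char is 2 bytes, but a few Unicode characters outside the 16-bit range need two chars (a surrogate pair). A small sketch, with a hypothetical class name and U+1D11E (musical G clef) picked only as an example; `codePointCount` and `Character.charCount` are standard methods.

```java
class SupplementaryDemo {
    // Number of chars (16-bit units) in the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E written as a surrogate pair
        System.out.println("chars: " + charUnits(clef));
        System.out.println("code points: " + codePoints(clef));
        System.out.println("chars needed for U+1D11E: " + Character.charCount(0x1D11E));
    }
}
```

For ordinary text such as "123456" the two counts are the same, so this does not change the answer for the example in this thread.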