ASCII characters AND Java characters

alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
What is the difference between ASCII characters and Java characters?

ASCII characters take 1 byte, but Java characters take 2 bytes. Why are there two types of characters?
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
It's confusing that there are two types of characters. When does a character take 2 bytes, and when 1 byte?
Steve Morrow
Ranch Hand

Joined: May 22, 2003
Posts: 657

There are many different ways of encoding characters. Java happens to use Unicode (16-bit version).
Steve Morrow
Ranch Hand

Joined: May 22, 2003
Posts: 657

Originally posted by alfred:
It's confusing that there are two types of characters. When does a character take 2 bytes, and when 1 byte?

A Java char will always be two bytes in size. It will also always be unsigned.
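Both properties can be checked against constants in java.lang.Character. The following is only a small sketch; the class name CharSizeDemo and its helper methods are made up for illustration.

```java
class CharSizeDemo {
    // Number of bits used to represent a char value, per java.lang.Character.
    static int charBits() {
        return Character.SIZE;
    }

    // Largest char value, widened to int so it prints as a number.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(charBits());                 // 16
        System.out.println(maxCharValue());             // 65535
        System.out.println((int) Character.MIN_VALUE);  // 0, so there are no negative chars
    }
}
```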
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Can anyone describe the interfaces between Unicode & ASCII? For example, I can read an ASCII file into Java Unicode strings, and write the strings back to an ASCII file. Are the readers and writers doing the conversion? What if I wanted a file in Unicode?


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Steve Morrow
Ranch Hand

Joined: May 22, 2003
Posts: 657

http://java.sun.com/docs/books/tutorial/essential/io/filestreams.html
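In short, the conversion is done by the reader or writer, using whichever charset it was created with. Here is a rough sketch of that idea, using in-memory streams instead of real files so it runs anywhere; the class and method names are hypothetical, and for an actual file you would presumably wrap a FileOutputStream or FileInputStream the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderWriterDemo {
    // Encode text through a Writer bound to an explicit charset.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decode bytes back into a String through a Reader using the given charset.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "123456";
        byte[] ascii = write(text, StandardCharsets.US_ASCII);
        byte[] utf16 = write(text, StandardCharsets.UTF_16BE);
        System.out.println(ascii.length);                               // 6
        System.out.println(utf16.length);                               // 12
        System.out.println(read(ascii, StandardCharsets.US_ASCII));     // 123456
        System.out.println(read(utf16, StandardCharsets.UTF_16BE));     // 123456
    }
}
```

The same String comes back either way; only the bytes in between differ. A "file in Unicode" would then be a matter of picking a Unicode charset such as UTF-8 or UTF-16 for the writer.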
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
>A Java char will always be two bytes in size. It will also always be unsigned

How can I believe that? My program contradicts it.
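The program itself is missing from this copy of the thread. A minimal sketch consistent with the output, assuming the string "123456" mentioned later in the thread and a plain getBytes() call (the class and method names are hypothetical, not the original code):

```java
class GetBytesLength {
    // Length of the byte array produced by encoding the string
    // with the platform's default charset (assumed ASCII-compatible here).
    static int defaultEncodedLength(String s) {
        return s.getBytes().length;
    }

    public static void main(String[] args) {
        System.out.println(defaultEncodedLength("123456"));
    }
}
```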





output:
========
6


You claimed "A Java char will always be two bytes in size", so I should get 6*2=12 bytes (because there are 6 chars in the string and each char takes 2 bytes).

So you are wrong.
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4477
    

See this: http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html


Junilu - [How to Ask Questions] [How to Answer Questions]
Junilu Lacar
Bartender

Joined: Feb 26, 2001
Posts: 4477
    

"alfred"

Please change your profile so that your publicly displayed name complies with the JavaRanch Naming Policy. Thanks for your cooperation.
[ May 11, 2005: Message edited by: Junilu Lacar ]
Joel McNary
Bartender

Joined: Aug 20, 2001
Posts: 1817


Try this code.

It would seem that the .getBytes() method of String does not hand back the two bytes of each char. (The Javadoc says: Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.) Note that even when a Unicode character with a nonzero high-order byte is used, .getBytes() returned only the low-order byte.

That said, you generally don't have to worry about the size of a char; Readers and Writers will handle this for you -- note that the Reader that I used properly returned the character as a two-byte character.
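The code from this post is not in the thread as captured, so what follows is not the original, only a sketch of the same kind of experiment. It assumes U+0416 as the test character and uses explicit charsets rather than the platform default, so the result does not depend on the machine; all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HighByteDemo {
    // U+0416 (Cyrillic capital letter ZHE): its high-order byte is 0x04, not zero.
    static final String ZHE = "\u0416";

    // Encode with US-ASCII, which has no mapping for this character.
    static byte[] asAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Encode with UTF-16 big-endian, which writes both bytes of the char.
    static byte[] asUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println((int) ZHE.charAt(0));             // 1046, i.e. 0x0416
        System.out.println(Arrays.toString(asAscii(ZHE)));   // [63], the '?' replacement byte
        System.out.println(Arrays.toString(asUtf16(ZHE)));   // [4, 22]
    }
}
```

In this sketch the char inside the String keeps its full 16-bit value; what comes out of getBytes() depends on the charset, and a charset that cannot represent the character substitutes a replacement byte.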


Piscis Babelis est parvus, flavus, et hiridicus, et est probabiliter insolitissima raritas in toto mundo.
Joel McNary
Bartender

Joined: Aug 20, 2001
Posts: 1817

alfred:

Welcome to JavaRanch! Please take a moment to read the JavaRanch naming policy and then please change your display name to comply. (We are looking for first and last names that are not obviously fictitious).

Thanks!
M Beck
Ranch Hand

Joined: Jan 14, 2005
Posts: 323
Quick question for Java language lawyers and/or implementation specialists:

What does Java do with Unicode code points that won't fit into 16 bits? Is Java's "Unicode" really UTF-16, or what?
Steve Morrow
Ranch Hand

Joined: May 22, 2003
Posts: 657

Supplementary Characters in the Java Platform
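As of Java 5, a code point above U+FFFF is stored as a pair of chars (a surrogate pair), so strings are effectively UTF-16. A small sketch, with U+1D11E picked as an arbitrary supplementary code point and hypothetical class and method names:

```java
class SupplementaryDemo {
    // U+1D11E (MUSICAL SYMBOL G CLEF) does not fit into 16 bits.
    static final String CLEF = new String(Character.toChars(0x1D11E));

    // Number of 16-bit char values in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(CLEF));                            // 2
        System.out.println(codePointCount(CLEF));                       // 1
        System.out.println(Character.isHighSurrogate(CLEF.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(CLEF.charAt(1)));   // true
        System.out.println(Integer.toHexString(CLEF.codePointAt(0)));   // 1d11e
    }
}
```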
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
I am really very confused.

Suppose my friend gives me a string and asks how many bytes it will take.

What should my answer be?

Side note:
-----------
Do you think my logic was wrong? It was simple arithmetic.

Or do you mean that what getBytes() returns is basically wrong, because the method hides the actual result, which happens to be 12? That there is no direct method for calculating the number of bytes, and the true number of bytes is 12? The link you gave was more complex than my question.

Can anybody tell me what's going on?
Jeremy French
Greenhorn

Joined: May 11, 2005
Posts: 13
The number of bytes will always be 2 * the number of characters.

The relationship between ASCII and Unicode is that Unicode allows a much larger number of possible characters. However, the original numbers still carry over, I believe; what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)
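A quick way to see the carry-over, sketched with '~' (character 126) as the example and hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiCarryOver {
    // Numeric value of a char, widened to int.
    static int codeOf(char c) {
        return c;
    }

    // Bytes of the string in UTF-16 big-endian: high-order byte first, no byte-order mark.
    static byte[] utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('~'));                                                // 126
        System.out.println(Arrays.toString("~".getBytes(StandardCharsets.US_ASCII)));   // [126]
        System.out.println(Arrays.toString(utf16Bytes("~")));                           // [0, 126]
    }
}
```

The ASCII byte 126 shows up unchanged as the low-order byte, with a zero byte in front of it.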


There are 10 kinds of people in the world. Those that read binary and those that don't.
Jeremy French
Greenhorn

Joined: May 11, 2005
Posts: 13
As to your logic, you simply misunderstood what getBytes() does. It doesn't return the number of bytes the string is using; rather, it converts the string to something else entirely, which confusingly uses fewer bytes.
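To make that concrete, here is a sketch that encodes the same six-character string with a few charsets and compares the results with 2 * length(). The charsets are chosen for illustration and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSizeDemo {
    // Size of the string once converted to bytes in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    // Two bytes per char, counting the string's 16-bit char values.
    static int twoBytesPerChar(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        String s = "123456";
        System.out.println(encodedLength(s, StandardCharsets.US_ASCII));   // 6
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // 12
        System.out.println(twoBytesPerChar(s));                            // 12
    }
}
```

So "how many bytes does this string take?" has no single answer until you say which encoding the bytes are in.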
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279

>As to your logic, you simply misunderstood what getBytes() does. It doesn't return the number of bytes the string is using; rather, it converts the string to something else entirely, which confusingly uses fewer bytes.

That's a nice answer.

>The number of bytes will always be 2 * the number of characters.

>The relationship between ASCII and Unicode is that Unicode allows a much larger number of possible characters.

Yes, some odd-looking chars (Japanese, Arabic?).

>However, the original numbers still carry over, I believe; what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)

OK, so tell me: in this example, the string "123456", take out the first char, i.e. "1". What do you call it, an ASCII char or a Unicode char? I assume you will call it a Unicode char with imaginary padding of leading zeros. Right?

Now here is the crucial point: if you say "1" is an ASCII char, then you get 6, because an ASCII char takes 1 byte. But if you say "1" is a Unicode char with your imaginary leading zeros (and also because this is the Java language, and Java chars are Unicode), then the string takes 2*6=12 bytes.

So how should I think about the char "1"? Is it a Unicode char or an ASCII char? The whole thing depends on that decision.

I assume you will call it a Unicode char, so the string actually takes 12 bytes, but we cannot show that programmatically because there is no such method.

But there is a getBytes() method in the docs. If I use it, it will treat "1" as an ASCII char and give me a result accordingly.

Am I right?
Jeremy French
Greenhorn

Joined: May 11, 2005
Posts: 13
No.

getBytes() does not return the number of bytes in a string. It sounds like it does. It doesn't. getBytes() returns an array of bytes (not characters), which may or may not coincide with the bytes the characters would have if they were represented in ASCII.

All characters in Java are 2 bytes. Absolutely. All the time. Ignore getBytes(). It's just confusing the issue for you. All characters in Java are 2 bytes.
 