
ASCII characters AND Java characters

 
Ranch Hand
Posts: 279
What is the difference between ASCII characters and Java characters?

An ASCII character takes 1 byte, but a Java character takes 2 bytes. Why are there two types of characters?
 
alfred jones
Ranch Hand
Posts: 279
It's confusing that there are two types of characters. When will a character take 2 bytes, and when 1 byte?
 
Ranch Hand
Posts: 657
There are many different ways of encoding characters. Java happens to use Unicode (the 16-bit version, UTF-16).
 
Steve Morrow
Ranch Hand
Posts: 657

Originally posted by alfred:
It's confusing that there are two types of characters. When will a character take 2 bytes, and when 1 byte?


A Java char will always be two bytes in size. It will also always be unsigned.
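
For anyone who wants to check both claims, here is a minimal sketch. The class and method names are made up for illustration; `Character.SIZE`, `Character.BYTES`, `Character.MIN_VALUE` and `Character.MAX_VALUE` are standard library constants (`Character.BYTES` assumes Java 8 or later).

```java
class CharSizeDemo {
    // Number of bits in a char, as reported by the standard library.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Largest char value widened to int. A signed 16-bit type would stop at 32767.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    // Casting -1 to char wraps around to the top of the unsigned range.
    static int wrapNegativeOne() {
        return (char) -1;
    }

    public static void main(String[] args) {
        System.out.println("bits per char:  " + bitsPerChar());
        System.out.println("bytes per char: " + Character.BYTES);
        System.out.println("min value:      " + (int) Character.MIN_VALUE);
        System.out.println("max value:      " + maxCharValue());
        System.out.println("(char) -1:      " + wrapNegativeOne());
    }
}
```

The range 0 to 65535 is what "unsigned" means here: a char never holds a negative value.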
 
(instanceof Sidekick)
Posts: 8791
Can anyone describe the interfaces between Unicode & ASCII? For example, I can read an ASCII file into Java Unicode strings, and write the strings back to an ASCII file. Are the readers and writers doing the conversion? What if I wanted a file in Unicode?
 
Steve Morrow
Ranch Hand
Posts: 657
http://java.sun.com/docs/books/tutorial/essential/io/filestreams.html
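
As a rough illustration of the question above (this is a sketch, not code from the tutorial): the conversion happens in the Writer and Reader, each of which is bound to a charset. In-memory streams stand in for files here so the example runs anywhere; a `FileOutputStream` or `FileInputStream` could be swapped in. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderWriterDemo {
    // Encodes text to bytes by pushing it through a Writer bound to a charset.
    static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text); // chars go in, encoded bytes come out
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decodes bytes back to a String by pulling them through a Reader bound to a charset.
    static String readWith(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = "123456";
        byte[] ascii = writeWith(text, StandardCharsets.US_ASCII);
        byte[] utf16 = writeWith(text, StandardCharsets.UTF_16BE);
        System.out.println("US-ASCII bytes: " + ascii.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);
        System.out.println("round trip ok:  " + text.equals(readWith(ascii, StandardCharsets.US_ASCII)));
    }
}
```

So a "file in Unicode" is a matter of which charset the Writer is given; the String in memory is the same either way.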
 
alfred jones
Ranch Hand
Posts: 279
>A Java char will always be two bytes in size. It will also always be unsigned

How can I believe it? My program refutes it.





output:
========
6


You claimed "A Java char will always be two bytes in size", so that means I must get 6*2=12 bytes (because there are 6 chars in the string and each char takes 2 bytes).

So you are wrong.
 
Sheriff
Posts: 17644
See this: http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
 
Junilu Lacar
Sheriff
Posts: 17644
"alfred"

Please change your profile so that your publicly displayed name complies with the JavaRanch Naming Policy. Thanks for your cooperation.
[ May 11, 2005: Message edited by: Junilu Lacar ]
 
Bartender
Posts: 1844

Try this code.

It would seem that the .getBytes() method of String does not always preserve every char. (The Javadoc says: "Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.") Note that even when a Unicode character with a non-zero high-order byte is used, .getBytes() returned only a single byte for it.

That said, you generally don't have to worry about the size of a char; Readers and Writers will handle this for you -- note that the Reader that I used properly returned the character as a two-byte character.
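
The code originally posted here is not available, so the following is only a sketch of the kind of experiment described, not the original. It assumes an explicit US-ASCII charset instead of the platform default (which varies by machine), and the class and method names are hypothetical. With US-ASCII, a character that the charset cannot represent comes out as the replacement byte `?` rather than as its low-order byte.

```java
import java.nio.charset.StandardCharsets;

class UnmappableCharDemo {
    // Encodes one char with US-ASCII, which cannot represent anything above U+007F.
    static byte[] asciiBytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.US_ASCII);
    }

    // Encodes one char with UTF-16BE, which keeps both bytes of the char.
    static byte[] utf16Bytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        char omega = '\u03A9'; // GREEK CAPITAL LETTER OMEGA, high-order byte 0x03

        byte[] ascii = asciiBytes(omega);
        System.out.println("US-ASCII: " + ascii.length + " byte, value " + (char) ascii[0]);

        byte[] utf16 = utf16Bytes(omega);
        System.out.printf("UTF-16BE: %d bytes, 0x%02X 0x%02X%n",
                utf16.length, utf16[0] & 0xFF, utf16[1] & 0xFF);
    }
}
```

Passing the charset explicitly also makes the result the same on every platform, which the no-argument `getBytes()` does not guarantee.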
 
Joel McNary
Bartender
Posts: 1844
alfred:

Welcome to JavaRanch! Please take a moment to read the JavaRanch naming policy and then please change your display name to comply. (We are looking for first and last names that are not obviously fictitious).

Thanks!
 
Ranch Hand
Posts: 323
Quick question for Java language lawyers and/or implementation specialists:

What does Java do with Unicode code points that won't fit into 16 bits? Is Java's "Unicode" really UTF-16, or what?
 
Steve Morrow
Ranch Hand
Posts: 657
Supplementary Characters in the Java Platform
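
A small sketch of what that looks like in practice, assuming Java 5 or later. The class and method names are invented for the example; the code point U+1D11E (MUSICAL SYMBOL G CLEF) is just one character that does not fit into 16 bits. Such a code point is stored as two chars, a surrogate pair.

```java
class SupplementaryCharDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
    static final int G_CLEF = 0x1D11E;

    // Builds a String holding exactly one supplementary code point.
    static String gClef() {
        return new String(Character.toChars(G_CLEF));
    }

    public static void main(String[] args) {
        String s = gClef();
        System.out.println("length() in chars:  " + s.length());
        System.out.println("code points:        " + s.codePointCount(0, s.length()));
        System.out.println("high surrogate:     " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate:      " + Character.isLowSurrogate(s.charAt(1)));
        System.out.println("codePointAt(0):     0x" + Integer.toHexString(s.codePointAt(0)).toUpperCase());
    }
}
```

So `length()` counts 16-bit chars, not characters as a reader sees them.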
 
alfred jones
Ranch Hand
Posts: 279
I am really very confused.

Suppose my friend gives me a string and asks me how many bytes it will take.

What should my answer be?



Side note:
-----------
Do you think my logic was wrong? It was simple mathematics.

Or do you mean that what the getBytes() method returns is basically wrong, because it hides the actual result, which happens to be 12? That there is no absolute method by which you can calculate the number of bytes, and that the actual number of bytes truly is 12? The link you gave was more complex than my question.

Can anybody tell me what's going on?
 
Greenhorn
Posts: 13
The number of bytes will always be 2 * the number of characters.

The relationship between ASCII and Unicode is that Unicode allows a much larger number of possible characters. However, the original numbers still carry over, I believe: what was character #126 in ASCII is now character #000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)
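
The "leading zeroes" idea can be checked directly. This is only a sketch with made-up class and method names; it uses `~`, which is character 126 in ASCII, and UTF-16BE as the 2-byte encoding.

```java
import java.nio.charset.StandardCharsets;

class AsciiInUnicodeDemo {
    // Renders a char as 16 binary digits, left-padded with zeroes.
    static String paddedBinary(char c) {
        String bits = Integer.toBinaryString(c);
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 16; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    public static void main(String[] args) {
        char tilde = '~';
        System.out.println("numeric value: " + (int) tilde);
        System.out.println("as 16 bits:    " + paddedBinary(tilde));

        // The same character encoded both ways: one byte versus a zero byte plus that byte.
        byte[] ascii = "~".getBytes(StandardCharsets.US_ASCII);
        byte[] utf16 = "~".getBytes(StandardCharsets.UTF_16BE);
        System.out.println("US-ASCII bytes: " + ascii[0]);
        System.out.println("UTF-16BE bytes: " + utf16[0] + ", " + utf16[1]);
    }
}
```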
 
Jeremy French
Greenhorn
Posts: 13
As to your logic, you simply incorrectly assumed the function of getBytes. It doesn't return the number of bytes the string is using, rather it converts the string to something else entirely, which confusingly uses fewer bytes.
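
To make that concrete, here is a sketch (class and method names are hypothetical) that encodes the same string "123456" with several charsets. The byte count depends on the charset chosen, which is why the no-argument `getBytes()` says nothing about how the String is held in memory.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Length of the byte array produced by encoding text with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "123456";
        System.out.println("chars in the string: " + text.length());
        System.out.println("US-ASCII bytes:      " + encodedLength(text, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 bytes:         " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes:      " + encodedLength(text, StandardCharsets.UTF_16BE));
        // The no-argument getBytes() uses whatever the platform default happens to be.
        System.out.println("default charset:     " + Charset.defaultCharset()
                + " -> " + text.getBytes().length + " bytes");
    }
}
```

The 6 in the earlier output is therefore the size of one particular encoding of the string, and 12 is the size of another.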
 
alfred jones
Ranch Hand
Posts: 279

As to your logic, you simply incorrectly assumed the function of getBytes. It doesn't return the number of bytes the string is using, rather it converts the string to something else entirely, which confusingly uses fewer bytes.


That's a nice answer.






>The number of bytes will always be 2 * the number of characters.

>The relationship between ASCII and Unicode is that Unicode creates a much higher number of possible characters.

Yeah, some odd-looking chars (Japanese, Arabic?)



>However, the original numbers still carry over I believe, as such, what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)


OK,
so tell me, in this example, the string "123456":

Take out the first char, i.e. "1". What do you call it? An ASCII char or a Unicode char? I assume you will term this char Unicode, with an imaginary padding of leading zeros. Right?


Now here is the crucial point: if you say "1" is an ASCII char, then you will get 6, because an ASCII char takes 1 byte. But if you say "1" is a Unicode char with your imaginary leading zeros (and also because this is the Java language, and Java chars are Unicode), then the string will take 2*6=12 bytes.


So which one should I think of the char "1" as? Is it a Unicode char or an ASCII char? The whole thing depends on the decision about its status.


I assume you will call it a Unicode char, so 12 bytes are actually taken by this string, but we cannot show that programmatically because there is no such method.


But I do have the getBytes() method in the docs. If I use this method, it will treat "1" as an ASCII char and give me the result accordingly.


Am I right?
 
Jeremy French
Greenhorn
Posts: 13
No.

getBytes() does not return the number of bytes in a string. It sounds like it does. It doesn't. getBytes() returns an array of bytes (not characters), which may or may not coincide with the bytes the characters would have if they were represented in ASCII.

All characters in Java are 2 bytes. Absolutely. All the time. Ignore getBytes(). It's just confusing the issue for you. All characters in Java are 2 bytes.
 