Java Utf-16 limitation.

Sharon whipple
Ranch Hand

Joined: Jul 31, 2003
Posts: 294
I was wondering how J2EE applications that need to display Unicode text handle the UTF-16 limitation.
Is there anything new in Java 6?

Thank you
Sharon
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13875

What is "the Utf-16 limitation"?


Sharon whipple
Ranch Hand

Joined: Jul 31, 2003
Posts: 294
Java supports only UTF-16 strings.
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Are you imagining that UTF-16 only supports 16 bits worth (65536 max) of characters? If so, that is incorrect.

In a similar way to UTF-8, the additional characters are handled by escape codes (surrogate pairs), which use a second 16-bit word to describe each extended character. I forget exactly how it works, but go look at www.unicode.org.

In Java, one does sometimes have to take care, because some character-related methods report the number of 16-bit Java chars, rather than the number of Unicode characters. Again, I forget exactly how, but go look at the Java String API in detail.

Some code (particularly if it is old) might have trouble in some locales, if it assumes that the number of Java chars is the number of characters.
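
To illustrate the difference between Java chars and Unicode characters described above, here is a minimal sketch. The class name CodePointDemo and its helper methods are made up for this example; the String and Character methods it calls have been in the standard API since Java 5.

```java
// Hypothetical demo class; the name and helpers are not from the thread.
class CodePointDemo {
    // Number of 16-bit Java chars (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range,
        // so UTF-16 stores it as a surrogate pair of two chars.
        String clef = new String(Character.toChars(0x1D11E));
        String text = "A" + clef;

        System.out.println(charCount(text));      // 3
        System.out.println(codePointCount(text)); // 2

        // Walk the string by code point rather than by char.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
    }
}
```

Code that loops with charAt() and length() alone would see the clef as two separate chars, which is the kind of trouble older code can run into.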


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39548
The JavaIoFaq (http://faq.javaranch.com/java/JavaIoFaq) links to two blog articles on how to deal with characters beyond 16 bits.


Sharon whipple
Ranch Hand

Joined: Jul 31, 2003
Posts: 294
To be more precise, the question should be: how do large-scale apps use web servers and yet support UTF-8?
Web servers: Apache, WebLogic, Tomcat, JBoss, iAS, etc.
Thank you
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39548
Java supports many encodings, UTF-8 and ISO-8859 amongst them. If a program (web browser, database, ...) needs to get text in other encodings out of Java code, that's no problem at all. UTF-16 just happens to be the one in which strings are stored internally. (Come to think of it, I've never seen a web page served in UTF-16, or a database set up to use UTF-16, so if Java couldn't handle other encodings, that would be a major limitation.)
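
As a rough sketch of what getting text in other encodings out of Java code looks like, the example below converts one String to bytes in several encodings and back. The class name EncodingDemo and its helpers are invented for illustration, and it assumes Java 7 or later for java.nio.charset.StandardCharsets (on older versions, getBytes("UTF-8") does the same job).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helpers are not from the thread.
class EncodingDemo {
    // Encode the internally UTF-16 String as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", four characters

        byte[] utf8 = toUtf8(text);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(utf8.length);   // 5: the e-acute takes two bytes
        System.out.println(latin1.length); // 4: one byte per character
        System.out.println(utf16.length);  // 8: two bytes per character

        // The round trip gives back the same String.
        System.out.println(fromUtf8(utf8).equals(text)); // true
    }
}
```

The String itself never changes; only the byte representation written to the browser, file, or database depends on the chosen encoding.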
Sharon whipple
Ranch Hand

Joined: Jul 31, 2003
Posts: 294
Originally posted by Ulf Dittmer:
if Java couldn't handle other encodings, that would be a major limitation.)


http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
Java is unable to handle UTF-8.
IE can handle UTF-8 encoding, but when an HTML form is submitted, the web server internally converts the text to UTF-16 (the request/response objects are Java UTF-16 Strings).
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13875

Originally posted by Sharon whipple:
Java is unable to handle UTF-8,

As already said above, that's not true. Just because Java stores characters in UTF-16 internally does not mean that Java is unable to handle UTF-8. The supported encodings page gives a list of character encodings that Java supports.
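
For anyone who wants to check this on their own JVM rather than read the list, a small sketch using java.nio.charset.Charset follows. The class name CharsetCheckDemo and the supports() helper are made up for this example; the set of charsets reported beyond the standard ones depends on the JVM.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; the name and helper are not from the thread.
class CharsetCheckDemo {
    // True if this JVM can encode and decode the named charset.
    static boolean supports(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    public static void main(String[] args) {
        // UTF-8 and UTF-16 are among the charsets every Java platform must support.
        System.out.println(supports("UTF-8"));  // true
        System.out.println(supports("UTF-16")); // true

        // Print the canonical name of every charset this JVM offers.
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }
    }
}
```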
Sharon whipple
Ranch Hand

Joined: Jul 31, 2003
Posts: 294
The pure Java String class is UTF-16.
Web containers/servers built on Java are unable to handle UTF-8.

Is that correct?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39548
Originally posted by Sharon whipple:
Web containers/servers built on Java are unable to handle UTF-8

Is that correct?


As both Jesper and I have pointed out, no, that is not correct.
[ October 18, 2007: Message edited by: Ulf Dittmer ]
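
To make the form-submission case concrete, here is a sketch using only the standard library. It decodes a URL-encoded form value whose bytes are UTF-8 into a Java String, which is roughly what a container does with request parameters. The class name FormDecodeDemo, the decodeUtf8() helper, and the sample value are assumptions for illustration. In a real servlet the usual route is to call request.setCharacterEncoding("UTF-8") before reading parameters; that is not shown here because it needs the servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; the name and helper are not from the thread.
class FormDecodeDemo {
    // Decode one URL-encoded form value, treating the escaped bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: every JVM must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample wire format for a three-character Japanese word
        // submitted from a UTF-8 page: nine escaped bytes.
        String wire = "%E6%97%A5%E6%9C%AC%E8%AA%9E";

        String value = decodeUtf8(wire);

        System.out.println(value.length());                        // 3
        System.out.println(value.equals("\u65e5\u672c\u8a9e"));    // true
    }
}
```

The nine UTF-8 bytes on the wire become three chars in the String; nothing is lost in the conversion to UTF-16, it is just a different representation of the same characters.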
 