A strange unicode String literal problem

Benjamin Weaver
Ranch Hand

Joined: Apr 08, 2003
Posts: 161
1. If I print out the following Unicode string in a servlet:

String s = "\u1f26\u0323\u1f82";
out.println(s);

The unicode Greek characters are printed perfectly in HTML.

2. But when I get the Unicode String (\u1f26\u0323\u1f82) from elsewhere,
that is, when I DO NOT initialize String s with the string literal as above,

the out.println statement produces, on the HTML page, the
literal string: \u1f26\u0323\u1f82

In case 2, the Unicode code-point values are not "parsed."

Why is this unicode not parsed?
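To make the two cases concrete, here is a minimal sketch. It assumes that the string "from elsewhere" holds the escape text itself (a backslash, the letter u and four hex digits, repeated) rather than the three Greek characters; the class and method names are made up for illustration.

```java
final class EscapeDemo {
    // Case 1: the compiler translates each escape in the literal
    // before the program runs, so this string holds three characters.
    static String compiled() {
        return "\u1f26\u0323\u1f82";
    }

    // Case 2 (assumed): doubling the backslash keeps the escape text itself,
    // which is what a string assembled at run time from such text contains.
    static String runtimeText() {
        return "\\u1f26\\u0323\\u1f82";
    }

    public static void main(String[] args) {
        System.out.println(compiled().length());    // prints 3
        System.out.println(runtimeText().length()); // prints 18
    }
}
```

Printing the first string gives Greek text; printing the second gives the eighteen ASCII characters exactly as typed.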

Many thanks in advance!
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12759
    
    5
How are you doing this operation:
"But when I get the unicode String (\u1f26\u0323\u1f82) from elsewhere"?
If you are using a Reader, I would expect it to do the transformation.
Bill
Benjamin Weaver
Ranch Hand

Joined: Apr 08, 2003
Posts: 161
Bill,

Thanks for taking a shot. Below is some code that drives the point home. Notice the commented-out line, in which the String unicode is initialized with a string literal. If that line is uncommented (and the following line commented out), the text is stored in the file foo.txt as UTF-8 and displays correctly when read back from the file. But if the String comes from a conversion routine (the string it returns is correct), it is written to the file as the literal Unicode escape sequences, not as UTF-8, and it displays as literal sequences when read back from the file.

So, in this (heuristic) example, BufferedWriter does not convert the sequences.

[ July 05, 2004: Message edited by: Jim Yingst ]
Benjamin Weaver
Ranch Hand

Joined: Apr 08, 2003
Posts: 161
I made a mistake in that example code. In the actual servlet I write to the HTML page using a PrintWriter, not a BufferedWriter as indicated here.
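For what it is worth, the choice of writer should not matter here. The sketch below is not the servlet code; it assumes a PrintWriter wrapped around an in-memory stream with an explicit UTF-8 encoder, and the class name is hypothetical. It shows that the writer encodes whatever characters it is handed: real Greek characters come out as multi-byte UTF-8, while escape text comes out as plain ASCII bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

final class Utf8WriterSketch {
    // Writes the text through a PrintWriter that encodes as UTF-8
    // and returns the bytes that were produced.
    static byte[] render(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            out.print(text);
        } // closing the writer flushes it into the byte stream
        return bytes.toByteArray();
    }
}
```

In a servlet, the equivalent step is usually to call response.setContentType("text/html; charset=UTF-8") before response.getWriter(), so that the PrintWriter the container hands back encodes as UTF-8.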
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12759
    
    5
When you have this text in your program:
String unicode = "\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26";
The conversion is done by the compiler when it reads the source code file: the escapes are translated into characters before the program ever runs. Therefore it is not surprising that your translateToUnicode method does not create the same thing.

Exactly what does that method do? Are you using literal unicode characters, or what?

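BufferedWriter certainly does not convert "\uXXXX". A Writer emits the characters it is given; that translation happens on the reading side, when the compiler reads the source.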
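Whether the reading side translates the escapes depends on what does the reading. As a small check, assuming the text arrives as escape characters: a plain BufferedReader hands them back unchanged, while java.util.Properties.load, which documents support for this escape syntax, turns them into characters. The class name and the "greek" key are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ReaderCheck {
    // Escape text as it would exist at run time (backslash, u, hex digits).
    static final String ESCAPED = "\\u1f26\\u0323\\u1f82";

    // A plain character reader returns the text exactly as it was stored.
    static String viaBufferedReader() {
        try (BufferedReader in = new BufferedReader(new StringReader(ESCAPED))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The properties file format defines these escapes, so load() decodes them.
    static String viaProperties() {
        Properties props = new Properties();
        try {
            props.load(new StringReader("greek=" + ESCAPED));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greek");
    }
}
```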

Bill
Benjamin Weaver
Ranch Hand

Joined: Apr 08, 2003
Posts: 161
Bill,

We're getting closer to the answer, I think. Here's what the translateToUnicode() method does:

1. returns a string of literal unicode sequences, e.g. \u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26

2. #1 is the important fact, but I will explain what the method does. To do #1, translateToUnicode() converts a "Betacode" representation of ancient Greek into the Unicode character string. Betacode lets users with primitive browsers input Greek text using Latin ASCII characters. For example, a Greek letter alpha with an accent mark over it is written in Betacode as "A/". This Betacode has a single- or double-character Unicode equivalent, depending on the Unicode normalization scheme. In the normalization scheme we are using, "A/" maps to a single Unicode character: \u03AC. The Latin characters typed into a textarea in the browser are converted into a Unicode sequence of the kind cited above and either stored in a database or returned to the user in a separate HTML page or in an applet JTextArea.
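One way to avoid the problem at its source is to have the translation step produce the characters themselves rather than escape text. The following is only a sketch under that assumption; it is not the translateToUnicode() method from this thread. The table holds just the "A/" mapping mentioned above plus a plain alpha added for illustration, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class BetacodeSketch {
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("A/", "\u03AC"); // alpha with accent, the mapping given in the thread
        TABLE.put("A", "\u03B1");  // plain alpha, added only for illustration
    }

    // Translates Betacode to real characters, preferring two-character tokens
    // (letter plus accent mark) over single letters. Unknown input is copied as is.
    static String translate(String betacode) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < betacode.length()) {
            String two = i + 2 <= betacode.length() ? betacode.substring(i, i + 2) : null;
            if (two != null && TABLE.containsKey(two)) {
                out.append(TABLE.get(two));
                i += 2;
            } else {
                String one = betacode.substring(i, i + 1);
                out.append(TABLE.getOrDefault(one, one));
                i += 1;
            }
        }
        return out.toString();
    }
}
```

Because the escapes here sit in string literals, the compiler has already turned them into characters by the time the table is built, so translate("A/") returns a one-character string.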

I have verified that translateToUnicode() returns a correct unicode sequence. The implementation of this method does not actually write to, then read from, a file--I included that code simply to highlight the conversion problem.


The problem to be solved is how to get Java to convert the unicode sequence (e.g. \u1f26\u1f82\u1f26\u1f82\u1f26 )
into displayable characters, preferably in UTF-8 encoding.
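If the method has to keep returning escape text, the text can be decoded at run time. This is a minimal sketch that assumes every escape has exactly the form backslash, u, four hex digits; the class and method names are invented for the example, and anything that does not match the pattern is copied through untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapes {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape with the character whose code unit it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                   // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // the decoded character
            last = m.end();
        }
        out.append(text, last, text.length());                   // remaining tail
        return out.toString();
    }

    // Decodes the escapes and encodes the result as UTF-8 bytes.
    static byte[] toUtf8(String text) {
        return decode(text).getBytes(StandardCharsets.UTF_8);
    }
}
```

The decoded String can then be passed to out.println as in case 1. The output still needs a writer that encodes as UTF-8 for the page to display the Greek correctly.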