But when I get the Unicode String (\u1f26\u0323\u1f82) from elsewhere...
?? If you are using a Reader, I would expect it to do the transformation.
Bill
Joined: Apr 08, 2003
Thanks for taking a shot. Below is some code that drives the point home. Notice the commented-out line, in which the String unicode is initialized with a string literal. If that line is uncommented (and the line after it is commented out), the Unicode text is stored perfectly as UTF-8 in the file foo.txt and displayed correctly when read back from the file. But if the unicode String is instead read in at run time from a conversion routine (the string is correct), it is written to the file as the literal escape sequences, not as UTF-8, and is displayed as those literal sequences when read back.
So, in this (heuristic) example, BufferedWriter does not convert the sequences.
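The test program itself isn't reproduced in the thread, but the effect can be sketched in a few lines. This assumes the file was written through a BufferedWriter over an OutputStreamWriter set to UTF-8 (an assumption; the writer setup isn't shown), and a byte array stands in for foo.txt. The writer simply encodes whatever chars it is handed: the compiled literal is one char, while escape text built at run time is six ordinary ASCII chars.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class WriterDemo {
    // Encodes text exactly as a BufferedWriter over a UTF-8 OutputStreamWriter
    // would write it to a file; a byte array stands in for foo.txt.
    static byte[] utf8Bytes(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String compiled = "\u1f82";  // javac turns this escape into ONE char at compile time
        String runtime = "\\u1f82"; // six ordinary chars, like escape text built at run time
        System.out.println(compiled.length() + " char -> " + utf8Bytes(compiled).length + " bytes");
        System.out.println(runtime.length() + " chars -> " + utf8Bytes(runtime).length + " bytes");
    }
}
```

Running main prints "1 char -> 3 bytes" and then "6 chars -> 6 bytes": the writer behaves identically in both cases, and only the input differs.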
Author and all-around good cowpoke
Joined: Mar 22, 2000
When you have this text in your program: String unicode = "\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26"; The conversion is done by the Reader that the compiler uses to read the source code file. Therefore it is not surprising that your translateToUnicode method does not create the same thing.
Exactly what does that method do? Are you using literal unicode characters, or what?
BufferedWriter certainly does not convert "\uXXXX" - that is a Reader's job.
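One standard-library API that does decode backslash-u escapes from a Reader at run time is java.util.Properties.load(Reader). The sketch below borrows it as a decoder; the class and method names are hypothetical, and this is a workaround rather than what the escape handling in Properties was designed for. It is only safe for input containing no other backslash sequences, because Properties also interprets \t, \n, line continuations and the like.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesUnescape {
    // Properties.load(Reader) decodes backslash-u escapes in values, so it can
    // act as a run-time decoder for escape text held in a String.
    // Caveats: other backslash sequences (\t, \n, trailing backslash) are also
    // interpreted, leading whitespace is dropped, and a malformed escape makes
    // load throw IllegalArgumentException.
    static String decode(String escaped) {
        Properties p = new Properties();
        try {
            p.load(new StringReader("v=" + escaped));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader should not actually fail
        }
        return p.getProperty("v");
    }

    public static void main(String[] args) {
        String decoded = decode("\\u1f82\\u1f26");
        System.out.println(decoded.length()); // 2 chars, not 12
    }
}
```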
Joined: Apr 08, 2003
We're getting closer to the answer, I think. Here's what the translateToUnicode() method does:
1. It returns a string of literal Unicode escape sequences, e.g. \u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26
2. #1 is the important fact, but I will explain what the method does. To do #1, translateToUnicode() converts a "Betacode" representation of ancient Greek into a Unicode character string. Betacode lets users with primitive browsers enter Greek text using Latin ASCII characters. For example, a Greek "alpha" with an accent mark over it is written in Betacode as "A/". Depending on the Unicode normalization scheme, this Betacode maps to either one or two Unicode characters. In the normalization scheme we are using, "A/" maps to the single character \u03AC. The Latin characters typed into a textarea in the browser are converted into a Unicode sequence like the one cited above and then either stored in a database or returned to the user in a separate HTML page or in an Applet JTextArea.
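The real translateToUnicode() isn't shown in the thread. As a rough illustration of the Betacode step, here is a sketch with a hypothetical, deliberately tiny mapping table (a few vowels and diacritics only; no consonants, no "*" capitals, no final sigma). It appends actual chars rather than backslash-u text, then uses java.text.Normalizer with NFC, which composes alpha plus a combining acute (from "A/") into the single character U+03AC.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

final class BetacodeSketch {
    // Hypothetical, deliberately tiny table: Betacode letters to lowercase Greek.
    private static final Map<Character, Character> LETTERS = Map.of(
            'A', '\u03B1', 'E', '\u03B5', 'H', '\u03B7', 'I', '\u03B9',
            'O', '\u03BF', 'U', '\u03C5', 'W', '\u03C9');

    // Betacode diacritic symbols to Unicode combining marks.
    private static final Map<Character, Character> MARKS = Map.of(
            '/', '\u0301',  // acute (oxia/tonos)
            '\\', '\u0300', // grave (varia)
            '=', '\u0342',  // circumflex (perispomeni)
            ')', '\u0313',  // smooth breathing (psili)
            '(', '\u0314',  // rough breathing (dasia)
            '|', '\u0345'); // iota subscript (ypogegrammeni)

    // Appends real chars (never backslash-u text), then lets NFC compose
    // base letter + marks into precomposed characters where they exist.
    static String toUnicode(String betacode) {
        StringBuilder sb = new StringBuilder();
        for (char c : betacode.toUpperCase(Locale.ROOT).toCharArray()) {
            Character mapped = LETTERS.containsKey(c) ? LETTERS.get(c) : MARKS.get(c);
            sb.append(mapped != null ? mapped.charValue() : c);
        }
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String s = toUnicode("A/");
        System.out.printf("%d char(s), U+%04X%n", s.length(), (int) s.charAt(0));
    }
}
```

Running main prints "1 char(s), U+03AC". Choosing Normalizer.Form.NFD instead would leave "A/" as two chars (alpha plus combining acute), which is the "single or double character" difference mentioned above.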
I have verified that translateToUnicode() returns a correct Unicode sequence. The actual implementation of this method does not write to and then read from a file; I included that code only to highlight the conversion problem.
The problem to be solved is how to get Java to convert the Unicode escape sequence (e.g. \u1f26\u1f82\u1f26\u1f82\u1f26) into displayable characters, preferably written out in UTF-8 encoding.
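If translateToUnicode() has to keep returning escape text, one way to get displayable characters is to decode the escapes yourself at run time and then write the result through a UTF-8 writer. Below is a minimal hand-rolled sketch; the class and method names are hypothetical. It only recognizes the six-character backslash-u-XXXX form, copies everything else (including malformed or truncated escapes) unchanged, and does not treat a doubled backslash specially.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class UnicodeEscapes {
    private static final String HEX = "0123456789abcdefABCDEF";

    // Replaces each six-char sequence (backslash, 'u', four hex digits)
    // with the single char it names; everything else is copied unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '\\' && i + 5 < s.length()
                    && s.charAt(i + 1) == 'u' && isHex4(s, i + 2)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) return false;
        }
        return true;
    }

    // Writes the decoded text as UTF-8, e.g. to foo.txt.
    static void writeUtf8(Path file, String text) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String escaped = "\\u1f26\\u0323\\u1f82"; // run-time escape text, 18 chars
        String decoded = unescape(escaped);        // 3 real chars
        writeUtf8(Paths.get("foo.txt"), decoded);
        System.out.println(decoded.length());
    }
}
```

If you control translateToUnicode(), a simpler design is to have it append the characters themselves (e.g. sb.append((char) 0x1F82)) so that no escape text exists at run time in the first place.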