Thanks for taking a shot. Below is some code that drives the point home. Notice the commented-out line, in which the String unicode is initialized with a string literal. If that line is uncommented (and the following line commented out), the text is stored in the file foo.txt as UTF-8 and is displayed correctly when read back from the file. But if the String unicode comes from a conversion routine (and the returned string is correct), it is written to the file as the literal Unicode escape sequences, not as UTF-8, and is displayed as literal sequences when read back from the file.
So, in this (heuristic) example, BufferedWriter does not convert the sequences.
When you have this text in your program: String unicode = "\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26"; the conversion is done by the compiler as it reads the source code file, so the compiled string already contains the Greek characters. A string assembled at run time never goes through that step, so it is not surprising that your translateToUnicode method does not produce the same thing.
Exactly what does that method do? Are you using literal unicode characters, or what?
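BufferedWriter certainly does not convert "\uXXXX"; a Writer just writes the characters it is given. Interpreting the escapes is the job of whatever reads the text, which for a string literal is the compiler's source reader.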
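To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). It compares a literal whose escape the compiler translates with a string that merely contains a backslash, a 'u', and four hex digits as ordinary characters, which is what a run-time routine produces if it builds escape text.

```java
class EscapeDemo {
    // The compiler translates this escape, so the string holds one char (U+1F82).
    static String compiled() {
        return "\u1f82";
    }

    // The backslash is itself escaped here, so no translation happens:
    // the string holds six ordinary chars (backslash, 'u', '1', 'f', '8', '2').
    static String runtimeText() {
        return "\\" + "u1f82";
    }

    public static void main(String[] args) {
        System.out.println(compiled().length());    // prints 1
        System.out.println(runtimeText().length()); // prints 6
    }
}
```

A Writer given the second string has no reason to treat it as anything other than six ASCII characters, which matches what ends up in foo.txt.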
We're getting closer to the answer, I think. Here's what the translateToUnicode() method does:
1. It returns a string of literal Unicode escape sequences, e.g. \u1f82\u1f26\u1f82\u1f26\u1f82\u1f26\u1f82\u1f26
2. #1 is the important fact, but I will explain what the method does. In order to do #1, translateToUnicode() converts a "Betacode" representation of ancient Greek into the Unicode character string. Betacode enables users with primitive browsers to input Greek text using Latin ASCII characters. For example, a Greek letter "alpha" with an accent mark over it is written in Betacode as "A/". This Betacode has a single- or double-character Unicode equivalent, depending on the scheme of Unicode normalization. In the normalization scheme we are using, "A/" maps to a single Unicode character: \u03AC. The Latin characters typed into a textarea in the browser are converted into a Unicode sequence of the kind cited above and either stored in a database or returned to the user in a separate HTML page or in an applet JTextArea.
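One way to sidestep the problem, sketched here under assumptions, is to have the lookup hand back real characters rather than escape text. The class, table, and method names below are hypothetical and are not the actual translateToUnicode() code; only the single "A/" mapping is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

class BetacodeSketch {
    // Hypothetical lookup table; only the one entry mentioned in the post is filled in.
    private static final Map<String, Character> TABLE = new HashMap<>();
    static {
        // Alpha with an acute accent, stored as a real char rather than as escape text.
        TABLE.put("A/", (char) 0x03AC);
    }

    // Returns the Greek character for a Betacode token.
    // Unknown tokens pass through unchanged in this sketch.
    static String toGreek(String token) {
        Character c = TABLE.get(token);
        return c == null ? token : String.valueOf(c);
    }

    public static void main(String[] args) {
        System.out.println(toGreek("A/").length()); // prints 1
    }
}
```

With a table like this, the returned String already holds the Greek characters, so any Writer opened with a UTF-8 encoding can store it directly.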
I have verified that translateToUnicode() returns a correct unicode sequence. The implementation of this method does not actually write to, then read from, a file--I included that code simply to highlight the conversion problem.
The problem to be solved is how to get Java to convert the unicode sequence (e.g. \u1f26\u1f82\u1f26\u1f82\u1f26 ) into displayable characters, preferably in UTF-8 encoding.
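If the escape text has to stay as the intermediate form, it can be decoded at run time before writing. The following is a minimal sketch, not a definitive implementation: the class name UnicodeEscapeDecoder and its decode method are invented for this example, and it assumes every escape is a backslash, one or more 'u' characters, and exactly four hex digits.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Regex: a literal backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces each escape in the text with the char it names; other text is copied as-is.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Twelve chars of escape text, as a run-time routine might return them.
        String escaped = "\\u1f26\\u1f82";
        String greek = decode(escaped);
        System.out.println(greek.length()); // prints 2

        // Write with an explicit UTF-8 encoding, then read the bytes back the same way.
        Path file = Paths.get("foo.txt");
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(greek);
        }
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(back.equals(greek)); // prints true
    }
}
```

Whether the result then displays as Greek still depends on the viewer: the HTML page has to declare UTF-8, and the font used in the JTextArea has to contain the Greek Extended glyphs.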