
Unicode characters - javax . xml . transform . Transformer

Ashish Agrawal
Ranch Hand

Joined: Nov 02, 2003
Posts: 112
I am using javax.xml.transform.Transformer to convert an XML string to HTML using an XSL file.
I am getting the following exception:
javax.xml.transform.TransformerException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).
The XML string (packet) is dynamically generated inside a servlet. It contains some Thai language data. I have specified the encoding as "UTF-8" for both the XML and the XSL. I even tried it without any encoding declaration.
Can anybody help me out?
-- Ashish Agrawal.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Could you post the piece of code where you create the parameters for transform() and a couple of lines from the beginning of your XML document?


Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]
Ashish Agrawal
Ranch Hand

Joined: Nov 02, 2003
Posts: 112
Originally posted by Lasse Koskela:
Could you post the piece of code where you create the parameters for transform() and a couple of lines from the beginning of your XML document?

Servlet Code :
public void doPost(HttpServletRequest req, HttpServletResponse res) {
    try {
        res.setContentType("text/html; charset=UTF-8");
        PrintWriter out = res.getWriter();

        //String strObjectXML = XML Packet from the database.
        strObjectXML = new String(strObjectXML.getBytes("UTF-8"));

        //create object of Transformer class
        DemoTransform objDT = new DemoTransform();
        String html = objDT.generateHTML(strObjectXML, xslPath);

        out.println(html);
        out.close();
    }

XML Packet
<?xml version="1.0" encoding="UTF-8" ?>
<root language="thai">
<message_text id="msg001">เครื่องบิน</message_text>
<message_text id="msg002">แล้ว</message_text>
<message_text id="msg003">แล้ว</message_text>
<message_text id="msg004">เก่า</message_text>
<message_text id="msg005">เก่า</message_text>
<message_text id="msg006">เก่า</message_text>
<message_text id="msg007">ด้วย</message_text>
<message_text id="msg008">เก่า</message_text>
<message_text id="msg009">เครื่องบิน</message_text>
<message_text id="msg010">ไม่ดี</message_text>
<message_text id="msg011">� องบินครื่งบินอพาร์ตเมนต์</message_text>
<message_text id="msg012">เก่า</message_text>
<message_text id="msg013">ไม่ดี</message_text>
</root>
DemoTransform.java
public class DemoTransform {
    public String generateHTML(String xmlData, String xslPath) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        String html = null;
        try {
            TransformerFactory myTF = TransformerFactory.newInstance();
            Templates tmplXsl = myTF.newTemplates(new StreamSource(new File(xslPath)));
            Transformer transformer = tmplXsl.newTransformer();
            //read xml Data in a byte array
            byte bXml[] = new byte[xmlData.length()];
            bXml = xmlData.getBytes();
            //create a ByteArrayInputStream on the byte array of xml data
            ByteArrayInputStream bais = new ByteArrayInputStream(bXml);
            baos = new ByteArrayOutputStream();
            StreamResult sr = new StreamResult();
            sr.setOutputStream(baos);
            transformer.transform(new StreamSource(bais), sr);
            html = baos.toString("UTF-8");
            baos.close();
            bais.close();
        }
        catch (Exception e) {
            System.out.println("EXCEPTION = " + e);
            e.printStackTrace();
        }
        return html;
    }
}
[ April 22, 2004: Message edited by: Ashish Agrawal ]
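A side note on the code above, offered as a guess rather than a confirmed diagnosis: xmlData.getBytes() with no argument encodes with the platform default charset, and new String(bytes) with no charset decodes with it. If that default is not UTF-8, the bytes handed to the transformer may not match the encoding="UTF-8" declaration. The sketch below (class and method names are made up for illustration) shows how the same Thai text survives or breaks depending on the charset used to decode its UTF-8 bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Encodes text as UTF-8 bytes, then decodes those bytes with the given charset.
    static String decodeUtf8BytesAs(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        // Four Thai characters, written as escapes so the source file encoding does not matter.
        String thai = "\u0E40\u0E01\u0E48\u0E32";

        // Same charset on both sides: the text comes back unchanged.
        System.out.println(decodeUtf8BytesAs(thai, StandardCharsets.UTF_8).equals(thai));

        // Mismatched charset: every UTF-8 byte turns into its own character.
        System.out.println(decodeUtf8BytesAs(thai, StandardCharsets.ISO_8859_1).length());

        // The default that no-argument getBytes() and new String(bytes) rely on.
        System.out.println(Charset.defaultCharset());
    }
}
```

Naming the charset explicitly on both the encoding and the decoding side removes the dependency on the machine's default.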
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    

Although this isn't an answer, one of the things that I like to do when generating XML in code is to put it into a file and open it in Internet Explorer to make sure that it renders in the browser. This tells me whether I have correct closing tags and, if not, which line has one missing, so I can see a likely cause and try to fix it.
What happens if you take your XML and view it in IE?
Mark


Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17250
    

Kind of like this

I copied and pasted your XML from your post, and saved it to a file I called test.xml. When I tried to open it in IE I got the above message.
Might be the copyright symbol?
Good Luck
Mark
Ashish Agrawal
Ranch Hand

Joined: Nov 02, 2003
Posts: 112
Originally posted by Mark Spritzler:
Might be the copyright symbol?
Good Luck
Mark

There shouldn't be any problem due to the copyright symbol, because it works fine with the same file containing English data. I also tried removing this symbol from the Thai file, and the problem still persists.
Ashish Agrawal.
Ashish Agrawal
Ranch Hand

Joined: Nov 02, 2003
Posts: 112
Originally posted by Mark Spritzler:
Although this isn't an answer, one of the things that I like to do when generating XML in code is to put it into a file and open it in Internet Explorer to make sure that it renders in the browser. This tells me whether I have correct closing tags and, if not, which line has one missing, so I can see a likely cause and try to fix it.
What happens if you take your XML and view it in IE?
Mark

There are no problems with the Thai characters when displayed in IE. The file is displayed correctly when opened in IE.
Actually, whenever I make an XML packet or file, the first thing I always do is test it with IE. This is the easiest way to check its format.
Ashish Agrawal
[ April 22, 2004: Message edited by: Ashish Agrawal ]
Ashish Agrawal
Ranch Hand

Joined: Nov 02, 2003
Posts: 112
Hello Ranchies,
Finally I got the solution.
The problem was with the XML packet itself. In my setup the raw Thai, Chinese, etc. characters did not reach the transformer intact. It works fine if the characters are first converted into Unicode escapes or NCR (numeric character reference) format. I am storing these in the XML file in NCR format, i.e. &#nnnn;
The following code can be used to convert Thai or Chinese characters into NCR format:


public class CharToNCR {
    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('A'));
        String str = "เครื่องบินแล้วเก่า"; // Thai characters
        // String str = "用新闻后面保证申请语言可利用"; // Chinese characters
        System.out.println("NCR " + escapeUnicodeString(str, true));
    }

    // Converts each char to a decimal NCR (&#nnnn;). Printable ASCII is
    // copied through unchanged unless escapeAscii is true.
    static String escapeUnicodeString(String str, boolean escapeAscii) {
        StringBuilder ostr = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (!escapeAscii && (ch >= 0x0020) && (ch <= 0x007e)) {
                ostr.append(ch);
            } else {
                // The value is decimal, and the closing ';' is added
                // only for characters that were actually escaped.
                ostr.append("&#").append((int) ch).append(';');
            }
        }
        return ostr.toString();
    }
}
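One caveat with escaping char by char: a character outside the Basic Multilingual Plane is stored in a Java String as two surrogate chars, and escaping each half separately does not give a valid reference. A variant that walks code points instead might look like the sketch below. The class and method names are hypothetical, and it assumes the input is plain text in which markup characters such as & and < have already been dealt with.

```java
final class NcrEscaper {
    // Replaces every code point outside printable ASCII with a decimal NCR.
    // Printable ASCII (0x20 to 0x7E) is copied through unchanged.
    static String toNcr(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= 0x20 && cp <= 0x7E) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "A" followed by two Thai characters, written as escapes so the
        // source file encoding does not matter.
        System.out.println(toNcr("A\u0E40\u0E01")); // A&#3648;&#3585;
    }
}
```

Control characters other than tab, line feed and carriage return are not allowed in XML 1.0 even as references, so real input may need filtering first.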

Similarly, the following code can be used to convert Thai, Chinese, etc. characters to Unicode escapes:

public class CharToUnicode {
    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('A'));
        String str = "เครื่องบินแล้วเก่า"; // Thai characters
        // String str = "用新闻后面保证申请语言可利用"; // Chinese characters
        System.out.println("Unicode " + escapeUnicodeString(str, true));
    }

    // Converts each char to a Java-style Unicode escape. Printable ASCII
    // is copied through unchanged unless escapeAscii is true.
    static String escapeUnicodeString(String str, boolean escapeAscii) {
        StringBuilder ostr = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (!escapeAscii && (ch >= 0x0020) && (ch <= 0x007e)) {
                ostr.append(ch);
            } else {
                // Zero-pad to four hex digits so every escape is well-formed.
                ostr.append(String.format("\\u%04x", (int) ch));
            }
        }
        return ostr.toString();
    }
}

Note: The above code works only when the source file is saved in Notepad with the "Save as Unicode" option checked. I am using jEdit to view its output. It does not work in EditPlus/TextPad.
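For anyone who would rather keep the raw characters in the packet, one alternative is to hand the transformer character streams (StringReader and StringWriter) instead of byte streams, so that no byte encoding step sits between the String and the parser. This is only a sketch under the assumption that the XML is already held correctly in a Java String. The class name and the inline stylesheet are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StringTransformDemo {
    // Transforms XML text with XSL text, using character streams throughout.
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Illustrative stylesheet: emits the text of the first message_text element.
        String xsl =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"root/message_text\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

        // Thai text written as escapes so the source file encoding does not matter.
        String thai = "\u0E40\u0E01\u0E48\u0E32";
        String xml = "<root><message_text>" + thai + "</message_text></root>";

        String result = transform(xml, xsl);
        System.out.println(result.equals(thai));
    }
}
```

In a servlet, the result could then be written to the response writer after the content type has been set to UTF-8, as in the doPost code earlier in the thread.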

- Ashish Agrawal
[ April 27, 2004: Message edited by: Ashish Agrawal ]
Balaji Loganathan
author and deputy
Bartender

Joined: Jul 13, 2001
Posts: 3150
Thanks for updating!
 