
Unicode characters - javax.xml.transform.Transformer

 
Ashish Agrawal
Ranch Hand
Posts: 112
I am using javax.xml.transform.Transformer to convert an XML string to HTML using an XSL file.
I am getting the following exception:
javax.xml.transform.TransformerException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).
The XML string (packet) is dynamically generated inside a servlet. It contains some Thai-language data. I have specified the encoding as "UTF-8" for both the XML and the XSL. I even tried it without any encoding declaration.
Can anybody help me out?
-- Ashish Agrawal.
 
Lasse Koskela
author
Sheriff
Posts: 11962
Could you post the piece of code where you create the parameters for transform() and a couple of lines from the beginning of your XML document?
 
Ashish Agrawal
Ranch Hand
Posts: 112
Originally posted by Lasse Koskela:
Could you post the piece of code where you create the parameters for transform() and a couple of lines from the beginning of your XML document?

Servlet code:
public void doPost(HttpServletRequest req, HttpServletResponse res) {
    try {
        res.setContentType("text/html; charset=UTF-8");
        PrintWriter out = res.getWriter();

        // String strObjectXML = XML packet from the database.
        strObjectXML = new String(strObjectXML.getBytes("UTF-8"));

        // create object of Transformer class
        DemoTransform objDT = new DemoTransform();
        String html = objDT.generateHTML(strObjectXML, xslPath);

        out.println(html);
        out.close();
    }

XML Packet
<?xml version="1.0" encoding="UTF-8" ?>
<root language="thai">
<message_text id="msg001">เครื่องบิน</message_text>
<message_text id="msg002">แล้ว</message_text>
<message_text id="msg003">แล้ว</message_text>
<message_text id="msg004">เก่า</message_text>
<message_text id="msg005">เก่า</message_text>
<message_text id="msg006">เก่า</message_text>
<message_text id="msg007">ด้วย</message_text>
<message_text id="msg008">เก่า</message_text>
<message_text id="msg009">เครื่องบิน</message_text>
<message_text id="msg010">ไม่ดี</message_text>
<message_text id="msg011">� องบินครื่งบินอพาร์ตเมนต์</message_text>
<message_text id="msg012">เก่า</message_text>
<message_text id="msg013">ไม่ดี</message_text>
</root>
DemoTransform.java
public class DemoTransform {
    public String generateHTML(String xmlData, String xslPath) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        String html = null;
        try {
            TransformerFactory myTF = TransformerFactory.newInstance();
            Templates tmplXsl = myTF.newTemplates(new StreamSource(new File(xslPath)));
            Transformer transformer = tmplXsl.newTransformer();
            // read xml data into a byte array
            byte bXml[] = new byte[xmlData.length()];
            bXml = xmlData.getBytes();
            // create a ByteArrayInputStream on the byte array of xml data
            ByteArrayInputStream bais = new ByteArrayInputStream(bXml);
            baos = new ByteArrayOutputStream();
            StreamResult sr = new StreamResult();
            sr.setOutputStream(baos);
            transformer.transform(new StreamSource(bais), sr);
            html = baos.toString("UTF-8");
            baos.close();
            bais.close();
        }
        catch (Exception e) {
            System.out.println("EXCEPTION = " + e);
            e.printStackTrace();
        }
        return html;
    }
}
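A possible explanation for the exception, offered as a sketch rather than a confirmed diagnosis: getBytes() with no argument encodes with the platform default charset, so on a machine whose default is not UTF-8 the bytes handed to the parser would not match the encoding="UTF-8" declaration. The class name Utf8Check and the use of ISO-8859-1 as a stand-in for a non-UTF-8 platform default are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte array is well-formed UTF-8.
final class Utf8Check {
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A copyright sign followed by a Thai word, written as escapes so the
        // source file encoding does not matter.
        String text = "\u00A9 \u0E40\u0E01\u0E48\u0E32";

        // Explicit UTF-8: the bytes match an encoding="UTF-8" declaration.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Stand-in for a non-UTF-8 platform default (assumption): the copyright
        // sign becomes the single byte 0xA9, which is not valid UTF-8 on its own.
        byte[] other = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes well-formed:      " + isWellFormedUtf8(utf8));
        System.out.println("ISO-8859-1 bytes well-formed: " + isWellFormedUtf8(other));
    }
}
```

If this is what is happening, passing an explicit charset, as in xmlData.getBytes("UTF-8"), would be the smallest change to try.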
[ April 22, 2004: Message edited by: Ashish Agrawal ]
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
This isn't an answer, but one of the things I like to do when generating XML in code is to put it into a file and open it in Internet Explorer to make sure that it renders in the browser. This tells me whether I have correct closing tags and, if not, which line is missing one, so I can see the cause and try to fix it.
What happens if you take your XML and view it in IE?
Mark
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Kind of like this

I copied and pasted your XML from your post, and saved it to a file I called test.xml. When I tried to open it in IE I got the above message.
Might be the copyright symbol?
Good Luck
Mark
 
Ashish Agrawal
Ranch Hand
Posts: 112
Originally posted by Mark Spritzler:
Might be the copyright symbol?
Good Luck
Mark

There shouldn't be any problem due to the copyright symbol, because it works fine with the same file containing English data. I also tried removing the symbol from the Thai file, and the problem still persists.
Ashish Agrawal.
 
Ashish Agrawal
Ranch Hand
Posts: 112
Originally posted by Mark Spritzler:
This isn't an answer, but one of the things I like to do when generating XML in code is to put it into a file and open it in Internet Explorer to make sure that it renders in the browser. This tells me whether I have correct closing tags and, if not, which line is missing one, so I can see the cause and try to fix it.
What happens if you take your XML and view it in IE?
Mark

There are no problems with the Thai characters when displayed in IE. The file is displayed correctly when opened in IE.
Actually, whenever I make an XML packet or file, the first thing I always do is test it with IE. This is the easiest way to check its format.
Ashish Agrawal
[ April 22, 2004: Message edited by: Ashish Agrawal ]
 
Ashish Agrawal
Ranch Hand
Posts: 112
Hello Ranchies,
Finally I got the solution.
The problem was with the XML packet itself. Java strings can hold Thai, Chinese, etc. characters, but in my setup they were apparently being corrupted somewhere between the database and the transformer. It works fine if the characters are first converted into Unicode escapes or NCR (numeric character reference) format. I am storing these in the XML file in NCR format, i.e. &#nnnn;
The following code can be used to convert Thai or Chinese characters into NCR format:


public class CharToNCR {
    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('A'));
        String str = "เครื่องบินแล้วเก่า"; // Thai characters
        // String str = "用新闻后面保证申请语言可利用"; // Chinese characters
        System.out.println("NCR " + escapeUnicodeString(str, true));
    }

    // Converts each char to a decimal numeric character reference (&#nnnn;).
    // When escapeAscii is false, printable ASCII (0x20-0x7e) is copied unchanged.
    // Works per UTF-16 unit, so characters outside the BMP are not handled.
    static String escapeUnicodeString(String str, boolean escapeAscii) {
        StringBuilder ostr = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (!escapeAscii && ch >= 0x0020 && ch <= 0x007e) {
                ostr.append(ch);
            } else {
                // The closing ';' belongs to the reference only, not to plain ASCII.
                ostr.append("&#").append((int) ch).append(';');
            }
        }
        return ostr.toString();
    }
}
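The per-char loop above is fine for Thai and most Chinese text, but a character outside the Basic Multilingual Plane is stored as two chars and would be split into two invalid references. A minimal sketch of a code-point-based variant follows; the class name NcrCodePoints and the choice to leave all ASCII untouched (so that existing markup in the packet survives) are assumptions, not part of the original solution.

```java
// Hypothetical variant: escapes every non-ASCII code point as a decimal NCR.
final class NcrCodePoints {
    static String toNcr(String text) {
        StringBuilder out = new StringBuilder();
        // codePoints() joins surrogate pairs into a single int code point.
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                // ASCII, including markup, unchanged
            } else {
                out.append("&#").append(cp).append(';'); // e.g. U+0E01 -> &#3585;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Thai KO KAI (U+0E01) inside an element, written as an escape.
        System.out.println(toNcr("<m>\u0E01</m>"));
    }
}
```

Because ASCII is passed through as-is, this is meant to be run over a whole XML packet, not over raw text that may contain unescaped & or < characters.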

Similarly, the following code can be used to convert Thai, Chinese, etc. characters to Unicode escapes:

public class CharToUnicode {
    public static void main(String[] args) {
        System.out.println(Character.UnicodeBlock.of('A'));
        String str = "เครื่องบินแล้วเก่า"; // Thai characters
        // String str = "用新闻后面保证申请语言可利用"; // Chinese characters
        System.out.println("Unicode " + escapeUnicodeString(str, true));
    }

    // Converts each char to a \\uXXXX escape.
    // When escapeAscii is false, printable ASCII (0x20-0x7e) is copied unchanged.
    static String escapeUnicodeString(String str, boolean escapeAscii) {
        StringBuilder ostr = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (!escapeAscii && ch >= 0x0020 && ch <= 0x007e) {
                ostr.append(ch);
            } else {
                // Always pad to four hex digits; Thai code points such as
                // U+0E40 would otherwise come out as the invalid escape \\ue40.
                ostr.append(String.format("\\u%04x", (int) ch));
            }
        }
        return ostr.toString();
    }
}

Note: The above code works only when the source file is saved in Notepad with the "Save as Unicode" option selected. I am using jEdit to view its output. It does not work in EditPlus/TextPad.
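An alternative that may avoid the escaping step altogether, sketched under the assumption that the XML is already a correctly decoded Java String: give the transformer character streams (StringReader and StringWriter) instead of bytes, so no charset conversion happens on the way in or out. The class name StringTransformSketch and the inline stylesheet are hypothetical and stand in for the real XSL file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: transform String to String without touching bytes.
final class StringTransformSketch {
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            // Reader in, Writer out: the parser reads chars directly, so the
            // platform default charset never comes into play.
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative stylesheet: emits the text of the first message_text.
        String xsl =
                "<xsl:stylesheet version=\"1.0\""
              + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"text\"/>"
              + "<xsl:template match=\"/\">"
              + "<xsl:value-of select=\"root/message_text\"/>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        // Thai text written as escapes so the source encoding does not matter.
        String xml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
              + "<root language=\"thai\">"
              + "<message_text id=\"msg001\">\u0E40\u0E01\u0E48\u0E32</message_text>"
              + "</root>";
        System.out.println(transform(xml, xsl));
    }
}
```

In a servlet, the returned String can then be written to res.getWriter() after setting the content type with charset=UTF-8, as the original doPost already does.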

- Ashish Agrawal
[ April 27, 2004: Message edited by: Ashish Agrawal ]
 
Balaji Loganathan
author and deputy
Bartender
Posts: 3150
Thanks for updating!
 