
XML Parsing with international characters

 
satya kiran
Ranch Hand
Posts: 52
Hi,
I am having a problem parsing XML files with JAXP 1.1. The problem is with international characters.
I am reading a CLOB from a database that contains XML.
If the values in the tags contain special or international characters, I get errors while parsing. Any help would be great.
Here is a sample of the XML:
<ADDRESS>
<LANGUAGE_CD>CH-FR</LANGUAGE_CD>
<LAST_NM>Kaiser</LAST_NM>
<NICKNM>.</NICKNM>
<OFFICIAL_ADDR_1>Centre Informatique</OFFICIAL_ADDR_1>
<OFFICIAL_ADDR_2>Collège Propédeutique 2</OFFICIAL_ADDR_2>
</ADDRESS>
I am getting the following error:
Fatal Error: URI=null Line=1: Character conversion error: "Unconvertible UTF-8 character beginning with 0x8f" (line number may be too low).
Here is the code I am running:
factory = SAXParserFactory.newInstance();
// Create a JAXP SAXParser
saxParser = factory.newSAXParser();
// Get the encapsulated SAX XMLReader
xmlReader = saxParser.getXMLReader();
// Set this class as the ContentHandler for callback events
xmlReader.setContentHandler(this);
// Handle parser errors and warnings
xmlReader.setErrorHandler(new ParseErrorHandler());
// Iterate through the result set "rs" to get the CLOB and parse it
Struct messageobj = (Struct) rs.getObject(1);
// getAttributes() returns Object[]; this assumes the CLOB is the first attribute
Clob theClob = (Clob) messageobj.getAttributes()[0];
String clobText = theClob.getSubString(pos, (int) len);
// getBytes() with no argument encodes using the platform default charset
clobStream = new ByteArrayInputStream(clobText.getBytes());
clobSource = new InputSource(clobStream);
// Parse the InputSource
xmlReader.parse(clobSource);

Thanks in advance for your help.
Kiran
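A minimal sketch of one way to avoid the problem, assuming the XML text has already been read correctly into a Java String: the code above turns the CLOB text back into bytes with getBytes(), which uses the platform default charset, while a parser reading a byte stream that has no encoding declaration assumes UTF-8. Wrapping the String in a StringReader hands the parser characters instead of bytes, so no second conversion takes place. The class CharacterStreamParse, its handler and the sample string are hypothetical; only the OFFICIAL_ADDR_2 element comes from the post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class CharacterStreamParse {

    // Hypothetical handler: collects the text content of the OFFICIAL_ADDR_2 element.
    static class AddressHandler extends DefaultHandler {
        private final StringBuilder current = new StringBuilder();
        private boolean inTarget = false;
        String officialAddr2 = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("OFFICIAL_ADDR_2".equals(qName)) {
                inTarget = true;
                current.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign
            if (inTarget) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("OFFICIAL_ADDR_2".equals(qName)) {
                inTarget = false;
                officialAddr2 = current.toString();
            }
        }
    }

    // Parses XML that is already a Java String. The parser reads characters
    // from the StringReader, so no byte/charset conversion happens on the way in.
    static String parseOfficialAddr2(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        AddressHandler handler = new AddressHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.officialAddr2;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the text returned by Clob.getSubString(...)
        String clobText = "<ADDRESS><OFFICIAL_ADDR_2>Coll\u00e8ge Prop\u00e9deutique 2"
                + "</OFFICIAL_ADDR_2></ADDRESS>";
        System.out.println(parseOfficialAddr2(clobText));
    }
}
```

With a real java.sql.Clob, passing new InputSource(theClob.getCharacterStream()) to the parser should achieve the same thing without building an intermediate String.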
 
satya kiran
Ranch Hand
Posts: 52
The problem occurs with this element: <OFFICIAL_ADDR_2>Collège Propédeutique 2</OFFICIAL_ADDR_2>
Thanks,
 
Lasse Koskela
author
Sheriff
Posts: 11962
It looks to me like your XML document is malformed (it contains invalid characters), and that is what you should fix, probably by encoding those special characters, for example as numeric character references.
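A minimal sketch of that kind of encoding, assuming it is applied to text content while the XML is being produced: every character outside 7-bit ASCII is replaced with a numeric character reference (è becomes &#232;), so the result is plain ASCII and reads the same under UTF-8, ISO-8859-1 and other ASCII-compatible charsets. The class and method names are hypothetical. Character references are not recognised in element names, comments or CDATA sections, so this only helps for text and attribute values.

```java
class NumericCharRefs {

    // Replaces every character outside 7-bit ASCII with a numeric character
    // reference such as &#232;. Iterates by code point so that characters
    // above U+FFFF (surrogate pairs in Java) become a single reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("Coll\u00e8ge Prop\u00e9deutique 2"));
        // prints: Coll&#232;ge Prop&#233;deutique 2
    }
}
```

This has to happen where the XML is written, while the text is still correctly decoded; escaping cannot recover characters that were already mangled.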
 
Mert Nuhoglu
Greenhorn
Posts: 10
This problem is related to the XML encoding. When a document contains non-ASCII characters, you should declare which encoding it uses; without a declaration (or a byte order mark), the parser assumes UTF-8.

For instance,

<?xml version="1.0" encoding="ISO-8859-9"?>
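A small sketch of why the declaration matters, assuming the bytes were written as ISO-8859-1 (Latin-1) rather than the ISO-8859-9 of the example above; substitute whatever charset actually produced the data. The same bytes fail to parse without a declaration, because the parser then reads them as UTF-8, and parse cleanly once the declaration names the right charset. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EncodingDeclarationDemo {

    static final String BODY =
            "<ADDRESS><OFFICIAL_ADDR_2>Coll\u00e8ge Prop\u00e9deutique 2</OFFICIAL_ADDR_2></ADDRESS>";

    // Encodes the sample document as ISO-8859-1, optionally preceded by a
    // matching XML declaration.
    static byte[] latin1Sample(boolean withDeclaration) {
        String xml = withDeclaration
                ? "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + BODY
                : BODY;
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns true if the bytes parse as well-formed XML, false on any error.
    static boolean parses(byte[] xmlBytes) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            // DefaultHandler.fatalError() rethrows the SAXParseException, ending the parse
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without declaration: " + parses(latin1Sample(false))); // false
        System.out.println("with declaration:    " + parses(latin1Sample(true)));  // true
    }
}
```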

The right encoding depends on how the text was saved, which is usually determined by the locale of your text editor or operating system. In Java you can check the default with the system property file.encoding.
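A quick sketch of that check, printing both the file.encoding property and Charset.defaultCharset(), which is the charset String.getBytes() uses when no charset is passed (as in the original code). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultEncodingCheck {

    // The charset used by String.getBytes() and new String(byte[]) when no
    // charset argument is given.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    // True when the no-argument getBytes() produces exactly the same bytes as
    // an explicit encode with the default charset.
    static boolean getBytesUsesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + defaultCharsetName());
        System.out.println("getBytes() uses default: "
                + getBytesUsesDefault("Coll\u00e8ge Prop\u00e9deutique 2"));
    }
}
```

On JDK 18 and later (JEP 400) the default charset is UTF-8 unless overridden, so the output may not reflect the operating-system locale.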

You can find more information about XML encoding on this page: http://www.w3schools.com/xml/xml_encoding.asp

Best regards...

Mert Nuhoglu
 