XML Parsing with international characters

 
Ranch Hand
Posts: 52
Hi,
I am having a problem parsing XML with JAXP 1.1. The problem is with international characters.
I am reading a CLOB from a database; its value is an XML document.
If the values inside the tags contain special or international characters, I get errors while parsing. Any help would be appreciated.
Here is a sample of the XML:
<ADDRESS>
<LANGUAGE_CD>CH-FR</LANGUAGE_CD>
<LAST_NM>Kaiser</LAST_NM>
<NICKNM>.</NICKNM>
<OFFICIAL_ADDR_1>Centre Informatique</OFFICIAL_ADDR_1>
<OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
</ADDRESS>
I am getting the following error:
Fatal Error: URI=null Line=1: Character conversion error: "Unconvertible UTF-8 character beginning with 0x8f" (line number may be too low).
Here is the sample code I am executing:
factory = SAXParserFactory.newInstance();
// Create a JAXP SAXParser
saxParser = factory.newSAXParser();
// Get the encapsulated SAX XMLReader
xmlReader = saxParser.getXMLReader();
// Set the content handler for callback events
xmlReader.setContentHandler(this);
// Set the error handler for parser errors and warnings
xmlReader.setErrorHandler(new ParseErrorHandler());
// Iterate through the result set "rs" to get the CLOB and parse it
Struct messageobj = (Struct) rs.getObject(1);
// Assumes the CLOB is the first attribute of the struct
Clob theClob = (Clob) messageobj.getAttributes()[0];
String clobText = theClob.getSubString(pos, (int) len);
// getBytes() with no argument encodes with the platform default charset
clobStream = new ByteArrayInputStream(clobText.getBytes());
clobSource = new InputSource(clobStream);
// Parse the input source
xmlReader.parse(clobSource);

Thanks in advance for your help
Kiran
 
satya kiran
Ranch Hand
Posts: 52
The problem occurs on this line: <OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
Thanks,
 
author
Posts: 11962
To me it looks like your XML document is malformed (contains invalid characters) and that that's what you should fix (probably by encoding those special characters somehow).
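
One way to "encode those special characters" is to write them as XML numeric character references, so the document itself stays pure ASCII. The following is only a sketch: the class and method names are made up for illustration, it assumes the text has already been read into a correctly decoded Java String, and it does not escape markup characters such as & or <.

```java
class NumericCharRefs {

    /** Replaces every code point above 0x7F with a numeric character reference such as &#232;. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                // Decimal reference, e.g. U+00E8 becomes &#232;
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on whether cp is a surrogate pair
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Coll&#232;ge
        System.out.println(escapeNonAscii("Coll\u00e8ge"));
    }
}
```

This would be applied to the element values when the XML is written, not to the raw bytes the parser already rejects.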
 
Greenhorn
Posts: 10
This problem is related to the XML encoding. When a document contains non-ASCII characters, you should declare which encoding it uses.

For instance,

<?xml version="1.0" encoding="ISO-8859-9"?>

The right encoding name depends on the locale of your text editor or operating system. In Java, you can check the platform default through the system property file.encoding.

You can find more information about XML encoding on the following page: http://www.w3schools.com/xml/xml_encoding.asp
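
On the Java side, the bytes handed to the parser have to match the encoding it expects. Below is a rough sketch of two ways to arrange that when the XML is already in a String, as it is after Clob.getSubString. The class name, method names, and the TextCollector handler are hypothetical, and StandardCharsets needs Java 7 or later. This is one possible approach, not necessarily what the original code should become.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ClobXmlParsing {

    /** Collects all character data so the parsed text can be inspected. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the source and returns the concatenated character data. */
    private static String parse(InputSource source) {
        try {
            TextCollector handler = new TextCollector();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(source);
            return handler.text.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    /** Option 1: hand the parser characters, so no byte decoding happens at all. */
    static String parseFromString(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    /** Option 2: encode explicitly and tell the parser which encoding was used. */
    static String parseFromBytes(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding("UTF-8");
        return parse(source);
    }

    public static void main(String[] args) {
        String xml = "<ADDRESS><OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2></ADDRESS>";
        System.out.println(parseFromString(xml));
        System.out.println(parseFromBytes(xml));
    }
}
```

Option 1 is usually the simpler choice when the data already lives in a String, because the platform default charset never comes into play.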

Best regards...

Mert Nuhoglu
 