XML Parsing with international characters

satya kiran
Ranch Hand

Joined: Nov 07, 2000
Posts: 52
Hi,
I am having a problem parsing XML files with JAXP 1.1. The problem is with international characters.
I am reading a CLOB from a database, and its value is an XML document.
If the values inside the tags contain special or international characters, I get errors while parsing. It would be great if I could get some help.
Here are sample values from the XML file:
<ADDRESS>
<LANGUAGE_CD>CH-FR</LANGUAGE_CD>
<LAST_NM>Kaiser</LAST_NM>
<NICKNM>.</NICKNM>
<OFFICIAL_ADDR_1>Centre Informatique</OFFICIAL_ADDR_1>
<OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
</ADDRESS>
I am getting the following error:
Fatal Error: URI=null Line=1: Character conversion error: "Unconvertible UTF-8 character beginning with 0x8f" (line number may be too low).
Here is the sample code I am executing:
factory = SAXParserFactory.newInstance();
// Create a JAXP SAXParser
saxParser = factory.newSAXParser();
// Get the encapsulated SAX XMLReader
xmlReader = saxParser.getXMLReader();
// Set the content handler for callback events
xmlReader.setContentHandler(this);
// Iterate through the result set "rs" to get the CLOB and parse it
Struct messageobj = (Struct) rs.getObject(1);
// getAttributes() returns Object[]; index 0 is assumed to hold the CLOB
Clob theClob = (Clob) messageobj.getAttributes()[0];
String parseClob = theClob.getSubString(pos, (int) len);
// For handling parser errors and warnings
xmlReader.setErrorHandler(new ParseErrorHandler());
// Note: getBytes() with no argument uses the platform default encoding
clobStream = new ByteArrayInputStream(parseClob.getBytes());
clobSource = new InputSource(clobStream);
// Parse the input source
xmlReader.parse(clobSource);

Thanks in advance for your help
Kiran
satya kiran
Ranch Hand

Joined: Nov 07, 2000
Posts: 52
The problem occurs with this line: <OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
Thanks,
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
To me it looks like your XML document is malformed (it contains invalid characters), and that is what you should fix, probably by encoding those special characters somehow.
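
One way to read "encoding those special characters" is to replace each non-ASCII character with an XML numeric character reference, so the document stays pure ASCII and parses the same under any ASCII-compatible encoding. Here is a minimal sketch, assuming the XML is already available as a correctly decoded Java String. The class and method names are hypothetical, not from the original post:

```java
public class NumericCharRefs {
    // Replaces every non-ASCII code point with a decimal numeric character
    // reference such as &#232;. Markup characters like '<' and '&' are left
    // untouched, so the input is expected to be XML that is otherwise well-formed.
    static String escapeNonAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by one or two chars, depending on surrogate pairs
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <OFFICIAL_ADDR_2>Coll&#232;ge</OFFICIAL_ADDR_2>
        System.out.println(escapeNonAscii("<OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2>"));
    }
}
```

This only helps if the String was decoded correctly when it was read from the database. If the characters are already damaged at that point, escaping them will not repair them.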


Author of Test Driven (2007) and Effective Unit Testing (2013)
Mert Nuhoglu
Greenhorn

Joined: Apr 22, 2004
Posts: 10
This problem is related to the XML encoding. When using non-ASCII characters, you should specify which encoding the document uses.

For instance,

<?xml version="1.0" encoding="ISO-8859-9"?>
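
To illustrate the declaration, here is a small sketch that parses bytes whose actual encoding matches the one declared in the prolog. ISO-8859-1 is used only as an example. The class name, method name, and sample element content are assumptions for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {
    // Parses raw XML bytes and returns the text content of the root element.
    // The parser decodes the bytes using the encoding named in the XML declaration.
    static String parseText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2>";
        // The bytes are written in the same encoding that the declaration names
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(bytes));
    }
}
```

If the declaration were dropped while the bytes stayed in ISO-8859-1, the parser would fall back to UTF-8 and would be expected to reject the accented character, much like the error reported above.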

The encoding name depends on the locale of your text editor or operating system. In Java, you can check the platform default through the system property file.encoding.
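
In the code from the original post, getBytes() is called without an argument, so the bytes are produced in the platform default encoding, while a parser that sees no encoding declaration assumes UTF-8. One possible way around that mismatch, sketched here under the assumption that the CLOB content is already a correctly decoded String, is to hand the parser a character stream instead of bytes. The class and method names are hypothetical:

```java
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ClobParseSketch {
    // Parses XML held in a String and returns all character data it contains.
    // Because the InputSource wraps a Reader, no byte-to-character conversion
    // takes place inside the parser.
    static String parseFromString(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Shows which encoding a bare getBytes() call would use on this machine
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println(parseFromString("<OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2>"));
    }
}
```

Alternatively, calling getBytes("UTF-8") on the String would keep the byte stream approach while matching the parser's default.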

You can find more information about XML encoding on the following page: http://www.w3schools.com/xml/xml_encoding.asp

Best regards...

Mert Nuhoglu