
XML Parsing with international characters

satya kiran
Ranch Hand

Joined: Nov 07, 2000
Posts: 52
I am having a problem parsing XML files with JAXP 1.1. The problem is with international characters.
I am reading a CLOB from a database whose value is an XML document.
If the values inside the tags contain special or international characters, I get errors while parsing. It would be great if I could get some help.
Here are sample values from the XML:
<OFFICIAL_ADDR_1>Centre Informatique</OFFICIAL_ADDR_1>
<OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
I am getting the following error:
Fatal Error: URI=null Line=1: Character conversion error: "Unconvertible UTF-8 character beginning with 0x8f" (line number may be too low).
Here is the sample code I am executing:
SAXParserFactory factory = SAXParserFactory.newInstance();
// Create a JAXP SAXParser
SAXParser saxParser = factory.newSAXParser();
// Get the encapsulated SAX XMLReader
XMLReader xmlReader = saxParser.getXMLReader();
// Set the content handler for callback events (handler not shown)
// Iterate through the result set "rs" to get the CLOB and parse it
Struct messageObj = (Struct) rs.getObject(1);
// getAttributes() returns Object[]; this assumes the CLOB is the first attribute
Clob theClob = (Clob) messageObj.getAttributes()[0];
String clobText = theClob.getSubString(pos, (int) len);
// For handling parser errors and warnings
xmlReader.setErrorHandler(new ParseErrorHandler());
// getBytes() without an argument encodes with the platform default charset
InputStream clobStream = new ByteArrayInputStream(clobText.getBytes());
InputSource clobSource = new InputSource(clobStream);
// Parse the input source
xmlReader.parse(clobSource);

Thanks in advance for your help
satya kiran
Ranch Hand

Joined: Nov 07, 2000
Posts: 52
The problem occurs with <OFFICIAL_ADDR_2>Collège Progédeutique 2</OFFICIAL_ADDR_2>
Lasse Koskela

Joined: Jan 23, 2002
Posts: 11962
To me it looks like your XML document is malformed (contains invalid characters) and that that's what you should fix (probably by encoding those special characters somehow).
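
One way to read "encoding those special characters somehow" is to write every non-ASCII character as an XML numeric character reference, so the document stays pure ASCII and parses the same under any ASCII-compatible encoding. The following is only a minimal sketch of that idea; the class and method names (XmlCharRefs, escapeNonAscii) are made up for illustration and are not part of JAXP.

```java
// Hypothetical helper, not part of any library: rewrites non-ASCII characters
// as numeric character references such as &#xE8;
class XmlCharRefs {

    /** Replaces every code point outside US-ASCII with a hexadecimal character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                // Plain ASCII (including markup such as < and >) is copied unchanged
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            // Step by code point so surrogate pairs are handled as one character
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e8 is the letter e with a grave accent
        String xml = "<OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2>";
        System.out.println(escapeNonAscii(xml));
        // prints <OFFICIAL_ADDR_2>Coll&#xE8;ge</OFFICIAL_ADDR_2>
    }
}
```

Note that this only works for characters in element content and attribute values; character references are not allowed in element or attribute names.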

Mert Nuhoglu

Joined: Apr 22, 2004
Posts: 10
This problem is related to the XML encoding. When a document contains non-ASCII characters, you should specify which encoding it uses; without a declaration (or a byte order mark), the parser assumes UTF-8.

For instance,

<?xml version="1.0" encoding="ISO-8859-9"?>
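
To illustrate the effect of the declaration, here is a small sketch that parses the same ISO-8859-1 bytes with and without it. It uses ISO-8859-1 rather than ISO-8859-9 only because every Java platform is required to support it; the class and method names (DeclaredEncodingDemo, rootText) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Illustrative only: shows that the declared encoding must match the actual bytes
class DeclaredEncodingDemo {

    /** Parses raw bytes as XML and returns the text content of the root element. */
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String body = "<OFFICIAL_ADDR_2>Coll\u00e8ge</OFFICIAL_ADDR_2>";

        // Declaration and bytes agree on ISO-8859-1, so the parser decodes them correctly
        String declaredXml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body;
        byte[] declared = declaredXml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("With declaration: " + rootText(declared));

        // Same bytes, no declaration: the parser assumes UTF-8 and the byte 0xE8
        // followed by 'g' is not a valid UTF-8 sequence
        byte[] undeclared = body.getBytes(StandardCharsets.ISO_8859_1);
        try {
            rootText(undeclared);
        } catch (IOException | SAXException e) {
            System.out.println("Without declaration: " + e);
        }
    }
}
```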

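The encoding name must match the encoding actually used to produce the bytes, which usually depends on the locale of your text editor or operating system. In Java you can check the platform default through the system property file.encoding.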
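
As a rough sketch of how to check that property, and of one way to avoid depending on it at all: when the XML is already available as characters (for example from a CLOB), it can be handed to the parser as a character stream, in which case SAX reads the characters directly and disregards any encoding declaration. The class and method names below (CharacterStreamParse, parseText) are invented for illustration.

```java
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: parses XML from characters instead of bytes
class CharacterStreamParse {

    /** Parses XML held in a String and returns all character data it contains. */
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        // The Reader supplies already-decoded characters, so the platform
        // default charset never comes into play
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The platform default that a no-argument getBytes() call would use
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String xml = "<OFFICIAL_ADDR_2>Coll\u00e8ge 2</OFFICIAL_ADDR_2>";
        System.out.println(parseText(xml));
    }
}
```

With JDBC, the same approach should work by passing the Reader returned by Clob.getCharacterStream() to the InputSource instead of a StringReader.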

You can find more information about XML encoding on the following page:

Best regards...

Mert Nuhoglu