
UTF8 Exception thrown while parsing XML file using DOM

G Rao

Joined: May 12, 2005
Posts: 1
I have written an application that uses a DOM parser to parse a small XML file. I receive XML web service output from a remote CRM server, write this XML string to a file, and then parse the file with the DOM parser in order to retrieve its content. While doing this I get the following error: Invalid byte 1 of 1-byte UTF-8 sequence.
at Source)
at Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.peekChar(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(
at com.supportap.blp.ParserUtil.parseXMLContent(
at com.supportap.blp.ParserUtil.main(

Some XML results received from the CRM server parse without any errors, while others throw the above exception. That suggests that some of the XML content contains characters the parser cannot accept, or perhaps there is a UTF-8 encoding problem. What should I do in this situation? Is there a function that can remove these illegal characters from the XML, or is there a way to set the correct encoding? Please help.
Ilja Preuss

Joined: Jul 11, 2001
Posts: 14112
Moving to XML forum...

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
Well, there are a lot of places where the encoding might have gotten screwed up along the way here. It might have been invalid in the first place, or the bytes might have been transferred incorrectly, or there might be a problem with how it's being parsed. However, I'm going to guess that the last one is the least likely: an XML file should say what encoding it's in, and Xerces probably isn't going to screw up parsing UTF-8. Most likely, the encoding was invalid when you got it, or it was screwed up in transfer. Can you show the code you used to write the file?
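For reference, here is a minimal sketch of two ways to keep the encoding consistent between the string and the parser. It is only a guess at the problem, since the code that writes the file hasn't been posted. The usual way to get this error is to write the string with the platform default encoding (for example through a plain FileWriter) while the XML declaration says UTF-8. The class name, method names, and sample document below are made up for illustration. Only the JAXP and java.nio calls are real APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not the original poster's ParserUtil.
class EncodingSafeXmlParse {

    // Option 1: skip the file and parse the String directly.
    // The parser reads characters from the Reader, so no byte decoding happens.
    static Document parseFromString(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Option 2: if the file is needed, write it with an explicit charset that
    // matches the XML declaration (assumed here to be UTF-8), then parse the file.
    static Document writeUtf8ThenParse(String xml, Path file)
            throws ParserConfigurationException, SAXException, IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(file.toFile());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with a non-ASCII character inside a CDATA section.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note><![CDATA[caf\u00e9]]></note>";
        Path tmp = Files.createTempFile("crm-response", ".xml");
        try {
            System.out.println(parseFromString(xml).getDocumentElement().getTextContent());
            System.out.println(writeUtf8ThenParse(xml, tmp).getDocumentElement().getTextContent());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the string itself was already decoded with the wrong charset when it came off the wire, neither option will repair it. In that case the fix belongs where the web service response bytes are turned into a String.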

"I'm not back." - Bill Harding, Twister