UTF8 Exception thrown while parsing XML file using DOM

G Rao
Greenhorn

Joined: May 12, 2005
Posts: 1
Hi,
I have written an application that uses a DOM parser to parse a small XML file. I receive XML web service output from a remote CRM server, write this XML string to a file, and then parse the file with the DOM parser in order to retrieve the content within the XML. While doing this I get the following error:


java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.peekChar(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
at com.supportap.blp.ParserUtil.parseXMLContent(ParserUtil.java:128)
at com.supportap.blp.ParserUtil.main(ParserUtil.java:176)

Certain XML results received from the CRM server parse without any errors, while others throw the above exception. That suggests some of the XML contents contain characters the parser cannot accept, or maybe it's a UTF-8 encoding problem. What should I do in such a situation? Is there a function that can remove these illegal characters from the XML, or is there a way I can set the correct encoding type? Please help.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Moving to XML forum...


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Well, there are a lot of places that the encoding might have gotten screwed up along the way here. It might have been invalid in the first place, or the bytes might have been transferred incorrectly, or there might be a problem with how it's being parsed. However I'm going to guess that the last one is the least likely - an XML file should say what encoding it's in, and xerces probably isn't going to screw up parsing UTF-8. Most likely, the encoding was invalid when you got it, or it was screwed up in transfer. Can you show the code you used to write the file?
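
To illustrate the "screwed up in transfer" case, here is a rough sketch of one way it can happen: the XML declares UTF-8, but the String is turned into bytes with some other charset (for example the platform default) on its way to the file. The class and method names (EncodingSketch, parseString, writeThenParse, parses) and the sample document are made up for illustration; this is not the original poster's code, only the standard JAXP and java.nio calls are real. It shows two ways around the problem, parsing the String directly with no file in between, or writing the file with an explicit UTF-8 charset, plus a deliberate mismatch that the parser should reject.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingSketch {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Option 1: skip the file. A Reader hands the parser characters,
    // so no byte-to-character decoding happens inside the parser.
    static Document parseString(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Option 2: if a file is needed, write it with an explicit charset
    // that matches the encoding named in the XML declaration.
    static Document writeThenParse(String xml) throws Exception {
        Path file = Files.createTempFile("crm-response", ".xml"); // hypothetical file name
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
        try {
            return newBuilder().parse(file.toFile());
        } finally {
            Files.delete(file);
        }
    }

    // Returns true if the parser accepts the raw bytes, false if it throws.
    static boolean parses(byte[] bytes) {
        try {
            newBuilder().parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: a non-ASCII character (pound sign) inside a CDATA section.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note><![CDATA[\u00a3 5]]></note>";

        System.out.println(parseString(xml).getDocumentElement().getTextContent());
        System.out.println(writeThenParse(xml).getDocumentElement().getTextContent());

        // Bytes really are UTF-8, matching the declaration.
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));

        // Bytes are ISO-8859-1 but the declaration still says UTF-8:
        // the pound sign becomes a single byte that is not valid UTF-8 on its own.
        System.out.println(parses(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the mismatch case is what is happening, stripping "illegal characters" would only hide the symptom; the bytes and the declared encoding need to agree. Whether that is the actual cause here can't be told without seeing the code that writes the file.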


"I'm not back." - Bill Harding, Twister
 