
UTF8 Exception thrown while parsing XML file using DOM

 
G Rao
Greenhorn
Posts: 1
Hi,
I have written an application that uses a DOM parser to parse a small
XML file. I receive XML web-service output from a remote CRM server; I take this XML string, write it to a file, and parse it with the DOM parser
in order to retrieve the content of the XML. While doing this I get the following error:


java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.peekChar(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
at com.supportap.blp.ParserUtil.parseXMLContent(ParserUtil.java:128)
at com.supportap.blp.ParserUtil.main(ParserUtil.java:176)

Some XML results received from the CRM server parse without any errors,
while others throw the above exception. That suggests some of the XML content
contains characters the parser will not accept, or maybe it's a UTF-8
encoding problem. What should I do in this situation? Is there a function that can remove these illegal characters from the XML, or is there a way to set the correct encoding? Please help.
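Since the response already arrives as a `String`, one option worth trying is to skip the intermediate file and hand the parser characters instead of bytes. A minimal sketch, assuming the web-service response is available as a `String` (the class name `ParseFromString` and the sample XML are hypothetical): wrapping the text in an `org.xml.sax.InputSource` over a `java.io.StringReader` means the parser never decodes bytes, so the UTF-8 decoder that threw the exception is not involved and the `encoding` attribute in the XML declaration is not used. If you have to parse bytes whose real encoding differs from the declaration, `InputSource.setEncoding` can override it, but only when you know the actual encoding. Neither approach repairs characters that were already damaged before the string was built, for example by decoding the HTTP response with the wrong charset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseFromString {

    // Parses XML that is already held in a Java String.
    // A StringReader supplies characters, not bytes, so no
    // byte-to-character decoding takes place during parsing.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xml));
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical response text; in the real application this would be
        // the string returned by the CRM web service.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<result><name>Jos\u00e9</name></result>";
        Document doc = parse(xml);
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent());
    }
}
```

If the file is still needed for other reasons, it can be written separately; the parse itself no longer depends on how it was encoded.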
 
Ilja Preuss
author
Sheriff
Posts: 14112
Moving to XML forum...
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Well, there are a lot of places where the encoding might have gotten screwed up along the way here. It might have been invalid in the first place, the bytes might have been transferred incorrectly, or there might be a problem with how it's being parsed. However, I'm going to guess that the last one is the least likely: an XML file should say what encoding it's in, and Xerces probably isn't going to screw up parsing UTF-8. Most likely, the encoding was invalid when you got it, or it was screwed up in transfer. Can you show the code you used to write the file?
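The file-writing step is easy to check. A minimal sketch, assuming Java 7+ and that the file is currently written with something like `new FileWriter(file)`, which (before Java 18) encodes with the platform default charset rather than UTF-8. If that default is windows-1252, a curly apostrophe is written as the single byte 0x92, which is not a valid first byte in UTF-8, so a parser expecting UTF-8 rejects it with this kind of error. The hypothetical `writeUtf8` method writes with an explicit charset, and the hypothetical `firstInvalidUtf8Byte` method reports the offset of the first malformed byte in an existing file, so you can see which part of the response is at fault. The class name `WriteXmlUtf8` is also hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteXmlUtf8 {

    // Writes the XML text as UTF-8 bytes, matching what the parser assumes
    // when the declaration says UTF-8 or there is no declaration at all.
    public static void writeUtf8(Path file, String xml) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    // Returns the byte offset of the first malformed UTF-8 sequence,
    // or -1 if the whole file decodes cleanly.
    public static int firstInvalidUtf8Byte(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical response text containing a non-ASCII character (U+2019).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note><![CDATA[It\u2019s here]]></note>";
        Path file = Files.createTempFile("crm-response", ".xml");
        writeUtf8(file, xml);
        // Prints -1: everything written above is valid UTF-8.
        System.out.println("first invalid byte: " + firstInvalidUtf8Byte(file));
    }
}
```

Running `firstInvalidUtf8Byte` on a response file that fails to parse should show whether the bytes were already bad on disk, which separates a writing problem from a parsing one.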
 