Hi, I have written an application that uses a DOM parser to parse a small XML file. I receive XML web service output from a remote CRM server. I take this XML string, write it to a file, and parse it with the DOM parser in order to retrieve the content within the XML. While doing this I get the following error:
java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.peekChar(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
    at com.supportap.blp.ParserUtil.parseXMLContent(ParserUtil.java:128)
    at com.supportap.blp.ParserUtil.main(ParserUtil.java:176)
Some XML results received from the CRM server parse without any errors, while others produce the above exception. That suggests either that some of the XML content contains characters the parser will not accept, or that there is a UTF-8 encoding problem. What should I do in this situation? Is there a function that can remove these illegal characters from the XML, or is there a way to set the correct encoding? Please help.
Well, there are a lot of places where the encoding might have gotten screwed up along the way here. It might have been invalid in the first place, or the bytes might have been transferred incorrectly, or there might be a problem with how it's being parsed. However, I'm going to guess that the last one is the least likely: an XML file should say what encoding it's in, and Xerces probably isn't going to screw up parsing UTF-8. Most likely, the encoding was invalid when you got it, or it was screwed up in transfer. Can you show the code you used to write the file?
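While waiting for that code, here is a rough sketch of two things worth trying. It assumes the web service response is already held in a Java String and that the file is currently written with the platform default charset (for example via FileWriter), which is only a guess since the writing code hasn't been posted. The class and method names (XmlEncodingSketch, parseFromString, writeThenParse) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not taken from the original poster's code.
final class XmlEncodingSketch {

    // Option 1: skip the file entirely and parse the String through a Reader.
    // With a character stream the parser never decodes bytes itself, so the
    // encoding named in the XML declaration is not used.
    static Document parseFromString(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Option 2: if the file is needed, write it with an explicit charset.
    // This assumes the document declares UTF-8 (or declares nothing, in which
    // case UTF-8 is the default the parser falls back to).
    static Document writeThenParse(String xml, Path file) throws Exception {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(file.toFile());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with a non-ASCII character inside a CDATA section.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note><![CDATA[caf\u00e9]]></note>";

        Document fromString = parseFromString(xml);
        System.out.println(fromString.getDocumentElement().getTextContent());

        Path tmp = Files.createTempFile("crm-response", ".xml");
        try {
            Document fromFile = writeThenParse(xml, tmp);
            System.out.println(fromFile.getDocumentElement().getTextContent());
        } catch (IOException e) {
            System.err.println("Could not write or read " + tmp + ": " + e.getMessage());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If both variants parse the responses that used to fail, the bytes were most likely being written in a different charset than the one the XML declares. If they still fail, the String itself was probably decoded incorrectly when it was read from the web service, and the fix belongs on that side.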