
XML Parsing error is coming - for non UTF-8 characters

Vinod Vijay
Ranch Hand

Joined: Sep 13, 2011
Posts: 146

Hi, I have generated an XML file by marshalling against a given XSD. Overall the XML looks fine: when I open it in Eclipse, Notepad++, TextPad or EditPlus, I see no errors, and the root and all elements are well formed with data in them. But when I open the same XML file in Mozilla Firefox or IE, I get an XML Parsing Error. Firefox tells me the exact line and column number. When I go to that line in an editor, I see a question mark character (?) there. Please refer to the attached screenshot for more details.
My first question is: why can't the browser parse the file when the editors can open it without complaint? In my case, the data may or may not contain Chinese characters.
Secondly, can I assume that there is no bug in the XML file, since the editors show nothing wrong (only the browser does)? That is, can I assume no fix is required because it will not break the code of the other teams who are going to load the data element by element?

Please advise.

[Thumbnail for XML.jpg]

Vinod Vijay Nair
Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

The answer is: Text editors don't care about the rules of XML. It's possible to create malformed XML with a text editor -- in fact it's very easy, people do it every day. So just because a text editor will read and display your XML, that means nothing. Browsers, on the other hand, do know about XML. So the fact that some browsers tell you that your XML is malformed indicates that... your XML is malformed.
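
To see the same verdict without a browser, the file can be run through an XML parser in Java. The following is only a minimal sketch: the class and method names (WellFormedCheck, firstParseError) are made up for illustration, and the exact wording of the error message depends on the parser implementation that ships with your JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original poster's code.
class WellFormedCheck {

    /**
     * Parses the given bytes as XML.
     * Returns null if the parser accepts them, otherwise a short
     * description of the first problem the parser reported.
     */
    static String firstParseError(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The parser reads the encoding from the XML declaration and
            // decodes the raw bytes itself, much as a browser does.
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors carry a line and column number.
            return "line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Bytes that are invalid in the declared encoding may surface
            // as an IOException rather than a SAXParseException.
            return String.valueOf(e.getMessage());
        }
    }
}
```

Passing it the bytes of the generated file (for example from java.nio.file.Files.readAllBytes) should give null for a well-formed document and a message for one whose bytes do not match its declared encoding.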

And therefore the answer to your second question is: No, you can't assume that your XML file is well-formed. In fact, some software which knows about XML has told you it isn't. So my advice would be to fix your marshalling code so that the XML document is written out in the encoding which it declares in its header. You didn't show that code but I expect that the problem is there, in particular in the part where you write the document to the file.
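
Since the marshalling code was not posted, the following is only a guess at the shape of the fix. It assumes the document declares encoding="UTF-8" and that the XML text is available as a String; the class name Utf8XmlWriter is hypothetical. The point is to name the charset explicitly when turning characters into bytes, instead of relying on the platform default (which is what a plain FileWriter traditionally does).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the original poster's code.
class Utf8XmlWriter {

    /**
     * Writes the XML text to the target stream as UTF-8 bytes, so the
     * bytes agree with an encoding="UTF-8" declaration in the header.
     */
    static void write(String xml, OutputStream target) {
        try {
            // Explicit charset: never the platform default.
            Writer out = new OutputStreamWriter(target, StandardCharsets.UTF_8);
            out.write(xml);
            out.flush(); // push buffered characters through to the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the marshalling is done with JAXB (an assumption on my part), handing the marshaller an OutputStream rather than a Writer built on the default charset serves the same purpose.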
Dieter Quickfend

Joined: Aug 06, 2010
Posts: 543

Check your browser's encoding settings. If Notepad++ is not showing you any errors when set to UTF-8, and your browser is on UTF-8, it should be fine. Normally, your browser changes charset dynamically using the HTTP header passed by the server it's contacting. If you open the file directly from Windows, there is no HTTP header, so the charset wasn't specified.

Oracle Certified Professional: Java SE 6 Programmer && Oracle Certified Expert: (JEE 6 Web Component Developer && JEE 6 EJB Developer)