
XML Parsing error is coming - for non UTF-8 characters

 
Vinod Vijay
Hi, I have generated an XML file by marshalling against a given XSD. The XML looks fine overall: when I open it in Eclipse, Notepad++, TextPad or EditPlus, I see no errors, and the root and all elements are well formed and contain data. But when I open the same file in Mozilla Firefox or IE, I get an XML Parsing Error. Firefox reports the exact line and column number. When I go to that line in an editor, I see a question mark character (?) there. Please refer to the attached screenshot for more details.
My first question: why can't the browser parse the file when the editors can open it? The data may or may not contain Chinese characters.
My second question: since the editors show nothing wrong (only the browsers do), can I assume that there is no bug in the XML file and that no fix is required, because it will not break the code of the other teams who will load the data element by element?

Please advise.
[Attachment: XML.jpg, "XML on Mozilla Firefox"]
 
Paul Clapham
The answer is: Text editors don't care about the rules of XML. It's possible to create malformed XML with a text editor -- in fact it's very easy, people do it every day. So just because a text editor will read and display your XML, that means nothing. Browsers, on the other hand, do know about XML. So the fact that some browsers tell you that your XML is malformed indicates that... your XML is malformed.
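To see the same verdict a browser gives without leaving Java, the file can be run through a real XML parser. The following is only a minimal sketch using the JDK's built-in DOM parser; the class name `WellFormedCheck`, the method `isWellFormed` and the sample document are made up for illustration and are not from the original post. It feeds the parser the same text twice: once as UTF-8 bytes, and once as ISO-8859-1 bytes while the header still claims UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: true if an XML parser accepts the bytes, false otherwise.
    static boolean isWellFormed(byte[] xmlBytes) throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed markup or bytes that are invalid in the declared encoding.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document (an assumption); \u00e9 is a non-ASCII character.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";

        // Bytes match the declared encoding.
        byte[] matching = xml.getBytes(StandardCharsets.UTF_8);
        // Bytes written in a different encoding than the header declares.
        byte[] mismatched = xml.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes accepted:      " + isWellFormed(matching));
        System.out.println("ISO-8859-1 bytes accepted: " + isWellFormed(mismatched));
    }
}
```

A text editor would happily display both byte sequences, which is the point: only a parser checks the bytes against the declared encoding.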

And therefore the answer to your second question is: No, you can't assume that your XML file is well-formed. In fact, some software which knows about XML has told you it isn't. So my advice would be to fix your marshalling code so that the XML document is written out in the encoding which it declares in its header. You didn't show that code but I expect that the problem is there, in particular in the part where you write the document to the file.
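Since the marshalling code was not posted, the following is only a sketch of the general idea, not a fix for that code. It uses the JDK's `Transformer` rather than whatever marshaller the original poster used; the class name `EncodingSafeWriter`, its methods and the element name `name` are made up for illustration. The key point is that the serializer is told the encoding and writes to an `OutputStream`, so it produces the bytes itself instead of going through a `Writer` that may use the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingSafeWriter {

    // Hypothetical helper: builds <name>text</name> and serializes it as UTF-8 bytes.
    static byte[] writeUtf8(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("name");
        root.setTextContent(text);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // The declared encoding and the actual byte encoding come from the same setting.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Writing to an OutputStream lets the serializer do the character encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the bytes back and returns the root element's text.
    static String readBack(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";
        byte[] xml = writeUtf8(original);
        System.out.println("Round trip preserved text: " + original.equals(readBack(xml)));
    }
}
```

If the marshalling is done with JAXB (an assumption, since the code was not shown), the analogous approach would be to set the `Marshaller.JAXB_ENCODING` property and marshal to an `OutputStream` rather than to a `Writer`.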
 
Dieter Quickfend
Check your browser's encoding settings. If Notepad++ shows no errors when it reads the file as UTF-8, and your browser is also set to UTF-8, it should be fine. Normally, a browser picks the charset dynamically from the HTTP header sent by the server it is contacting. If you open the file from Windows... none was specified.
 