Hi, I have generated an XML file by marshalling against a given XSD. The XML looks fine overall: if I open it in Eclipse, Notepad++, TextPad or EditPlus, I don't see any errors. The root and all the elements are well formed and contain data. But when I try to open the same XML file in Mozilla Firefox or IE, I get an XML Parsing Error. Firefox tells me the exact line and column number. When I go to that line in an editor, I see a question mark character (?) there. Please see the attached screenshot for more details.
My first question is: why can't the browser parse the file when the editors can? In my case, the data may or may not contain Chinese characters.
Secondly, can I assume that there is no bug in the XML file, since I don't see anything wrong in the editors (only in the browser), and that no fix is required because it is not going to break the code of the other teams who will load the data element by element?
The answer is: Text editors don't care about the rules of XML. It's possible to create malformed XML with a text editor -- in fact it's very easy, people do it every day. So just because a text editor will read and display your XML, that means nothing. Browsers, on the other hand, do know about XML. So the fact that some browsers tell you that your XML is malformed indicates that... your XML is malformed.
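To see the same verdict the browser gives without opening a browser, the file can be run through a real XML parser. The following is only a minimal sketch using the parser that ships with the JDK; the class name `WellFormedCheck` and the method `isWellFormed` are made up for illustration and are not part of the original poster's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: asks an XML parser (not a text editor) about the bytes.
final class WellFormedCheck {

    // Returns true only if the parser accepts the bytes as well-formed XML.
    static boolean isWellFormed(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing its own message to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Malformed markup and invalid byte sequences both end up here.
            return false;
        }
    }
}
```

A document that declares UTF-8 but was actually written in another encoding is rejected by this check as soon as it contains a non-ASCII character, even though a text editor will happily display it.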
And therefore the answer to your second question is: No, you can't assume that your XML file is well-formed. In fact, some software which knows about XML has told you it isn't. So my advice would be to fix your marshalling code so that the XML document is written out in the encoding which it declares in its header. You didn't show that code but I expect that the problem is there, in particular in the part where you write the document to the file.
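Since the marshalling code was not posted, the following is only a sketch of the general idea: pick one charset and use it both for the declaration in the header and for turning characters into bytes. The names `EncodedXmlWriter`, `toXmlBytes` and `saveXml` are hypothetical. A common way to get this wrong is to write through a `FileWriter` or call `getBytes()` with no charset argument, which can pick up the platform default encoding instead of the declared one.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the declared encoding and the actual encoding
// come from the same Charset object, so they cannot disagree.
final class EncodedXmlWriter {

    // Builds the document bytes; 'body' is the XML without a declaration.
    static byte[] toXmlBytes(String body, Charset charset) {
        String declaration =
                "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n";
        // Encode explicitly instead of relying on the platform default.
        return (declaration + body).getBytes(charset);
    }

    // Writes the bytes unchanged, so no second encoding step can interfere.
    static void saveXml(Path target, String body, Charset charset)
            throws IOException {
        Files.write(target, toXmlBytes(body, charset));
    }
}
```

If the document is produced by a marshaller rather than by hand, the same principle applies: give it an `OutputStream` (or a `Writer` created with an explicit charset) rather than a `Writer` that uses the default encoding.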
Check your browser's encoding settings. If Notepad++ shows no errors when it reads the file as UTF-8, and your browser is also set to UTF-8, it should be fine. Normally, the browser picks the charset dynamically from the HTTP header sent by the server it's contacting. If you open the file directly from Windows, there is no such header, so the charset isn't specified that way.
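Rather than relying on what an editor or a browser decides to display, the bytes themselves can be checked. This sketch assumes the file is supposed to be UTF-8 and uses a strict decoder from the JDK to find the first byte that is not valid UTF-8; the class `Utf8Check` and the method `firstInvalidUtf8Offset` are made-up names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports where the bytes stop being valid UTF-8.
final class Utf8Check {

    // Returns the byte offset of the first invalid sequence, or -1 if none.
    static int firstInvalidUtf8Offset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // Report problems instead of silently substituting a
                // replacement character, which is what hides the issue.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position is at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }
}
```

Calling it on the result of `java.nio.file.Files.readAllBytes(...)` for the generated file gives an offset that can be compared with the line and column the browser reports.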