I'm parsing an XML file that contains some Japanese text (UTF-8 characters). During parsing, I received an error that says "An invalid XML character (Unicode: 0xb4) was found in the CDATA section." Can someone explain how it is possible to have an invalid XML character inside a CDATA section? I believed the only restriction inside a CDATA section was on including "]]" in the content. Thank you
Everybody believes so, yet it is a mistake. I think the confusion stems from the many, many sources of XML wisdom that define a CDATA section as "data that are ignored by the parser". If CDATA is ignored, can we put anything there, including binary data? Nothing in the XML specification suggests so:

"CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup."

And if you look at how CDATA is defined, you'll see:

CDSect ::= CDStart CData CDEnd
CDStart ::= '<![CDATA['
CData ::= (Char* - (Char* ']]>' Char*))
CDEnd ::= ']]>'

where "Char" is the same range as in any other part of an XML document:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

This means that CDATA differs from parsed data only in that markup is not recognized as such, i.e. not parsed. My understanding is that an XML document can "physically" consist of legal characters only; this layer has the highest priority, and high-level constructs like CDATA have to obey its rules.

One way to circumvent this rule and include illegal characters would be to encode your data in base64, but this will increase the document's size, violate all good design rules, etc.

[ May 09, 2002: Message edited by: Mapraputa Is ]
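To make the Char production concrete, here is a minimal Java sketch that tests code points against the ranges quoted above and falls back to base64 when a string contains anything outside them. The class name XmlChars and its method names are made up for illustration; they are not part of any parser's API, and the sketch assumes XML 1.0 rules only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not taken from any XML library.
final class XmlChars {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Char index of the first code point that is not a Char, or -1 if all are legal. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when needed.
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Base64 fallback for data that cannot appear literally, even inside CDATA. */
    static String toBase64(String text) {
        return Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String payload = "abc\u0001def"; // U+0001 is outside the Char ranges
        int bad = firstInvalidIndex(payload);
        if (bad >= 0) {
            System.out.println("Illegal character at index " + bad
                    + ", base64 instead: " + toBase64(payload));
        } else {
            System.out.println("<![CDATA[" + payload + "]]>");
        }
    }
}
```

Note that U+00B4 itself lies inside [#x20-#xD7FF], so isXmlChar(0xB4) returns true. One possible explanation for the reported error, and this is only a guess, is that the file's bytes are not in the encoding the parser expects, so a raw byte 0xB4 is being rejected rather than the character U+00B4.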