Handling entity references by XML parsers

 
Dan Drillich
Ranch Hand
Posts: 1183
Good Day,

My beloved book "XML in a Nutshell" from O'Reilly says (on page 18) that XML defines five entity references -

&lt; - the less-than sign
&amp; - the ampersand
&gt; - the greater-than sign
&quot; - the straight, double quotation marks
&apos; - the apostrophe, or single quote

It says that entity references such as &amp; and &lt; are considered markup, and that when an application parses an XML document, the parser replaces this markup with the actual characters the entity references refer to. It also says that, in addition to these five predefined entity references, you can define others in the document type definition (DTD).

So my question is - does it mean that all other entity references in the XML document are left intact by the parsers?

Regards,
Dan
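
The replacement the book describes can be seen with the JAXP DOM parser that ships with the JDK. This is only a minimal sketch: the class name PredefinedEntities and the helper rootText are made up for illustration and do not come from the book or the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: shows the five predefined entity references being resolved.
class PredefinedEntities {

    // Parses an XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note>a &lt; b &amp;&amp; c &gt; d, &quot;quoted&quot;, it&apos;s</note>";
        // Prints: a < b && c > d, "quoted", it's
        System.out.println(rootText(xml));
    }
}
```

The application never sees the entity references themselves, only the characters they stand for.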
 
Paul Clapham
Sheriff
Posts: 20176
No. If a parser encounters an undeclared entity reference it will throw an exception.
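
A small sketch of that behaviour, assuming the default JAXP parser bundled with the JDK (the class name UndeclaredEntity and the helper isRejected are hypothetical). The exact wording of the error message depends on the parser implementation, so none is shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: an entity reference with no declaration is a fatal parse error.
class UndeclaredEntity {

    // Returns true if the parser rejects the document with a SAXParseException.
    static boolean isRejected(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            // The document is not well-formed, e.g. it references an undeclared entity.
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        // &eacute; is an HTML entity; no DTD in this document declares it.
        System.out.println(isRejected("<p>caf&eacute;</p>"));
        // &amp; is one of the five predefined entity references.
        System.out.println(isRejected("<p>fish &amp; chips</p>"));
    }
}
```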
 
Dan Drillich
Ranch Hand
Posts: 1183
Thank you Paul.

Right, but what about all the "standard" HTML escape sequences, such as &eacute; (é), &ouml; (ö), &ograve; (ò), &ntilde; (ñ), etc.?

Regards,
Dan
 
Dan Drillich
Ranch Hand
Posts: 1183
Paul,

I guess you are absolutely right! I put one of these entities in an otherwise valid XML file and tried to open it with Firefox and IE. Neither would open it. Firefox even said -

XML Parsing Error: undefined entity


Regards,
Dan
 
Paul Clapham
Sheriff
Posts: 20176
Yup. HTML is not an XML dialect. (Although XHTML is... you will notice that an XHTML document contains a DTD reference at the top.)
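
To illustrate why the DTD matters: once an entity is declared, the parser resolves it like any other. The sketch below declares eacute in an internal DTD subset rather than pulling in the external XHTML DTD, purely so that it runs without network access. The class name DeclaredEntity and the helper rootText are hypothetical, and it assumes the JDK's default JAXP settings, which allow DOCTYPE declarations.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: an entity declared in the DTD is resolved by the parser.
class DeclaredEntity {

    // Parses an XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The internal DTD subset maps eacute to character reference &#233; (U+00E9).
        String xml = "<!DOCTYPE p [ <!ENTITY eacute \"&#233;\"> ]>"
                   + "<p>caf&eacute;</p>";
        System.out.println(rootText(xml));
    }
}
```

Numeric character references such as &#233; need no declaration at all, which makes them a common alternative when a document has no DTD.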
 