
escaping foreign chars in generating xml, why?

Sven Anderson
Ranch Hand

Joined: Apr 14, 2004
Posts: 58
Hi,

I just started working on a system where foreign (non-ASCII) characters in the XML output are automatically escaped (using Spring HtmlUtils) to their equivalent HTML character references. I don't really see the point in doing this if you have a DB that stores its data as UTF-8 and a web server that also serves pages as UTF-8.

Are there any advantages/disadvantages to using HTML character references instead of outputting foreign chars just as they are?

Many thanks
E
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18712
    
    8

The advantage is that when you do that, the XML you produce is resistant to being botched up by mis-encoding. You may be carefully ensuring that everything you do is encoded in UTF-8 but that is certainly not a common attitude in the Web world.
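
To illustrate the point (this is not code from the thread, just a minimal sketch): if every non-ASCII character is written as a numeric character reference, the resulting XML contains only ASCII bytes, so it reads the same under any ASCII-compatible encoding. The class and method names below are made up for the example. It uses numeric references rather than named HTML entities because XML itself predefines only five named entities (amp, lt, gt, quot, apos). It does not deal with control characters that XML 1.0 disallows.

```java
public final class NumericReferenceEscaper {

    private NumericReferenceEscaper() {
    }

    // Hypothetical helper: escapes XML markup characters and replaces every
    // non-ASCII code point with a hexadecimal numeric character reference.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            // Work on code points so characters outside the BMP
            // (stored as surrogate pairs) become a single reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        // Plain ASCII passes through unchanged.
                        out.append((char) cp);
                    } else {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Sm&#xF6;rg&#xE5;s &amp; &lt;b&gt;
        System.out.println(escape("Sm\u00F6rg\u00E5s & <b>"));
    }
}
```

The trade-off is that the output is larger and harder to read by eye, which is the price paid for being immune to a wrong encoding declaration somewhere along the way.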
Sven Anderson
Ranch Hand

Joined: Apr 14, 2004
Posts: 58
Hi Paul,

Does this mean that in an environment where the database and web server both handle UTF-8 correctly, you don't really have to bother with escaping characters, and can instead rely on the UTF-8 encoding and leave the characters as they are?

Thanks
E
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42371
    
  64
Yes, roundtripping of UTF-8 text from DB through web server to browser, back to web server and into the database is possible, and it's not even all that difficult. For starters, make sure that the DB encoding is set to Unicode, and that all pages you serve are declared as UTF-8 encoded.
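
As a rough sketch of the XML side of this (not taken from the thread; the class and method names are invented for the example), the two things that have to agree are the encoding named in the XML declaration and the encoding actually used to turn the text into bytes. The example assumes the value being written contains no markup characters such as & or <, and it leaves out the database and servlet configuration entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public final class Utf8RoundTrip {

    private Utf8RoundTrip() {
    }

    // Builds a tiny XML document and encodes it as UTF-8 bytes.
    // Assumption: 'name' contains no characters that need XML escaping.
    public static byte[] toUtf8Xml(String name) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<person><name>" + name + "</name></person>";
        // The byte encoding matches the one declared in the XML declaration.
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Parses the bytes back; the parser picks up the encoding
    // from the XML declaration.
    public static String readName(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String original = "Sm\u00F6rg\u00E5s";
        String roundTripped = readName(toUtf8Xml(original));
        // prints: true
        System.out.println(original.equals(roundTripped));
    }
}
```

If the bytes were produced with a different encoding than the declaration claims, the parse would either fail or return the wrong characters, which is the kind of mis-encoding the escaping approach guards against.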

