I just started working on a system where foreign (non-ASCII) characters in XML output are automatically escaped (using Spring HtmlUtils) to their equivalent HTML character references. I don't really see the point of doing this if you have a DB that stores its data as UTF-8 and a web server that also serves pages as UTF-8.
Are there any advantages or disadvantages to using HTML character references instead of outputting foreign characters just as they are?
The advantage is that when you do that, the XML you produce is resistant to being botched up by mis-encoding. You may be carefully ensuring that everything you do is encoded in UTF-8 but that is certainly not a common attitude in the Web world.
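To make that concrete, here is a minimal sketch using only the plain JDK (it is not Spring's HtmlUtils, and the class and method names are made up for illustration). It replaces non-ASCII characters with decimal numeric character references and then simulates a consumer that wrongly reads the UTF-8 bytes as ISO-8859-1. The sample text is an assumption, not taken from the system in question.

```java
import java.nio.charset.StandardCharsets;

public class CharRefDemo {
    // Hypothetical helper: replaces every non-ASCII code point with a
    // decimal numeric character reference such as &#229;.
    // It does NOT escape markup characters like & or <.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    // Simulates a consumer that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String raw = "<name>bl\u00e5b\u00e6r</name>"; // contains U+00E5 and U+00E6
        String escaped = escapeNonAscii(raw);

        System.out.println(escaped);
        // The escaped form is pure ASCII, so the wrong decoding leaves it intact.
        System.out.println(misdecode(escaped).equals(escaped)); // true
        // The raw form is damaged by the same mistake.
        System.out.println(misdecode(raw).equals(raw));         // false
    }
}
```

The escaped output survives because it contains only ASCII bytes, which UTF-8 and ISO-8859-1 interpret identically.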
Does this mean that if the database and web server both handle UTF-8 correctly, you don't really have to bother with escaping characters and can instead rely on the UTF-8 encoding and leave the characters as they are?
Yes, roundtripping of UTF-8 text from DB through web server to browser, back to web server and into the database is possible, and it's not even all that difficult. For starters, make sure that the DB encoding is set to Unicode, and that all pages you serve are declared as UTF-8 encoded.
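As a rough sketch of the "declare it and stick to it" part, the plain-JDK example below declares UTF-8 in the XML prolog and uses the same charset explicitly when turning text into bytes and back. The class and method names are hypothetical, and the database and HTTP layers are left out. A real setup would also have to configure those.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Builds a small XML document whose declaration states the encoding.
    // For brevity the value is not escaped for markup characters.
    static String buildXml(String name) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<name>" + name + "</name>";
    }

    // Sender side: encode with the declared charset, never the platform default.
    static byte[] serialize(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: decode with the same charset.
    static String parseBody(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = buildXml("bl\u00e5b\u00e6r");
        String received = parseBody(serialize(xml));
        System.out.println(received.equals(xml)); // true: the text survived the round trip
    }
}
```

The round trip only holds while every hop uses the same charset. Calling `getBytes()` or `new String(bytes)` without a charset argument falls back to the JVM default, which on older JVMs depends on the platform.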