Here are some rules about which characters are not allowed. Basically, you should escape '<' and '&'; '>' when it's part of "]]>" but isn't intended to close a CDATA section; and single or double quotes when they're inside attribute values delimited by the same type of quote. To simplify: if you
always escape &<>'" when they occur in text, you'll be safe.
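Here's a minimal sketch of the "always escape all five" approach in Python. The function name escape_xml_text is just an illustrative name, not a library API. The one subtle point is ordering: '&' has to be replaced first, otherwise the '&' in the entities produced for the other characters would get escaped a second time.

```python
def escape_xml_text(s: str) -> str:
    """Escape the five XML special characters (illustrative helper).

    '&' is replaced first so the entities produced for the other
    characters are not themselves re-escaped.
    """
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;")
             .replace("'", "&apos;"))


if __name__ == "__main__":
    print(escape_xml_text('a < b && "c" > \'d\''))
    # prints: a &lt; b &amp;&amp; &quot;c&quot; &gt; &apos;d&apos;
```

Escaping '>' and both quote characters everywhere is more than strictly necessary, but it's always legal and means you never have to think about context. Python's standard library also has xml.sax.saxutils.escape, which handles &, < and > by default and accepts a dict of extra replacements if you want quotes escaped too.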
As for other characters: assuming they're valid Unicode characters, they're allowed, except for the ones on
this list.
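The linked list isn't reproduced here, so as a sketch I'm assuming it corresponds to the XML 1.0 Char production, which permits only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. Everything else (most C0 control codes, lone surrogates, U+FFFE and U+FFFF) is forbidden. The spec also lists some "discouraged" characters that this check does not reject. Both function names below are hypothetical helpers.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
            or 0x20 <= cp <= 0xD7FF        # excludes most C0 controls
            or 0xE000 <= cp <= 0xFFFD      # skips surrogates, excludes U+FFFE/U+FFFF
            or 0x10000 <= cp <= 0x10FFFF)  # supplementary planes


def strip_invalid_xml10(s: str) -> str:
    """Drop every character XML 1.0 forbids outright."""
    return "".join(ch for ch in s if is_xml10_char(ch))


if __name__ == "__main__":
    # NUL, vertical tab and U+FFFE are all illegal in XML 1.0
    print(strip_invalid_xml10("ok\x00\x0bok\ufffe"))  # prints: okok
```

Note that escaping doesn't help with these: a character reference like &#0; is just as illegal as a literal NUL, so they have to be removed or replaced.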
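As for characters stored in an encoding other than UTF-8 or UTF-16: they're only allowed if you explicitly declare the encoding, e.g. <?xml version="1.0" encoding="windows-1252"?>. And even then, you're not guaranteed that a parser will understand them, since XML parsers are only required to support UTF-8 and UTF-16. The most common problem characters come from Microsoft's Windows-1252 (Cp1252) encoding; they're the characters shown in yellow on
this chart.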
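I can't see which characters the chart highlights, so this sketch assumes they're the ones Windows-1252 places in bytes 0x80-0x9F (curly quotes, en/em dashes, the euro sign and so on). In ISO-8859-1 and Unicode those positions are C1 control codes, and in UTF-8 they aren't valid lead bytes, so undeclared Cp1252 text is typically rejected or misread. The helper name find_cp1252_specific_bytes is hypothetical.

```python
CP1252_SPECIFIC = range(0x80, 0xA0)  # bytes whose Cp1252 meaning differs from Latin-1


def find_cp1252_specific_bytes(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, character) for each byte in 0x80-0x9F.

    Assumes data is Cp1252-encoded. Python's cp1252 codec leaves
    0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; those are reported as '?'.
    """
    hits = []
    for offset, b in enumerate(data):
        if b in CP1252_SPECIFIC:
            try:
                ch = bytes([b]).decode("cp1252")
            except UnicodeDecodeError:
                ch = "?"
            hits.append((offset, ch))
    return hits


if __name__ == "__main__":
    sample = b"\x93Hello\x94 \x96 costs \x80 5"
    for offset, ch in find_cp1252_specific_bytes(sample):
        print(offset, repr(ch))
    try:
        sample.decode("utf-8")  # what a parser assumes with no encoding declared
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)
```

The usual fix is to decode the bytes as cp1252 and re-encode them as UTF-8 before writing the XML, rather than relying on every parser honouring a windows-1252 declaration.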