Obviously the first step will be to get a readable XML document by locating and fixing all those illegal characters.
How was this document generated? If it got anywhere near Microsoft Word,
it may be contaminated with those accursed "smart punctuation" characters (curly quotes, long dashes and the like).
If you catch the SAXParseException, try extracting the line and column numbers (getLineNumber() and getColumnNumber()) to help locate the bad character in the text.
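Something along these lines should do it. This is only a rough sketch: the class name BadCharLocator and the locate() helper are made up for illustration, and it assumes you can hand the document to a stock JAXP SAX parser as a String. The exact column reported can vary between parser implementations, so treat it as a hint rather than an exact offset.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class BadCharLocator {

    // Returns {line, column} of the first fatal parse error,
    // or null if the document parses cleanly.
    static int[] locate(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Both getters are 1-based; either may be -1 if the parser does not know.
            return new int[] { e.getLineNumber(), e.getColumnNumber() };
        }
    }

    public static void main(String[] args) throws Exception {
        // A control character (0x01) on the second line is not legal XML 1.0.
        String xml = "<a>\n<b>x" + (char) 1 + "y</b>\n</a>";
        int[] where = locate(xml);
        if (where != null) {
            System.out.println("Parse failed near line " + where[0] + ", column " + where[1]);
        }
    }
}
```

Once you fix that character and run it again, the parser will stop at the next bad one, so expect to repeat this a few times.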
A programmer's editor that can show and patch hex values will be a big help too.
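If you would rather list every suspect character in one pass instead of fixing them one parse error at a time, a small scanner like the one below might help. Again, this is a sketch under assumptions: SuspectCharScanner and scan() are made-up names, and it assumes the text has already been read into a String (for example, decoded as ISO-8859-1 so that every byte maps to a character). It flags anything outside the XML 1.0 legal character ranges, plus the U+0080 to U+009F range, which is technically legal but is usually what Windows smart quotes turn into when they are mis-decoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class SuspectCharScanner {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Reports each suspect character with its 1-based line, column and hex value.
    static List<String> scan(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // C1 controls: legal XML 1.0, but a common sign of mis-decoded smart punctuation.
            boolean c1Control = cp >= 0x80 && cp <= 0x9F;
            if (!isLegalXml10(cp) || c1Control) {
                hits.add(String.format("line %d, col %d: U+%04X", line, col, cp));
            }
            if (cp == '\n') {
                line++;
                col = 1;
            } else {
                col++;
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "ok\nab" + (char) 1 + "c";
        for (String hit : scan(sample)) {
            System.out.println(hit);
        }
    }
}
```

The hex values it prints are the ones to search for in your editor's hex view.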
Bill