I have a program that reads an XML file using a SAX parser. The XML file is validated against a schema.
During this process I get errors such as: Byte "195" is not a member of the (7-bit) ASCII character set.
I want to pinpoint exactly which character is invalid, and the line number of that character.
The encoding I am using is ASCII. If I used another encoding these errors might disappear, but I don't want to do that.
I want to track down the characters that break the rules.
What I tried was catching the SAXParseException that is generated and reading its line number, column number and so on. But the lines it reports don't actually contain any problem.
1. Please use code tags when posting code.
2. Every time I have used code to get line and column numbers it has worked. How far off is the reported line number? If it points to a line before the error you know about, perhaps there is an earlier bad character.
3. If the source file has ever been near Microsoft Word you may have "smart punctuation", which looks reasonable when you edit the file but is in fact not valid ASCII.
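For reference, here is a minimal sketch of the kind of code point 2 has in mind: an error handler that records the line and column carried by each SAXParseException. The class name `LocationReportingHandler` and its `parse` helper are made up for illustration, and schema validation is left out to keep it short. Note that it hands the parser raw bytes rather than a Reader, so the parser does its own decoding.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects every problem the parser reports, with its position.
final class LocationReportingHandler extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e; // a fatal error ends the parse
    }

    // Parses the given bytes and returns whatever the parser complained about.
    static List<String> parse(byte[] xml) throws Exception {
        LocationReportingHandler handler = new LocationReportingHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(new ByteArrayInputStream(xml), handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        }
        return handler.problems;
    }
}
```

Calling `LocationReportingHandler.parse(bytes)` on a well-formed document should return an empty list. On a broken one, each entry starts with the severity and the position the parser reported.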
Yes. Ask whoever produced the document to have it declare its encoding correctly. (And make sure that your code doesn't convert the bytes of the document to chars.) It appears from what you posted that the document was encoded in UTF-8, but whatever you used to display it is assuming some other encoding.
Trying to locate the "invalid" characters precisely is most likely a waste of time in this case. I think you just have an encoding problem.
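That said, if you still want to see where the non-ASCII bytes sit, you can do it without the parser at all. A rough sketch, assuming that a plain byte scan is good enough: it reads the raw bytes, counts lines on '\n' only, and reports every byte above 127 with its line and byte column. `NonAsciiScanner` and `scan` are names invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every byte outside 7-bit ASCII with its position.
final class NonAsciiScanner {

    // Returns entries of the form "line L, column C: byte N".
    // Columns count bytes, not characters, and start at 1.
    static List<String> scan(InputStream in) throws IOException {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 0;
        int b;
        try (InputStream buffered = new BufferedInputStream(in)) {
            while ((b = buffered.read()) != -1) {
                if (b == '\n') {
                    line++;
                    column = 0;
                    continue;
                }
                column++;
                if (b > 127) {
                    hits.add("line " + line + ", column " + column + ": byte " + b);
                }
            }
        }
        return hits;
    }
}
```

Pass it something like `new FileInputStream(path)`. For a file whose second line is `<b>é</b>` saved as UTF-8, it reports byte 195 at line 2, column 4 and byte 169 at line 2, column 5. A 195 followed by another byte above 127 is what an accented Latin letter looks like in UTF-8, which fits the encoding diagnosis above.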
subject: SAX parsing of XML file: tracing invalid characters