SAX parsing of xml file Tracing invalid characters

Abhi Venu
Ranch Hand

Joined: Jul 09, 2009
Posts: 73
Hi,
I have a program that reads an XML file using a SAX parser. The XML file is validated against a schema.
During this process I get errors such as: Byte "195" is not a member of the (7-bit) ASCII character set.
I want to pinpoint exactly which character is invalid and the line number of that character.
The encoding I am using is ASCII. If I used another encoding these errors might disappear, but I don't want to do that.
I want to trace the characters that break the rules.

The approach I tried was to take the SAXParseException that is raised and read its line number, column number and so on. But the lines it reports do not actually contain any problem.

public void fatalError(SAXParseException exception) throws SAXException {
    validationError = true;
    saxParseException = exception;
    String message = "Fatal Error: " + getParseExceptionInfo(exception);
    writeToErrorFile(exception + "Fatal error occurred at " + exception.getLineNumber());
}


Can anyone suggest how this can be done? The relevant parts of my code are posted here.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
factory.setAttribute(
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");
factory.setAttribute(
        "http://java.sun.com/xml/jaxp/properties/schemaSource",
        SchemaUrl);
DocumentBuilder builder = factory.newDocumentBuilder();
Validator handler = new Validator();
builder.setErrorHandler(handler);
builder.parse(XmlDocumentUrl);


private class Validator extends DefaultHandler {
    public boolean validationError = false;

    public SAXParseException saxParseException = null;

    // Recoverable error (for example a schema violation): remember it and keep parsing.
    public void error(SAXParseException exception) throws SAXException {
        validationError = true;
        saxParseException = exception;
        String message = "Error: " + getParseExceptionInfo(exception);
    }

    // Non-recoverable error (for example a byte the declared encoding cannot represent).
    public void fatalError(SAXParseException exception) throws SAXException {
        validationError = true;
        saxParseException = exception;
        String message = "Fatal Error: " + getParseExceptionInfo(exception);

        writeToErrorFile(exception + "fatal error occurred at " + exception.getLineNumber());
    }

    public void warning(SAXParseException exception) throws SAXException {
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        String systemId = spe.getSystemId();
        if (systemId == null) {
            systemId = "null";
        }
        String info = "URI=" + systemId +
                " Line=" + spe.getLineNumber() +
                ": " + spe.getMessage();
        return info;
    }
}
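
The handler above records only the line number. As a minimal sketch (not the poster's actual code), the same idea can also record the column via SAXParseException.getColumnNumber(). The class name LocatingHandler and the helper check are hypothetical names chosen for this illustration, and the sample input is a deliberately malformed two-line document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler that keeps line AND column for every reported problem.
public class LocatingHandler implements ErrorHandler {
    public final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage());
    }

    public void warning(SAXParseException e) { record("Warning", e); }

    public void error(SAXParseException e) { record("Error", e); }

    public void fatalError(SAXParseException e) throws SAXException {
        record("Fatal error", e);
        throw e; // stop parsing; a fatal error is not recoverable
    }

    // Parses the given bytes and returns whatever the handler collected.
    public static List<String> check(byte[] xml) {
        LocatingHandler handler = new LocatingHandler();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            // Parse errors were already recorded by the handler before being thrown.
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // The end tag on line 2 does not match the open <b> element.
        byte[] broken = "<a>\n<b></a>".getBytes(StandardCharsets.UTF_8);
        check(broken).forEach(System.out::println);
    }
}
```

The position a parser reports is typically where it detected the problem, which is not always where the offending character sits, so treat the numbers as a starting point.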

<ProductDescription>PUé CAS~</ProductDescription>

My XML file has an element like this at line 20, and the character é causes the problem.

Can anyone suggest a feasible solution for this?



William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
1. Please use Code tags when posting code.
2. Every time I have used code to get line and column numbers it has worked. How far off is the reported line number? If it flags a line before the error you know about, perhaps there is an earlier bad character.
3. If the source file has ever been near Microsoft Word, you may have "smart punctuation", which looks reasonable when you edit the file but is in fact stored as bytes that are not valid in the encoding the parser expects.

Bill
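
One way to look for an earlier bad character, independent of the parser, is to scan the raw bytes of the file and report every byte above 127. The sketch below is only an illustration under stated assumptions: the class name NonAsciiScanner and the method findNonAscii are hypothetical, lines are assumed to end in "\n", and columns count bytes (not characters) starting at 1.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every byte that is outside the 7-bit ASCII range.
public class NonAsciiScanner {

    public static List<String> findNonAscii(byte[] data) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int column = 0;
        for (byte raw : data) {
            int b = raw & 0xFF;          // treat the byte as unsigned (0..255)
            if (b == '\n') {             // new line: reset the column counter
                line++;
                column = 0;
                continue;
            }
            column++;
            if (b > 127) {
                hits.add("line " + line + ", column " + column + ": byte " + b);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample input: the element from the question, stored as UTF-8 on line 2.
        byte[] sample = "<a>\n<ProductDescription>PU\u00e9 CAS~</ProductDescription>\n</a>"
                .getBytes(StandardCharsets.UTF_8);
        for (String hit : findNonAscii(sample)) {
            System.out.println(hit);
        }
        // prints:
        // line 2, column 23: byte 195
        // line 2, column 24: byte 169
    }
}
```

For a real file, the bytes could be obtained with java.nio.file.Files.readAllBytes; the important point is that the file is read as bytes, never through a Reader.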
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18991
    

Abhi Venu wrote:
<ProductDescription>PUé CAS~</ProductDescription> My XML file has an element like this at line 20, and the character é causes the problem.

Can anyone suggest a feasible solution for this?

Yes. Ask whoever produced the document to have it declare its encoding correctly. (And make sure that your code doesn't convert the bytes of the document to chars.) It appears from what you posted that the document was encoded in UTF-8, but whatever you used to display it is assuming some other encoding.
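
To make the point about the encoding declaration concrete, here is a small sketch. It hands the parser the raw bytes (an InputStream, not a Reader or String) so the parser itself applies the declared encoding. The class name EncodingDeclarationDemo and the method parseText are hypothetical; the two sample documents contain identical UTF-8 bytes for é and differ only in what the XML declaration claims.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: same bytes, different declared encoding.
public class EncodingDeclarationDemo {

    // Returns the root element's text, or a "FAILED: ..." string if parsing fails.
    public static String parseText(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Pass bytes straight to the parser; do not decode them yourself.
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return "FAILED: " + e.getMessage();
        }
    }

    static byte[] sample(String declaredEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<ProductDescription>PU\u00e9 CAS~</ProductDescription>";
        // The bytes are always UTF-8, whatever the declaration says.
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Declaration matches the bytes: the text comes back intact.
        System.out.println(parseText(sample("UTF-8")));
        // Declaration says US-ASCII but the bytes are UTF-8: the parse is rejected.
        System.out.println(parseText(sample("US-ASCII")));
    }
}
```

With the parser bundled in the JDK, the second call should fail with a message like the one quoted in the question, which is why correcting the declaration is simpler than hunting individual characters.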

Trying to locate the "invalid" characters precisely is most likely a waste of time in this case. I think you just have an encoding problem.
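
As a rough illustration of why this looks like an encoding problem: byte 195 from the error message is the first of the two bytes that UTF-8 uses for é (195, 169). The sketch below decodes that byte pair under three charsets; the class name DecodeDemo is a hypothetical name, and code points are printed instead of the characters themselves so the output does not depend on the console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same two bytes seen through different charsets.
public class DecodeDemo {

    public static String decode(byte[] bytes, Charset charset) {
        // new String(...) replaces undecodable input with U+FFFD instead of throwing.
        return new String(bytes, charset);
    }

    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] eAcute = {(byte) 195, (byte) 169};
        // UTF-8: one character, é
        System.out.println(codePoints(decode(eAcute, StandardCharsets.UTF_8)));      // U+00E9
        // ISO-8859-1: two characters, Ã and ©
        System.out.println(codePoints(decode(eAcute, StandardCharsets.ISO_8859_1))); // U+00C3 U+00A9
        // US-ASCII: neither byte is valid, so both become the replacement character
        System.out.println(codePoints(decode(eAcute, StandardCharsets.US_ASCII)));   // U+FFFD U+FFFD
    }
}
```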
 