Hi everybody,
How does the characters method of ContentHandler work?
I am parsing a simple XML document using the SAX parser:
<?xml version='1.0' encoding='utf-8'?>
<bookstore>
<book>
<title> THE GREATNESS GUIDE </title>
<author> Robin Sharma</author>
</book>
<book>
<title>ONE NIGHT AT THE CALL CENTER </title>
<author>Shyam Bhagwat </author>
</book>
</bookstore>
public void startElement(String uri, String localName, String qName, Attributes attr) throws SAXException {
    System.out.println("<" + qName + ">");
}
public void endElement(String uri, String localName, String qName) throws SAXException {
    System.out.println("</" + qName + ">");
}
public void characters(char[] ch, int start, int length) throws SAXException {
    System.out.println(" *" + new String(ch)); // line *****
}
The output for this is:
---------- interpreter ----------
<bookstore>
*
<bookstore>
<book>
<title> THE GREATNESS GUIDE </title>
<author> Robin Sharma</author>
</book>
<book>
<title>ONE NIGHT AT THE CALL CENTER </title>
<author>Shyam Bhagwat </author>
</book>
</bookstore>
Output completed (0 sec consumed)
Now if I simply replace new String(ch) in line ***** with new String(ch, start, length), I get the proper output:
---------- interpreter ----------
<bookstore>
*
<book>
*
<title>
* THE GREATNESS GUIDE
</title>
*
<author>
* Robin Sharma
</author>
*
</book>
*
<book>
*
<title>
*ONE NIGHT AT THE CALL CENTER
</title>
*
<author>
*Shyam Bhagwat
</author>
*
</book>
*
</bookstore>
Output completed (0 sec consumed) - Normal Termination
I'm assuming you're asking why you're getting those empty lines. The answer is that white space is significant in XML, and the line breaks in your XML file count as white space, so the parser reports them through characters() just like any other text.
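To make those white-space events visible, here is a minimal sketch (not the original poster's code) that records every chunk passed to characters() and shows line breaks as a literal \n. The class name WhitespaceDemo and the helper chunksOf are made up for illustration, and it assumes the SAX parser that ships with the JDK. A parser is allowed to split text into more chunks than you might expect, so the exact number of chunks can vary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, for illustration only.
class WhitespaceDemo extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        String s = new String(ch, start, length);
        // Show line breaks as a visible "\n" instead of an empty output line.
        chunks.add(s.replace("\n", "\\n"));
    }

    // Hypothetical helper: parses the XML string and returns the recorded chunks.
    static List<String> chunksOf(String xml) throws Exception {
        WhitespaceDemo handler = new WhitespaceDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.chunks;
    }

    public static void main(String[] args) throws Exception {
        for (String chunk : chunksOf("<book>\n<title>A</title>\n</book>")) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

If you don't care about the white space between elements, one option is to skip chunks for which new String(ch, start, length).trim().isEmpty() is true.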
Hi Ulf,
Actually, the difference between the two outputs is this: in the first output, once the <bookstore> element is encountered, the startElement method is invoked; after this the characters method is called, which prints the entire XML document, and the execution stops without calling any further methods.
In the second output, by contrast, all the methods (startElement, endElement, characters) are called in sequence.
This happened only because in the first version the characters method looked like this:
public void characters(char[] ch, int start, int length) throws SAXException {
    System.out.println(new String(ch));
}
and in the second version it looks like this:
public void characters(char[] ch, int start, int length) throws SAXException {
    System.out.println(new String(ch, start, length));
}
How is ch initialized, and what is the significance of start and length?
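As far as the SAX API documents it, ch is a buffer owned by the parser, and only the characters from ch[start] up to ch[start + length - 1] belong to the current event; the application must not read outside that range. With a file this small the buffer most likely holds the whole document, which would explain why new String(ch) prints all of it. The parser may also deliver one run of text in several calls, so the usual approach is to accumulate the pieces. Below is a minimal sketch under those assumptions; the class name TextCollector and the helper parse are hypothetical, and it simply reports the trimmed text of each element that has any.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, for illustration only.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        text.setLength(0); // discard white space seen before this start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the valid slice of the parser's buffer for this event.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            events.add(qName + "=" + value);
        }
        text.setLength(0);
    }

    // Hypothetical helper: parses the XML string and returns "element=text" entries.
    static List<String> parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore>\n<book>\n<title> THE GREATNESS GUIDE </title>\n"
                + "<author> Robin Sharma</author>\n</book>\n</bookstore>";
        System.out.println(parse(xml));
        // prints [title=THE GREATNESS GUIDE, author=Robin Sharma]
    }
}
```

Appending to a StringBuilder in characters() and reading it in endElement() keeps the result the same whether the parser reports the text in one call or in several.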