Hello everybody. I have a problem when I try to parse a big XML file with SAX (org.apache.xerces.parsers.SAXParser). I have written a class that extends DefaultHandler and overridden the methods that read from the file. In the method characters(char[] text, int start, int length) a strange thing happens: the content between the start and end tags is not complete. For example, for <vsData>vsDataAal5TpVccTp</vsData> the characters method receives the String "taAal5TpVccTp.....". It always happens at the same line, even though the same content occurs more than once in the file. How is this possible? Can anyone help me? Thanks to everybody.
Hi, welcome to the ranch! You will no doubt receive an invitation to change your name to one that meets the ranch standards, but it's a friendly thing.
There is no guarantee that the SAX parser will give you all the characters in an element at once. Most implementations read a chunk of the input stream and parse it, then read another chunk and parse it. When a chunk border falls in the middle of a run of characters, you will get two calls to characters(). It would even be legal for the parser to call characters() with one character at a time!
On the start tag, clear out a buffer; on each characters() call, append to the buffer; on the end tag, you have the whole value. This can be tricky if the characters are divided around a nested tag, like an HTML paragraph with a few bold words in the middle, but for most pure data applications it works nicely.
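Here is a minimal sketch of that buffering approach, assuming the only text you care about is inside <vsData> elements and that they contain no nested tags. The class name BufferingHandler, its values list, and the parse helper are made up for illustration; it uses the JDK's default SAX parser rather than Xerces directly, and it wraps parser errors in an unchecked exception only to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <vsData> element.
class BufferingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start tag: clear out the buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append, never assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("vsData".equals(qName)) {
            values.add(buffer.toString()); // end tag: the whole value is available
        }
        buffer.setLength(0);
    }

    // Illustrative helper: parses an XML string and returns the collected values.
    static List<String> parse(String xml) {
        try {
            BufferingHandler handler = new BufferingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

The value is only read in endElement, so it does not matter how many pieces the parser delivered it in. For mixed content (text around nested tags) you would need something like a stack of buffers instead of a single one.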
See if that helps!
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Thanks for the help. I now buffer the tag content in the characters method and write it out in the endElement method.