
sax parsing problem

mickey scott

Joined: Mar 22, 2005
Posts: 5
Hello everybody. I have a problem when I try to parse a big XML file with SAX (org.apache.xerces.parsers.SAXParser). I have written a class that extends DefaultHandler and overridden the methods that read from the file. In the method characters(char[] text, int start, int length) a strange thing happens: the content between the start tag and the end tag is not complete.
For example, the method characters returns the String "taAal5TpVccTp....." at the same line, even if it occurs more than once in the file.
How is this possible? Can anyone help me?
Thanks to everybody.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Hi, welcome to the ranch! You will no doubt receive an invitation to change your name to one that meets the ranch standards, but it's a friendly thing.

There is no guarantee that the SAX parser will give you all the characters in an element at once. Most implementations read a chunk of the input stream and parse it, then read another chunk and parse it. When a chunk border falls in the middle of a run of characters, you will get two calls to characters(). It would even be legal for the parser to call characters() with one character at a time!

On the start tag clear out a buffer, in characters() append to the buffer, and on the end tag you have the whole value. This can be tricky if the characters are divided around a nested tag, like an HTML paragraph with a few bold words in the middle, but for most pure data applications it works nicely.
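
The buffering idea above could look roughly like this. It is only a sketch under some assumptions: the class name BufferingHandler, the helper parseItems, and the element name "item" are made up for illustration, and it uses the standard JAXP SAXParserFactory instead of instantiating org.apache.xerces.parsers.SAXParser directly. It also assumes flat data elements with no nested tags inside the text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <item> element.
public class BufferingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start tag: clear out the buffer.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // May be called several times for one element, so append, never assign.
        buffer.append(text, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // End tag: the buffer now holds the whole value of the element.
        if ("item".equals(qName)) { // "item" is an assumed element name
            values.add(buffer.toString());
        }
        buffer.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }

    // Convenience helper (hypothetical): parse an XML string and return the item values.
    public static List<String> parseItems(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseItems("<list><item>alpha</item><item>a &amp; b</item></list>"));
    }
}
```

The important part is that characters() only appends, and the value is read in endElement. Text containing an entity such as &amp; is a typical case where a parser may deliver the content in more than one call, and the buffer puts it back together.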

See if that helps!

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
mickey scott

Joined: Mar 22, 2005
Posts: 5
Thanks for the help. I now buffer the tag content in the characters method and write it out in the endElement method.