sax parsing problem

 
mickey scott
Greenhorn
Posts: 5
Hello everybody. I have a problem when I try to parse a big XML file with SAX (org.apache.xerces.parsers.SAXParser). I wrote a class that extends DefaultHandler and overrode the methods that read from the file. Something strange happens in the method characters(char[] text, int start, int length): the content between the start tag and the end tag is not complete.
For example,
<vsData>vsDataAal5TpVccTp</vsData>
arrives in the characters method as the String "taAal5TpVccTp.....". It always happens at the same line of the file, even though this element occurs more than once in the file.
How is this possible? Can anyone help me?
Thanks, everybody.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Hi, welcome to the ranch! You will no doubt receive an invitation to change your name to one that meets the ranch standards, but it's a friendly thing.

There is no guarantee that the SAX parser will give you all the text inside an element in a single call. Most implementations read a chunk of the input stream and parse it, then read another chunk and parse it. When a chunk boundary falls in the middle of a run of text, you will get two calls to characters(). It would even be legal for the parser to call characters() with one character at a time!

On the start tag, clear out a buffer; in characters(), append to the buffer; and at the end tag you have the whole value. This can be tricky if the text is divided around a nested tag, like an HTML paragraph with a few bold words in the middle, but for most pure data applications it works nicely.
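
The buffering approach described above could look roughly like the sketch below. The class name BufferingHandler and the static parse helper are made up for illustration, and the sketch uses the JDK's built-in SAXParserFactory rather than instantiating org.apache.xerces.parsers.SAXParser directly, so treat it as one possible shape and not the original poster's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <vsData> element.
public class BufferingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start tag: throw away anything collected so far.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // May be called several times for one element, so append, never assign.
        buffer.append(text, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // End tag: the buffer now holds the whole value.
        if ("vsData".equals(qName)) {
            values.add(buffer.toString());
        }
        buffer.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }

    // Assumed convenience method: parses an XML string and returns the vsData values.
    public static List<String> parse(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        // prints [vsDataAal5TpVccTp]
        System.out.println(parse("<root><vsData>vsDataAal5TpVccTp</vsData></root>"));
    }
}
```

Because a single buffer is cleared on every start tag, this only suits flat, data-style elements; mixed content with nested tags would need something like a stack of buffers.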

See if that helps!
 
mickey scott
Greenhorn
Posts: 5
Thanks for the help. I now buffer the tag content in the characters method and write it out in the endElement method.
 