
Problem with SAX Parser

 
Srikanth Raghavan
Ranch Hand
Posts: 389
Hi,

I am parsing a fairly small XML file with a SAX parser. The data for exactly one element in the file is not fetched correctly.

For example:

<?xml...>
<parent>
<child>
<element1>data1</element1>
<element2>data2</element2>
<element3>data3</element3>
.
.
.
<elementn>datan</elementn>
</child>
</parent>



I am subclassing the DefaultHandler class and overriding the methods like:
characters(char[] ch, int start, int length)
endElement(String uri, String localName, String qName) and others...

While debugging, I found that for one element the data arrives only partially: if <element2> contains data2, the characters() method receives just "da", and the remainder ("ta2") arrives in a second characters() call for the same element, which is really weird.

This happens at exactly the same place, for the same element, every time. Is there some limitation, or am I doing something wrong?

In case the information is unclear or insufficient for you to help me, please let me know and I will give more.

Thank you
 
Srikanth Raghavan
Ranch Hand
Posts: 389
Hey,

I was able to find the real issue. The characters() method does not necessarily deliver all of the data between the tags in one call. I think it depends on some buffer size in the parser, but I don't know the exact reason.

So, until endElement() is called, we have to accumulate the data passed to characters() in our own buffer for the current element.

Please tell me if there's a better option.

Thank you!
 
Tom Johnson
Ranch Hand
Posts: 142
No, that's the only way. The SAX specification allows a parser to deliver an element's character content in more than one characters() call; it does not have to return it all at once. You need to use a string buffer and append the data from each characters() call until you get the endElement() call, as you said.
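
A minimal sketch of that buffering approach, in case it helps. The class name TextCollectingHandler, the values map, and the parse() helper are made up for illustration and are not from the original code. It also assumes elements hold either text or child elements, not a mix of both.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of each leaf element.
class TextCollectingHandler extends DefaultHandler {
    // Text gathered so far for the element currently being read.
    private final StringBuilder text = new StringBuilder();
    // Element name -> complete text (illustrative storage only;
    // a repeated element name overwrites the earlier value).
    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A new element starts: drop whatever was buffered (e.g. whitespace).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append, never assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.put(qName, value);
        }
        text.setLength(0);
    }

    public Map<String, String> getValues() {
        return values;
    }

    // Convenience helper (an assumption, not part of SAX): parse an XML string.
    public static Map<String, String> parse(String xml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }
}
```

With something like this, TextCollectingHandler.parse(xml).get("element2") should give the whole string regardless of how many characters() calls the parser used to deliver it.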

/tom
 