
Xerces Parser, issue with "/" in data body during XMLEntityScanner.load

 
Allain Walker
Greenhorn
Posts: 14
Hi,
To start with, sorry about the length of this post. I can't get to a file-hosting site because my overlords have secured the Internets, and I am not sneaky enough to find a way around the blocking.

Stack
JDK version build 1.6.0_30-b12
Xerces-J 2.6.2

I am working on a project that parses XML input and writes it to a POJO. To do this we use the internal Xerces parser to parse the XML and a handler to write to the POJO. We pretty much need the whole tree to be parsed before we work with the results. We don't use a schema.
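For readers unfamiliar with the pattern, here is a minimal sketch of a SAX handler that fills a POJO. The `Record` class, the `RecordParser`/`RecordHandler` names and the `<response>` wrapper are hypothetical (the project's real classes are not shown in this thread); only the `valid_till_date` element comes from the data discussed below. It uses the JDK's built-in JAXP `SAXParserFactory` and accumulates text in a `StringBuilder`, because a parser may deliver one element's text across several `characters()` calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RecordParser {

    // Hypothetical POJO; stands in for the project's real result class.
    public static class Record {
        public String validTillDate;
    }

    // Maps SAX events onto a Record. Text for one element may arrive in
    // several characters() calls, so it is collected in a StringBuilder.
    static class RecordHandler extends DefaultHandler {
        final Record record = new Record();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only ch[start] .. ch[start + length - 1] belongs to this call.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The default factory is not namespace-aware, so qName is the element name.
            if ("valid_till_date".equals(qName)) {
                record.validTillDate = text.toString();
            }
            text.setLength(0);
        }
    }

    public static Record parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordHandler handler = new RecordHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.record;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><valid_till_date>12/12/1212</valid_till_date></response>";
        System.out.println(parse(xml).validTillDate); // prints 12/12/1212
    }
}
```

Appending inside `characters()` and reading the result in `endElement()` keeps the handler correct regardless of how the parser splits its buffers.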

The issue shows up as a malformed body being returned to the POJO. It does not happen every time, and it was difficult for us to catch. Eventually we were able to isolate an XML string that consistently reproduces the defect.

From inspecting the source, it appears that the defect occurs in the XMLEntityScanner.load function, in particular at line 1742, where it reads further from the XML string by calling the function



There seems to be an issue with the read operation when the boundary character is a "/": at that point the read loses its lunch-box and alters the offset and length members of the xmlString. If we add another 5 characters to any of the element bodies before line 40 of the data, the issue goes away.

To demonstrate this defect I have included the parser and a demo, StandAloneBynamicHandler.java.

So my question really is: are we doing this correctly, or is there an underlying issue that I do not fully understand?

As this code has been out there for years, I very much doubt that we have stumbled upon a new defect, but I have been unable to find anything that refers specifically to the above class.

I also do not think it is related to http://www.coderanch.com/t/129602/XML/SAX-parser-character-call-back, as this happens before the call to characters(char ch[], int start, int length).

Thanks in advance
Allain


 
Paul Clapham
Sheriff
Posts: 20771
But that XML string in your code doesn't contain any \ characters, so I don't quite understand what you're asking. The Xerces code which you referred to seems as if it should be trying to deal with some kind of entities, but I don't see any of those either.
 
Allain Walker
Greenhorn
Posts: 14
So busy trying to make sure that my post was correct I didn't even realise I put the "/" char round the wrong way. Dang nab it!

The data that is going wrong is



Specifically, <valid_till_date>12/12/1212</valid_till_date> is returned as ponse/1212 instead of 12/12/1212.

This only happens when there are more than 80 lines in the XML.
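Because the failure depends on exactly where the date falls in the input (the first post notes that adding 5 characters earlier in the data makes it go away), one way to test a given parser version is to shift the date's offset one character at a time and check the value each time. This is a sketch under assumptions: the `<response>` and `<filler>` elements are made up, the real document from this thread is not reproduced, and the 10000-character upper bound is arbitrary.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DateOffsetSweep {

    // Created once; this sweep is single-threaded.
    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    // Builds a document whose <valid_till_date> is preceded by a hypothetical
    // <filler> element of padChars characters, shifting the date's offset.
    static String buildXml(int padChars) {
        StringBuilder sb = new StringBuilder("<response>\n  <filler>");
        for (int i = 0; i < padChars; i++) {
            sb.append('x');
        }
        sb.append("</filler>\n");
        sb.append("  <valid_till_date>12/12/1212</valid_till_date>\n");
        sb.append("</response>\n");
        return sb.toString();
    }

    // Returns the accumulated text of the first <valid_till_date> element.
    static String readDate(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private boolean inDate;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                inDate = "valid_till_date".equals(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inDate) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inDate && result[0] == null) {
                    result[0] = text.toString();
                }
                inDate = false;
            }
        };
        FACTORY.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        int failures = 0;
        // Upper bound is an arbitrary assumption; widen it for larger documents.
        for (int pad = 0; pad <= 10000; pad++) {
            String date = readDate(buildXml(pad));
            if (!"12/12/1212".equals(date)) {
                failures++;
                System.out.println("pad=" + pad + " -> " + date);
            }
        }
        System.out.println("failures: " + failures);
    }
}
```

Running the sweep against both the old and the upgraded parser jar gives a rough regression check; a clean run only shows that this synthetic document parses correctly, not that the original one does.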

Thanks again
 
Paul Clapham
Sheriff
Posts: 20771
I recall that in the past, various XML parsers have had bugs which looked rather like that. Mostly they were related to attributes, if I remember right, but they were all something to do with faulty buffering.

Yes, the rule is that you should blame your own code first, and you have done that, which was the right thing to do. It's possible to mess things up in the characters() method by taking data from outside the range ch[start] to ch[start + length - 1], but your code doesn't do that; it does the right thing. So my suspicion is that you're dealing with a Xerces bug.

I see the latest Xerces version is 2.11, so your version is quite old. Upgrading might not be a bad idea.
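When upgrading, it is worth confirming which SAX implementation JAXP actually loads, since the first post mentions both the JDK's internal Xerces and Xerces-J 2.6.2. The sketch below uses only standard JAXP and reflection calls. The class names in the comments are what one would typically expect, not guaranteed output. An Apache Xerces jar can typically be selected through its `META-INF/services` entry or by setting the `javax.xml.parsers.SAXParserFactory` system property to `org.apache.xerces.jaxp.SAXParserFactoryImpl`.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class WhichParser {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();

        // On a stock JDK these are typically com.sun.org.apache.xerces.internal.* classes;
        // with an Apache Xerces-J jar selected they are typically org.apache.xerces.* classes.
        System.out.println("factory: " + factory.getClass().getName());
        System.out.println("reader:  " + reader.getClass().getName());

        // Where the reader class came from; the code source is usually null for
        // classes that ship with the JDK itself.
        CodeSource src = reader.getClass().getProtectionDomain().getCodeSource();
        System.out.println("loaded from: " + (src == null ? "JDK runtime" : src.getLocation()));
    }
}
```

Running this in the customer's environment, with the same classpath as the application, shows whether the upgraded jar is really the one in use.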
 
Allain Walker
Greenhorn
Posts: 14
Thanks Paul,
I was kind of hoping that we had messed up someplace.
I will try the latest parser to see if it is still an issue.
Thanks for your help
 
Allain Walker
Greenhorn
Posts: 14
Updating to the latest version of Xerces works a treat.
Now I just have to figure out how to get that out to the customer.
Thanks
Allain