Xerces Parser, issue with "/" in data body during XMLEntityScanner.load

Allain Walker
Greenhorn

Joined: May 24, 2005
Posts: 14
Hi,
To start with, sorry about the length of the post. I can't get to a site for hosting a file because of my overlords' securing of the Internets, and I am not sneaky enough to find a way around the blocking.

Stack
JDK version build 1.6.0_30-b12
Xerces-J 2.6.2

I am working on a project that parses XML input and writes it to a POJO. To do this we use the internal Xerces to parse the XML and a Handler to write to the POJO. We pretty much need the whole tree to be parsed before we work with the results. We don't use a schema to get the results.

The issue shows up as a malformed body being returned to the POJO. It does not happen every time, which made the defect difficult for us to catch. Eventually we were able to isolate an XML string that consistently reproduces it.

From inspecting the source, the defect appears to occur in the XMLEntityScanner.load function, in particular at line 1742, where it reads further from the XML string by calling the function



There seems to be an issue with the read operation when the boundary char is a "/", at which point the read operation loses its lunch-box and alters the xmlString offset and length members. If we add another 5 chars to any of the bodies prior to line 40 in the data, the issue goes away.

To demonstrate this defect I have included the parser and demo StandAloneBynamicHandler.java.

So my question really is: are we doing this correctly, or is there an underlying issue that I do not fully understand?

As this code has been out there for years, I am very doubtful that we have stumbled upon a new defect, but I have been unable to find anything that refers specifically to the above class.

I also do not think it is related to http://www.coderanch.com/t/129602/XML/SAX-parser-character-call-back as this happens before the call to characters(char ch[], int start, int length).

Thanks in advance
Allain



5 years on and still a Greenhorn, what to do, what to do?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907

But that XML string in your code doesn't contain any \ characters, so I don't quite understand what you're asking. The Xerces code which you referred to seems as if it should be trying to deal with some kind of entities, but I don't see any of those either.
Allain Walker
Greenhorn

Joined: May 24, 2005
Posts: 14
I was so busy trying to make sure that my post was correct that I didn't even realise I had put the "/" char round the wrong way. Dang nab it!

The Data that is going wrong is



Specifically, <valid_till_date>12/12/1212</valid_till_date> is returned as ponse/1212 instead of 12/12/1212.

This only happens when there are over 80 lines in the XML.

Thanks
Again
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907

I recall that in the past, various XML parsers have had bugs which looked rather like that. Mostly they were related to attributes, if I remember right, but they were all something to do with faulty buffering.

Yes, the rule is that you should blame your own code first. And you have done that, which was the right thing to do. It's possible to mess things up in the characters() method by taking data from outside the range given by start and length, but your code doesn't do that either; it does the right thing. So my suspicion is that you're dealing with a Xerces bug.
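
For anyone reading later who wants to rule out their own handler first, here is a minimal sketch of the safe way to use characters(): copy only the slice described by start and length, and append rather than assign, since the parser may deliver one element's text in several calls. The class name TextCollectingHandler and the wrapping <response> element are made up for illustration; only <valid_till_date> comes from the thread, and this is not the poster's actual handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
public class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start a fresh buffer for each element (good enough for leaf elements).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start] .. ch[start + length - 1] belongs to this event.
        // The rest of the array is the parser's internal buffer.
        // Append, because one text node may arrive in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("valid_till_date".equals(qName)) {
            values.add(text.toString());
        }
    }

    public List<String> getValues() {
        return values;
    }

    // Parses the given XML string and returns every valid_till_date body found.
    public static List<String> parse(String xml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><valid_till_date>12/12/1212</valid_till_date></response>";
        System.out.println(parse(xml)); // prints [12/12/1212]
    }
}
```

If a handler written this way still returns a mangled value on a larger document, the handler is probably not the culprit.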

I see the latest Xerces version is 2.11, so your version is quite old. Upgrading might not be a bad idea.
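
Before and after upgrading, it can help to confirm which parser implementation JAXP actually loads, since a newer jar on the classpath is not always the one that gets picked up. The following is a rough sketch using only standard JDK calls; the class name ParserInfo is made up, and the version and location may come back as "unknown" depending on how the parser is packaged.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, for illustration only.
public class ParserInfo {
    // Describes the SAX parser implementation that JAXP resolves at runtime.
    public static String describeParser() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Class<?> cls = reader.getClass();

        // The implementation version comes from the jar manifest and may be absent.
        Package pkg = cls.getPackage();
        String version = (pkg != null) ? pkg.getImplementationVersion() : null;

        // The code source is null for classes loaded by the bootstrap loader.
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        String location = (src != null && src.getLocation() != null)
                ? src.getLocation().toString() : "unknown";

        return "class=" + cls.getName()
                + ", version=" + (version != null ? version : "unknown")
                + ", location=" + location;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeParser());
    }
}
```

A class name under com.sun.org.apache.xerces.internal indicates the copy bundled with the JDK, while org.apache.xerces indicates a separate Xerces jar.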
Allain Walker
Greenhorn

Joined: May 24, 2005
Posts: 14
Thanks Paul,
I was kind of hoping that we had messed up someplace.
I will try the latest parser to see if it is still an issue.
Thanks for your help
Allain Walker
Greenhorn

Joined: May 24, 2005
Posts: 14
Updating to the latest version of Xerces works a treat.
Now I just have to figure how to get that out to the customer.
Thanks
Allain