
JAXB Parsing error: The entity name must immediately follow the '&' in the entity reference

sriraman seshadri
Greenhorn

Joined: May 25, 2010
Posts: 7
I have XML (around 2 MB in size) that I need to parse using JAXB, but I have no control over how the XML is created. It comes from a third party.
Unfortunately XML contains things like:
<name>James & Colin</name>
When I fed the XML to the JAXB parser, it gave me this error:
"The entity name must immediately follow the '&' in the entity reference"

Is there a workaround for this in JAXB? I need to parse the <name> element as "James & Colin".

When I googled for a solution, I found suggestions such as changing & to &amp;. But I don't have control over the creation of the XML.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Yes, that's correct. The solution is to correct the XML so that the ampersand in the text node is escaped properly.

There's no workaround for malformed XML, which is what you have there. If you don't have control over creating the XML, then send it back to whoever does have control and get them to fix it. In the HTML world it may be acceptable to generate malformed HTML, and browsers will attempt to parse it, but in the XML world it doesn't work that way. Malformed XML doesn't get parsed. That's a rule of XML.
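To make the point above concrete, here is a minimal sketch using the JDK's built-in DOM parser rather than JAXB (JAXB sits on top of a conforming XML parser, so it rejects the same input). The class and method names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of JAXB or of the original thread.
class WellFormedCheck {

    // Parses the XML string and returns the text content of the root element.
    // Throws IllegalArgumentException if the input is not well-formed.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped ampersand: well-formed, and the parser hands back a plain "&".
        System.out.println(rootText("<name>James &amp; Colin</name>"));

        // Bare ampersand: the parser refuses the whole document.
        try {
            rootText("<name>James & Colin</name>");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Once the ampersand is escaped in the file, the application still sees "James & Colin" as the element text; the escaping exists only in the serialized document.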
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Create a Korn shell or Perl script that searches the file for 'space'&'space' and replaces it with 'space'&amp;'space'.

Then send the newly created file to the JAXB application.

The ampersand character is an XML special character, and to have the character in an instance, the corresponding XML entity must be used.
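The post suggests a Korn shell or Perl script; as a rough sketch of the same search-and-replace in Java instead, something like the following could work. The class and method names are hypothetical, and the file is assumed to be UTF-8, which the thread does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-processing step, run before the file is handed to JAXB.
class SpacedAmpersandScrubber {

    // Replaces every " & " (space, ampersand, space) with " &amp; ".
    // Ampersands without a space on both sides are left untouched.
    static String escapeSpacedAmpersands(String xml) {
        return xml.replace(" & ", " &amp; ");
    }

    // Reads the whole input file, scrubs it, and writes a new file.
    // Assumption: the file is UTF-8 and small enough (about 2 MB) to hold in memory.
    static void scrubFile(Path in, Path out) throws IOException {
        String raw = Files.readString(in, StandardCharsets.UTF_8);
        Files.writeString(out, escapeSpacedAmpersands(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints <name>James &amp; Colin</name>
        System.out.println(escapeSpacedAmpersands("<name>James & Colin</name>"));
    }
}
```

This only matches an ampersand with a single space on each side, exactly as described above; any other malformed construct passes through unchanged.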
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

And then they are going to have "A&P" in a text element, which that particular hack won't catch. Or they're going to have "<" unescaped in a text element. Or something else. There's really no future in trying to clean up other people's malformed XML.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
There are only five XML special characters. Writing a good "scrubber" to clean up the file to make it compliant is a solution, when better alternatives are not possible or cost-prohibitive, or there are political obstacles.
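As one possible shape for the ampersand part of such a scrubber, here is a hedged sketch that escapes any '&' that does not already start one of the five predefined entities or a numeric character reference, so it also covers cases like "A&P". The names are hypothetical, and the sketch assumes the documents use no DTD-defined entities and contain no CDATA sections or comments, where a literal '&' is legal and must not be rewritten.

```java
import java.util.regex.Pattern;

// Hypothetical scrubber for bare ampersands only; it does nothing about
// unescaped '<' or any other well-formedness problem.
class BareAmpersandScrubber {

    // Matches '&' unless it is followed by a predefined entity name or a
    // decimal/hex character reference, terminated by ';'.
    private static final Pattern BARE_AMPERSAND =
        Pattern.compile("&(?!(?:amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints <name>A&amp;P</name>
        System.out.println(escapeBareAmpersands("<name>A&P</name>"));
        // prints <name>James &amp; Colin</name>
        System.out.println(escapeBareAmpersands("<name>James &amp; Colin</name>"));
    }
}
```

Because existing references are left alone, running the scrubber twice gives the same result as running it once. It cannot tell whether a sequence such as "&lt;" in the source was meant as an entity or as literal text, which is one of the ambiguities a text-level fix cannot resolve.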
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Jimmy Clark wrote:There are only five XML special characters. Writing a good "scrubber" to clean up the file to make it compliant is a solution...

Sure it is. I just don't think you should present it as such without mentioning that a good scrubber would be extremely difficult to write.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Paul Clapham wrote:Sure it is. I just don't think you should present it as such without mentioning that a good scrubber would be extremely difficult to write.


Thanks for sharing your opinion. I, on the other hand, would not mention that it would be extremely difficult to write because I do not know the OP's abilities or the resources that he/she has access to. Your view that it would be "difficult" is subjective and based on your interpretation of the difficulty.

Secondly, I have written many such applications rather easily, many times, over and over again. So, from my perspective it would be an easy task. But it would be wrong for me to describe it as such, again because subjective opinions are ill-placed in Internet forums involving mostly strangers.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Of course it's my subjective opinion. (There isn't any other kind of opinion as far as I know.) But it's not a worthless opinion either... you didn't mean to suggest that, did you? I also disagree with your opinion that it would be easy to write a scrubber for bad XML, but then we haven't agreed on what this scrubber should really do, so there isn't really much to agree or disagree about.

And of course I didn't mean that the OP should attempt that task himself; it is possible to make some estimate of his abilities, for example he doesn't know a basic feature of XML, namely escaping, so he's a beginner in the XML world. Writing a scrubber shouldn't be something he's attempting just yet.

Of course if it's really not too hard to do such a thing then it should already exist and be posted on the Internet. I haven't looked for such a thing because my position is the same as that of the designers of XML, namely that it is the responsibility of the creator of an XML document to make sure it is well-formed.
 