
Parsing XML that contains the '&' character

 
Eyal Golan
Greenhorn
Posts: 21
Hello all,
I've just started working on a new project and have run into an XML parsing problem.
A (very) short description of the project's design:
There is a servlet that accepts XML and passes it to a handler to process.
Everything is in UTF-8 encoding.

The problem is in that handler:
Suppose I have something like the following:

The characters() method of the DefaultHandler seemed to split that value; it was actually called 3 times.
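To see the splitting for yourself, a small handler can record every chunk passed to characters(). This is a sketch: the class name ChunkRecorder and the sample document are hypothetical, since the original XML isn't shown. How the text gets split depends on the parser implementation; SAX only permits splitting and doesn't prescribe it, so the one safe assumption is that the chunks concatenate to the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: records each characters() chunk seen inside <name>.
class ChunkRecorder extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();
    private boolean inName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inName = "name".equals(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            chunks.add(new String(ch, start, length)); // one entry per callback
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        inName = false;
    }

    // Parses the given XML string and returns the recorded chunks.
    static List<String> record(String xml) {
        ChunkRecorder handler = new ChunkRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        // How many chunks are printed depends on the parser implementation.
        System.out.println(record("<person><name>Tom &amp; Jerry</name></person>"));
    }
}
```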

The original code (which I started working on) had a String field that was set in the characters() method:

It was initialized to an empty string at the beginning of the startElement() method.
Then, in endElement(), tempVal was used to build the domain objects.
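Because the field is overwritten rather than extended, only the last chunk survives. The sketch below is a hypothetical reconstruction of that pattern, not the actual project code; OverwritingHandler, lastValue, and simulate() are invented names. simulate() calls the handler directly with three chunks of the kind a parser might deliver for "Tom &amp; Jerry", so the result doesn't depend on any particular parser.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reconstruction of the original pattern (illustration only).
class OverwritingHandler extends DefaultHandler {
    private String tempVal;
    String lastValue; // stands in for "building the domain objects"

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tempVal = ""; // reset at the start of each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // BUG: each call replaces whatever earlier calls stored
        tempVal = new String(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastValue = tempVal;
    }

    // Feeds the handler three chunks, as a parser might for "Tom &amp; Jerry".
    static String simulate() {
        OverwritingHandler h = new OverwritingHandler();
        h.startElement("", "name", "name", null);
        for (String chunk : new String[] {"Tom ", "&", " Jerry"}) {
            char[] ch = chunk.toCharArray();
            h.characters(ch, 0, ch.length);
        }
        h.endElement("", "name", "name");
        return h.lastValue;
    }

    public static void main(String[] args) {
        System.out.println("[" + simulate() + "]"); // prints "[ Jerry]": the first two chunks are lost
    }
}
```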

I created a small JUnit test and found a solution.
The solution is to append to tempVal in the characters() method instead of overwriting it.
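Applied to a hypothetical handler of the same shape, the fix is a one-line change in characters(). ConcatenatingHandler, lastValue, and simulate() are illustrative names only; simulate() feeds the handler three chunks directly instead of running a real parser.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler applying the concatenation fix (illustration only).
class ConcatenatingHandler extends DefaultHandler {
    private String tempVal;
    String lastValue; // stands in for "building the domain objects"

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tempVal = ""; // reset once per element, as before
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        tempVal += new String(ch, start, length); // keep every chunk
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastValue = tempVal;
    }

    // Feeds the handler three chunks, as a parser might for "Tom &amp; Jerry".
    static String simulate() {
        ConcatenatingHandler h = new ConcatenatingHandler();
        h.startElement("", "name", "name", null);
        for (String chunk : new String[] {"Tom ", "&", " Jerry"}) {
            char[] ch = chunk.toCharArray();
            h.characters(ch, 0, ch.length);
        }
        h.endElement("", "name", "name");
        return h.lastValue;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints "Tom & Jerry"
    }
}
```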

I'd like to ask whether this is the correct approach, or whether there is a better one.

(Forgive me for the long post; I wanted to be as clear as possible.)
Here's the code (I could not attach a Java/text file):



Well, my question could also be:
Is there a way to configure the parser so that it won't split the text at the '&'?


Thank you very much for any help.
 
David Newton
Author
Rancher
Posts: 12617
One option would be to accept only valid XML, which won't contain raw < and & characters.
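If the servlet should reject malformed input up front, one option is to run the request body through a SAX parser with a plain DefaultHandler and treat a SAXException as "not well-formed". A sketch, with the class and method names (WellFormedCheck, isWellFormed) assumed; note that this checks well-formedness only, not validity against a DTD or schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create parser", e);
        }
        try {
            // DefaultHandler.fatalError rethrows, so a bare '&' aborts the parse
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // malformed input
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a>Tom & Jerry</a>"));     // false
        System.out.println(isWellFormed("<a>Tom &amp; Jerry</a>")); // true
    }
}
```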
 
Jimmy Clark
Ranch Hand
Posts: 2187
The ampersand character is a "special character" in XML-based markup languages. To include an ampersand in the instance document, you must use the XML entity instead of the character itself. The entity is &amp;.
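If the XML is assembled by hand on the sending side, the text has to be escaped before it is embedded. A minimal hand-rolled sketch, assuming a hypothetical helper named XmlEscape.escape; it escapes all five predefined entities, which is more than element content strictly requires but is always safe.

```java
// Hypothetical helper: replaces XML special characters with predefined entities.
class XmlEscape {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<name>" + escape("Tom & Jerry") + "</name>"); // prints <name>Tom &amp; Jerry</name>
    }
}
```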
 
Paul Clapham
Sheriff
Posts: 21107
If this is the FAQ where a SAX parser splits a text node into several parts and calls the characters() method once for each of them, then yes, everything you said was correct. And your solution was correct too. And no, you can't configure the parser not to do that. After all, the documentation says it may do so, and that doesn't cause any problems for applications that take the possibility into account.

However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.
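A sketch of the StringBuilder variant, using StringBuilder.append(char[], int, int), which takes exactly the (ch, start, length) arguments that characters() receives. NameCollector and the name element are hypothetical; the buffer is cleared in startElement() and endElement() so text from one element doesn't leak into the next.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler collecting the text of every <name> element.
class NameCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // accumulate however many chunks arrive
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString()); // complete text, regardless of splitting
        }
        text.setLength(0);
    }

    static List<String> parse(String xml) {
        NameCollector handler = new NameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        // prints [Tom & Jerry, A < B]
        System.out.println(parse("<people><name>Tom &amp; Jerry</name><name>A &lt; B</name></people>"));
    }
}
```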
 
Eyal Golan
Greenhorn
Posts: 21
Thank you all for the answers.
Paul Clapham wrote:...However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.

Thanks for reminding me about StringBuilder. I used the String tempVal because that's what the code used before...

And again, thank you all
 