Parsing an XML that contains the '&' character

Eyal Golan
Greenhorn

Joined: May 20, 2008
Posts: 21
Hello all,
I've just started working on a new project and I encountered an XML parsing problem.
A (very) short description of the project's design:
There is a servlet that accepts XML and passes it to a handler to process.
Everything is in UTF-8 encoding.

The problem is in that handler:
Suppose I have something like the following:

The characters method of the DefaultHandler seemed to split that value: it was actually called 3 times.

The original code (which I started working on) had a string that was set in the characters method:

It was initialized to an empty string at the beginning of the startElement method.
Then, in endElement, tempVal was used to build the domain objects.

I created a small JUnit test and found a solution.
The solution is to concatenate onto tempVal in the characters method instead of overwriting it.

I would like to ask whether this is the correct solution, or whether there is a better one.

(Forgive me for the long post, as I wanted to be as clear as possible).
Here's the code (I could not attach a Java / text file):



Well, my question could also be:
Is there a way to configure the parser so it won't split the text at the '&'?


Thank you very much for any help
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

One option would be to allow only well-formed XML, which won't have a raw < or & in its text content.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
The ampersand character is a "special character" in XML-based markup languages. In order to include the ampersand character in the instance, you must use the XML entity instead of the character itself. The entity is &amp;
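For whoever produces the XML, the escaping described above could be sketched roughly as follows. The class name XmlText and the method escape are hypothetical helpers made up for illustration, not part of any library, and the sketch only covers text content (not attribute values).

```java
public class XmlText {
    // Hypothetical helper: escapes the characters that are special in XML text content.
    // The ampersand must be replaced first, otherwise the '&' introduced by
    // "&lt;" and "&gt;" would be escaped a second time.
    public static String escape(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // prints: <name>AT&amp;T &lt;x&gt;</name>
        System.out.println("<name>" + escape("AT&T <x>") + "</name>");
    }
}
```

In practice an XML writer API would normally do this escaping for you; the point is only that a raw '&' must never reach the parser.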
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657
    
    8

If this is the FAQ where a SAX parser splits a text node into several parts and calls the characters() method once for each of them, then yes, everything you said was correct. And your solution was correct too. And no, you can't configure the parser to not do that. After all, the documentation does say it might do it and it doesn't cause any problems for applications that take that possibility into account.

However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.
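A minimal sketch of that approach, assuming a hypothetical element called name and a hypothetical handler class TextCollectingHandler (neither comes from the original post, whose code is not shown here):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectingHandler extends DefaultHandler {
    // Accumulates the text of the current element across characters() calls.
    private final StringBuilder tempVal = new StringBuilder();
    // Collected text of every <name> element (hypothetical element name).
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        tempVal.setLength(0); // reset the buffer, as the original code reset its String
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append, never assign.
        tempVal.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            values.add(tempVal.toString());
        }
    }

    public List<String> getValues() {
        return values;
    }

    // Convenience wrapper for the sketch: parses a string and returns the collected values.
    public static List<String> parse(String xml) {
        try {
            TextCollectingHandler handler = new TextCollectingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getValues();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<root><name>Tom &amp; Jerry</name></root>"));
    }
}
```

However many times characters() is called for the text inside name, the value read in endElement is the complete string, here "Tom & Jerry". Resetting the buffer in startElement mirrors the original design and is only safe when the elements of interest contain plain text with no child elements.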
Eyal Golan
Greenhorn

Joined: May 20, 2008
Posts: 21
Thank you all for the answers.
Paul Clapham wrote:...However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.

Thanks for reminding me about StringBuilder. I used the String tempVal because that is what the code had before...

And again, thank you all