I've just started working on a new project and I encountered an XML parsing problem.
A (very) short description of the project's design:
There is a servlet that accepts XML and passes it to a handler to process.
Everything is in UTF-8 encoding.
The problem is in that handler:
Suppose I have something like the following:
The characters method of the DefaultHandler seemed to split that value: it was actually called 3 times.
The original code (which I started working on) had a String, tempVal, that was set in the characters method:
It was initialized to an empty string at the beginning of the startElement method.
Then, in endElement, tempVal was used to build the domain objects.
I created a small JUnit test and found a solution.
The solution is to append to tempVal in the characters method instead of overwriting it.
I would like to ask whether this is the correct approach, or whether there is a better one.
(Forgive me for the long post, as I wanted to be as clear as possible).
Here's the code (I could not attach a Java or text file):
Well, my question could also be:
Is there a way to configure the parser so that it won't split the text at the '&'?
The ampersand character is a "special character" in XML-based markup languages. In order to include the ampersand character in the instance, you must use the XML entity instead of the character itself. The entity is &amp;amp;
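To illustrate the point about the entity, here is a minimal sketch using the JDK's built-in SAX parser. The class name AmpersandDemo, the method isWellFormed, and the sample element are made up for illustration and are not from the original code. The idea is only that a document with a bare ampersand should be rejected as not well-formed, while the same text with the ampersand written as an entity should parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class AmpersandDemo {

    // Returns true if the SAX parser accepts the document as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, such as a bare '&' in text.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, isWellFormed("<name>Johnson &amp;amp; Sons</name>") should return true and isWellFormed("<name>Johnson & Sons</name>") should return false.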
If this is the FAQ where a SAX parser splits a text node into several parts and calls the characters() method once for each of them, then yes, everything you said was correct. And your solution was correct too. And no, you can't configure the parser to not do that. After all, the documentation does say it might do it and it doesn't cause any problems for applications that take that possibility into account.
However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.
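A minimal sketch of that StringBuilder approach, assuming a handler that only needs the text of each name element. The class NameHandler, the helper parseNames, and the element name are hypothetical; only tempVal and the three callback methods come from the discussion above. The buffer is cleared in startElement, appended to in characters, and read in endElement, so it does not matter how many pieces the parser delivers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only: collects the text of every <name> element.
class NameHandler extends DefaultHandler {
    private final StringBuilder tempVal = new StringBuilder();
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Start each element with an empty buffer.
        tempVal.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append rather than assign.
        tempVal.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // By now the buffer holds the complete text of the element.
        if ("name".equals(qName)) {
            names.add(tempVal.toString());
        }
    }

    // Convenience helper (an assumption, not part of the original code): parse a string.
    static List<String> parseNames(String xml) {
        try {
            NameHandler handler = new NameHandler();
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A JUnit test could then call NameHandler.parseNames on a small document containing an escaped ampersand and check that each value comes back in one piece. Note that this simple reset-on-start scheme is only suitable for elements that contain plain text, not mixed content.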
Thank you all for the answers.
Paul Clapham wrote:...However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.
Thanks for reminding me about StringBuilder. I used the String tempVal because that is what the original code had...