
Removing Character References from Attributes

 
Jimmy Clark
Ranch Hand
Posts: 2187
When using Xerces, is there an efficient way to remove character references from attribute values? Below is an example:

<narrative text="There are some sentences here and these characters we want out&#xD;&#xA;Some more stuff here.">

I'm looking to parse the XML document and replace the "&#xD;&#xA;" with spaces.


Thanks,

James
 
Paul Clapham
Sheriff
Posts: 20971
Seems to me that Xerces should do that without being asked. That's Attribute-Value Normalization. Are you saying it doesn't do that?

Umm... reading that section of the XML recommendation again, it appears that character references (as opposed to characters) are immune from the normalization rules. In which case you would indeed have to replace them yourself. Something like this?
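
The snippet originally posted at this point is not preserved, so what follows is only a minimal sketch of the idea, not the poster's code. It assumes a SAX handler (the JDK's default JAXP parser is used here; a standalone Xerces parser should report attributes the same way) and a hypothetical helper named `clean`. Because the parser expands `&#xD;&#xA;` into real CR and LF characters, the helper looks for those characters rather than for the reference text. Collapsing each CR LF pair into one space is an assumption about the desired result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeLineBreakCleaner {

    /** Hypothetical helper: replaces each CR LF pair, or lone CR or LF, with one space. */
    public static String clean(String value) {
        // Fast path: return the same String untouched when there is nothing to replace.
        if (value.indexOf('\r') < 0 && value.indexOf('\n') < 0) {
            return value;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\r') {
                // Treat CR LF as a single line break.
                if (i + 1 < value.length() && value.charAt(i + 1) == '\n') {
                    i++;
                }
                sb.append(' ');
            } else if (c == '\n') {
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Parses the XML and returns every attribute value after cleaning, in document order. */
    public static List<String> cleanedAttributeValues(String xml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                for (int i = 0; i < attrs.getLength(); i++) {
                    values.add(clean(attrs.getValue(i)));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }
}
```

The early return in `clean` means attribute values without line breaks cost two `indexOf` scans and no allocation, which matters when most attributes in a large file are clean.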
 
Jimmy Clark
Ranch Hand
Posts: 2187
Thanks Paul. That is what I found too. I was just hoping that I had missed something or that there was a setting somewhere. We are working with very large, multi-MB files, and I am trying to avoid running String manipulation/comparison routines on every attribute as it is processed.

I think this is a good example of poor XML design. Attributes should not hold running text (sentences) as values; that text should be element content instead. Unfortunately, the XML design is out of our control.

I'm thinking that we might use a UNIX tool such as sed or awk to read through the file and replace these references in the XML document before sending it to the Xerces parse routine. I'm not sure how big an issue this is right now.
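
For comparison, the same pre-processing idea can be sketched in Java instead of sed/awk. This is only an illustration under stated assumptions: the class and method names are hypothetical, it matches only the exact spelling `&#xD;&#xA;` (not `&#xd;`, `&#13;&#10;`, or other variants), and it replaces that text wherever it occurs, including element content, since it never parses the XML. The caller is assumed to open the Reader and Writer with the document's encoding.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class CharRefPreFilter {

    // The literal reference text as it appears in the raw, unparsed file.
    private static final String TARGET = "&#xD;&#xA;";

    /** Streams text from in to out, replacing each TARGET occurrence with one space. */
    public static void filter(Reader in, Writer out) throws IOException {
        StringBuilder pending = new StringBuilder();
        char[] chunk = new char[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            pending.append(chunk, 0, n);
            String text = pending.toString().replace(TARGET, " ");
            // Hold back a short tail in case a reference is split across two chunks.
            int keep = Math.min(TARGET.length() - 1, text.length());
            out.write(text, 0, text.length() - keep);
            pending.setLength(0);
            pending.append(text, text.length() - keep, text.length());
        }
        // Whatever remains cannot contain a complete TARGET, so write it as-is.
        out.write(pending.toString());
        out.flush();
    }

    /** Convenience wrapper for small inputs and quick checks. */
    public static String filterToString(String input) {
        StringWriter out = new StringWriter();
        try {
            filter(new StringReader(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Working in fixed-size chunks keeps memory use flat regardless of file size, at the cost of one extra pass over the file before Xerces sees it.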
 