I have a bean that consumes an XML document which contains another XML document, with all the < and > characters of the inner document replaced by HTML entities (&lt; and &gt;). I'm told this is standard practice. Can anyone point me in the right direction for extracting the inner XML document in the correct format? If I try to parse it using:
it works but I can't then access any of the values....
Bad design! You got that right! For years I have been dealing with a client who got stuck with this design.
In their case a CDATA section is used to hide the complete text of an XML document. To work with it I have to extract the entire CDATA section to a String, build an org.xml.sax.InputSource from the String, and parse that to a DOM.
Then of course all of the normal org.w3c.dom and related methods work to access values.
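For anyone following along, here is a minimal sketch of those steps using only the standard JAXP classes. The class name `InnerXmlParser` and the element name `payload` are made up for illustration; substitute whatever element actually wraps the CDATA section in your document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
final class InnerXmlParser {

    /** Parses a String holding a complete XML document into a DOM. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource wraps a Reader, so the String never touches disk.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Reads the text of the first <payload> element (an assumed name)
     * and parses that text as a second, independent XML document.
     */
    static Document parseEmbedded(String outerXml) {
        Document outer = parse(outerXml);
        // getTextContent() returns the CDATA content without the
        // <![CDATA[ ... ]]> markers.
        String innerText =
                outer.getElementsByTagName("payload").item(0).getTextContent();
        // trim() guards against whitespace before an inner XML declaration.
        return parse(innerText.trim());
    }
}
```

Once `parseEmbedded` returns, the inner document is an ordinary `org.w3c.dom.Document`, so calls such as `getElementsByTagName` work on it directly.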
Bad design, but there's not much I can do about that. I have got round it by using XPath to extract the inner XML that I'm interested in, then replacing all the HTML entities with the XML characters they stand for, then working on the result as normal. Bit of a faff....
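A rough sketch of the XPath route, in case it helps someone else. The class name `EscapedXmlExtractor` and the paths in the usage note are invented for the example. One thing worth checking before writing a replace step: a conforming XML parser already decodes &lt;, &gt; and &amp; in text nodes, so the string that XPath hands back may already contain real angle brackets. A manual replacement should only be needed if the content was escaped twice or uses HTML-only entities, and this sketch assumes neither.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
final class EscapedXmlExtractor {

    /** Parses a String holding a complete XML document into a DOM. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Evaluates outerPath against the outer document to get the embedded
     * XML as a string, parses that string, then evaluates innerPath
     * against the result and returns its string value.
     */
    static String valueOfInner(String outerXml, String outerPath, String innerPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The parser has already turned &lt; and &gt; back into < and >.
            String innerXml = xpath.evaluate(outerPath, parse(outerXml));
            return xpath.evaluate(innerPath, parse(innerXml.trim()));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }
}
```

With an outer document shaped like `<envelope><payload>&lt;order&gt;...&lt;/order&gt;</payload></envelope>`, a call such as `EscapedXmlExtractor.valueOfInner(xml, "/envelope/payload", "/order/id")` would read a value out of the inner document in one step.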