


Extracting a nested XML document

Will Myers
Ranch Hand

Joined: Aug 05, 2009
Posts: 328

Hi,
I have a bean that is consuming an XML document that contains another XML document, with all the < and > characters of the inner document replaced by HTML tags. I'm told this is standard practice. Can anyone point me in the right direction for extracting the inner XML document in the correct format? If I try to parse it using:



it works but I can't then access any of the values....

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
The '<' and '>' characters are tag delimiters. What do you mean by "replaced by HTML tags"? Can you provide an example?

Will Myers
Ranch Hand

Joined: Aug 05, 2009
Posts: 328

< becomes the html tag & lt; and > becomes & gt;

I would post an example, but this forum converts them to < and > and I don't know how to escape them.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Thanks. & lt; and & gt; are HTML entities. They are not considered "tags".

Attempting to "nest" an XML document within another sounds like a bad design idea and conflicts with
the core premise of XML. The difficulty you are encountering is a result of poor design.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12803
    
Bad design! You got that right! For years I have been dealing with a client who got stuck with this design.

A CDATA section is used to hide the complete text of an XML document. To work with it, I have to extract the entire CDATA section to a String, build an org.xml.sax.InputSource from the String, and parse that to a DOM.

Then of course all of the normal org.w3c.dom and related methods work to access values.
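
A minimal sketch of that approach, assuming the outer document wraps the inner one in a CDATA section inside an element called payload. The element names, the sample XML, and the class and method names are hypothetical and only there for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NestedCdataExample {

    // Parses XML held in a String into a DOM Document via an InputSource.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reads the text of the first <payload> element (hypothetical name),
    // which is the content of the CDATA section, and parses it as a second document.
    static Document extractInner(String outerXml) throws Exception {
        Document outer = parse(outerXml);
        String innerXml = outer.getElementsByTagName("payload").item(0).getTextContent();
        return parse(innerXml);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data: an <order> document hidden inside CDATA.
        String outerXml =
            "<envelope><payload><![CDATA[<order><id>42</id></order>]]></payload></envelope>";
        Document inner = extractInner(outerXml);
        // The inner document is now an ordinary DOM tree.
        System.out.println(inner.getElementsByTagName("id").item(0).getTextContent());
    }
}
```

The second parse is the important step: after the first parse the inner document is only a string, so none of its elements are reachable until that string is parsed in its own right.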

Bill
Will Myers
Ranch Hand

Joined: Aug 05, 2009
Posts: 328

Bad design, but there's not much I can do about that. I have got round it by using XPath to extract the inner XML that I'm interested in, replacing all the HTML entities with the XML ones, and then working on the result as normal. Bit of a faff....
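
For anyone finding this later, here is a rough sketch of the XPath route using javax.xml.xpath. It assumes the inner document was escaped exactly once, in which case the XML parser turns the & lt; and & gt; references back into < and > when the text is read, so no manual replacement is needed. If the content was escaped twice, an extra unescaping step would still be required. The element names, paths, and class and method names are made up for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedInnerXmlExample {

    // Parses XML held in a String into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Pulls the escaped inner XML out of the outer document with one XPath,
    // parses it, then evaluates a second XPath against the inner document.
    static String innerValue(String outerXml, String outerPath, String innerPath) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The string value of the node already has the entity references resolved.
        String innerXml = xpath.evaluate(outerPath, parse(outerXml));
        return xpath.evaluate(innerPath, parse(innerXml));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data: an <order> document escaped inside <payload>.
        String outerXml =
            "<envelope><payload>&lt;order&gt;&lt;id&gt;42&lt;/id&gt;&lt;/order&gt;</payload></envelope>";
        System.out.println(innerValue(outerXml, "/envelope/payload", "/order/id"));
    }
}
```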
 