I'm trying to extract an element (and all its children) from a given XML file as a
string. However, I want to retain its current infoset. Basically, I want the parser to leave the XML I'm extracting alone.
To do this, I build a DOM document from a given XML string, search for the desired element, and then output that element via a Transformer to a String.
The problem is that the outputted XML is no longer the same as the original XML.
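For reference, the approach described above looks roughly like the following. This is only a minimal sketch: the class name `ExtractElement`, the `extract` helper, and the sample XML in `main` are made up for illustration, and it relies on the default JAXP parser and transformer rather than any particular Xalan configuration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ExtractElement {

    // Hypothetical helper: parse the XML, find the first element with the
    // given local name (in any namespace), and serialize it to a String.
    static String extract(String xml, String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "*" matches elements in every namespace
        Node element = doc.getElementsByTagNameNS("*", localName).item(0);
        if (element == null) {
            throw new IllegalArgumentException("No element named " + localName);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "no"); // do not re-indent

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(element), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, not the XML from the question
        String xml = "<root><person><name>A</name></person></root>";
        System.out.println(extract(xml, "person"));
    }
}
```

The serializer rebuilds the markup from the DOM tree, so anything the DOM does not record (such as linefeeds between attributes inside a start tag) cannot come back out in its original form.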
The source XML might be like this:
When I extract the 'person' element and write it back to a String, it looks like this:
Now, I know that the input XML contains redundant namespace declarations and that the output XML is better. But in this case I want the output XML to be exactly the same as the input XML:
- Whitespace (spaces, tabs, linefeeds, carriage returns, and so on) must be retained
- No namespace optimization whatsoever
So what I'm looking for is a 'substring()' on the original XML. The problem is that a real substring is not that simple on XML, and it is probably the least preferred and least clean solution.
I tried the 'Transformer' class from Xalan, where you can configure a lot. I managed to configure it so that it leaves the indentation alone, but it still does namespace optimization and also removes linefeeds between namespace declarations. If I have
<someElement xmlns:ns1="test1"
xmlns:ns2="test2"></someElement>
Then the output looks like:
<someElement xmlns:ns1="test1" xmlns:ns2="test2"></someElement>
Any advice is welcome!