I have written some crude code which does this, but it does not use the schema, and when I validate the new XML against the schema, I find that some of my new nodes have the correct parent but are out of sequence.
I would like to use the schema to guide the merge, i.e. to determine where nodes should be inserted.
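A schema-guided merge first needs the child order that the schema imposes. As a minimal sketch, assuming the parent element is declared with an inline xs:complexType/xs:sequence (no named types, groups, xs:choice or extensions), Python's standard-library ElementTree can read that order. The function name `sequence_order` and the example schema are hypothetical, not taken from this thread.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def sequence_order(xsd_text, element_name):
    """Return the child element names of the global element `element_name`,
    in xs:sequence order.

    Assumption: simple layout only, i.e. <xs:element name="..."> containing an
    inline <xs:complexType><xs:sequence>. Named complex types, xs:choice,
    groups and extensions are not handled.
    """
    schema = ET.fromstring(xsd_text)
    for decl in schema.findall(XS + "element"):
        if decl.get("name") == element_name:
            seq = decl.find(f"{XS}complexType/{XS}sequence")
            if seq is None:
                return []
            names = []
            for child in seq.findall(XS + "element"):
                # Local declarations use @name; references use @ref, which may
                # carry a prefix such as "tns:a", so keep only the local part.
                name = child.get("name") or child.get("ref", "")
                names.append(name.split(":")[-1])
            return names
    raise ValueError(f"no global element named {element_name!r}")

# Hypothetical schema: root holds a*, b*, c* in that order.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="c" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(sequence_order(EXAMPLE_XSD, "root"))  # ['a', 'b', 'c']
```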
The code I am writing to do this is very cumbersome and fragile. Is there an established way of achieving this? XSLT?
I have considered using JAXB to generate a model, performing the merge on the model (using reflection to set properties) and then marshalling it back to XML. But this doesn't seem very nice either.
If you mean looking at the two schemas yourself, coming up with a definite plan for merging one document into the other, and writing an XSLT document as a template that reflects that plan, that can certainly do the job quite effectively. If you mean letting the machine parse the two schemas and come up with a definite plan for the merge on its own, that alone would be a formidable task.
In fact, both XML files share one schema, so perhaps this makes things a little easier.
But what I am currently doing (which seems very bad) is:
1) iterate through each node in XML doc1
2) look up the required location of the node in XML doc2 using XPath and the schema
3) find that location in XML doc2 and insert the node
There seem to be a lot of loops and tests, and it is very messy.
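Steps 1) to 3) can shrink considerably once the schema order is available as a list: the insertion point for a node is just after the last sibling whose position in the xs:sequence is the same or earlier. A minimal sketch with Python's ElementTree, assuming a flat sequence of element names without namespaces; the order is hardcoded here as a hypothetical a, b, c, and `insert_in_sequence` and `merge` are illustrative names.

```python
import copy
import xml.etree.ElementTree as ET

def insert_in_sequence(parent, new_child, order):
    """Insert new_child where the xs:sequence `order` requires it:
    after the last existing child whose rank is <= new_child's rank.
    Children whose tag is not in `order` keep their positions."""
    rank = {name: i for i, name in enumerate(order)}
    new_rank = rank[new_child.tag]
    children = list(parent)
    pos = None
    for i, child in enumerate(children):
        r = rank.get(child.tag)
        if r is not None and r <= new_rank:
            pos = i + 1  # just after the last same-or-earlier sibling
    if pos is None:
        # No same-or-earlier sibling: go before the first later one, or
        # append at the end if there is none (an assumption).
        later = [i for i, c in enumerate(children)
                 if rank.get(c.tag, -1) > new_rank]
        pos = later[0] if later else len(children)
    parent.insert(pos, new_child)

def merge(source_root, target_root, order):
    """Copy every sequence child of source_root into target_root,
    keeping source order among elements of the same name."""
    for child in source_root:
        if child.tag in order:
            insert_in_sequence(target_root, copy.deepcopy(child), order)

target = ET.fromstring("<root><a>1</a><c>3</c></root>")
source = ET.fromstring("<root><b>2</b><a>0</a></root>")
merge(source, target, ["a", "b", "c"])
print(ET.tostring(target, encoding="unicode"))
# <root><a>1</a><a>0</a><b>2</b><c>3</c></root>
```

The source document is not modified because each node is deep-copied before insertion. Whether the result actually validates still depends on the schema being this simple; occurrence limits (maxOccurs other than unbounded) are not checked.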
So do you think XSLT is the best tool to use here? I want to guarantee that the merged document will always validate against the schema.
> So do you think XSLT is the best tool to use here?
I always distrust claims of superlatives, but I think it is quite an effective tool for the job. The validity of the final document can only be guaranteed by the logic built into the XSL document, so that part depends on the perspicacity of its author. Other than that, there is no guarantee, because having the transformation ensure validity in general would be an overwhelmingly complicated task to code.
Let me cook up a demo; you'll see that, in the details, it is not as trivial as one might think.
[1] Suppose the common schema looks like this.
[2] The two XML documents could look like this, for instance.
Pay careful attention to the tags that may be missing.
[3] The XSL document could look like this, using only XSLT 1.0 as the lowest common denominator.
[3.1] I make more provisions in the XSL than strictly necessary, which is why it looks longer than the minimum. The elements a, b and c can be complicated complexTypes and it will work the same. There can be elements other than a, b and c inside the root of container.xml, and they will be preserved (that is why there is an identity transformation at the start). In any case, it shows that sequencing a, b and c is already not naive, because any of them can be absent. I left a couple of repetitive xsl:if blocks in place to highlight how the sequence with minOccurs="0" and maxOccurs="unbounded" is implemented. You can try to factor that logic into a couple of named templates.
[3.2] Imagine a more complicated situation, and on top of that asking the XSL document to make sure the resulting output validates: it becomes a very complicated task.
[3.3] Although the XSL is already fairly elaborate, it assumes that containing.xml contains at least one a, b or c element. It can be elaborated further to accommodate the case where there is none of them; I leave that to you as an exercise.
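For comparison, the rebuild strategy described in [3.1] (copy the container as-is, then emit each sequence element's instances from both documents in schema order) can also be sketched in Python. This is not a translation of the XSL above; it is a hypothetical sketch assuming a flat a, b, c sequence without namespaces. It places the merged run where the container's first a, b or c element was and appends it at the end when there is none, which is one possible answer to the exercise in [3.3]. `merge_sequence` is an illustrative name.

```python
import copy
import xml.etree.ElementTree as ET

def merge_sequence(container, incoming, order):
    """Build a new root from `container`, replacing its sequence elements with
    the merged run: container a*, incoming a*, container b*, incoming b*, ...

    Assumptions: the merged run goes where the container's first sequence
    element was; if the container has none, it is appended at the end.
    Non-sequence children are copied unchanged (identity-style)."""
    result = ET.Element(container.tag, dict(container.attrib))
    result.text = container.text
    merged = []
    for name in order:
        merged += [copy.deepcopy(e) for e in container if e.tag == name]
        merged += [copy.deepcopy(e) for e in incoming if e.tag == name]
    placed = False
    for child in container:
        if child.tag in order:
            if not placed:
                result.extend(merged)  # emit the whole run once
                placed = True
        else:
            result.append(copy.deepcopy(child))
    if not placed:
        # Container had no a, b or c: fall back to appending the run.
        result.extend(merged)
    return result

container = ET.fromstring("<root><x/></root>")
incoming = ET.fromstring("<root><c>3</c><a>1</a></root>")
merged_root = merge_sequence(container, incoming, ["a", "b", "c"])
print(ET.tostring(merged_root, encoding="unicode"))
```

Here the container has no a, b or c, so the merged run (a, then c) is appended after x. Appending at the end is only safe if the schema allows it there; a real schema may require a different fallback position.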
[4] Late edit note: on re-reading what I posted, I found a loophole in certain xsl:if tests that require a double-counting condition, and I have re-edited that part. I am recording that edit here to avoid any confusion.