This certainly looks like a job for a streaming parser, SAX or StAX, so you do not have to worry about memory consumption.
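
To give a feel for the streaming style, here is a minimal SAX sketch that counts <c> elements without building a tree. The class name CountC and the sample input are made up for illustration, and it assumes the element has no namespace prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CountC {
    // Counts <c> elements as the parser streams past them; no tree is built.
    static int countC(InputSource in) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so compare qName.
                if ("c".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><c>one</c><c>two</c></root>";
        System.out.println(countC(new InputSource(new StringReader(xml)))); // prints 2
    }
}
```

The handler only ever sees one event at a time, which is why the size of the input file stops mattering.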
The standard XPath API evaluates against an in-memory tree such as a DOM, so that is right out.
You will need to figure out a way (probably a custom class) to hold all of the information inside a <c> element until you can decide which output file gets the content.
I would certainly work ENTIRELY in text - no need to get tangled up with building DOMs.
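
A rough sketch of that buffering idea with StAX follows. Everything specific here is an assumption for illustration: the class names CSplitter and CRecord, the rule that a <type> child inside each <c> picks the output, and the openOutput callback that hands back a Writer for a given key. It also assumes <c> elements are not nested, and it drops attributes and comments to stay short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CSplitter {
    /** Hypothetical holder: one <c> element as text, plus the key that picks its output. */
    static final class CRecord {
        final StringBuilder text = new StringBuilder();
        String key = "unknown"; // used when no <type> child is seen
    }

    /** Escapes character data so the buffered text stays well-formed. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void split(Reader in, Function<String, Writer> openOutput)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each run of character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(in);
        Map<String, Writer> writers = new HashMap<>();
        CRecord current = null;   // non-null only while inside a <c>
        boolean inType = false;   // true only while inside the assumed <type> child
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("c".equals(r.getLocalName())) {
                        current = new CRecord();
                    }
                    if (current != null) {
                        current.text.append('<').append(r.getLocalName()).append('>');
                        inType = "type".equals(r.getLocalName());
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                    current.text.append(escape(r.getText()));
                    if (inType) {
                        current.key = r.getText().trim();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && current != null) {
                    current.text.append("</").append(r.getLocalName()).append('>');
                    inType = false;
                    if ("c".equals(r.getLocalName())) {
                        // The whole <c> has been seen, so the output can be chosen now.
                        Writer w = writers.computeIfAbsent(current.key, openOutput);
                        w.write(current.text.toString());
                        w.write('\n');
                        current = null;
                    }
                }
            }
        } finally {
            r.close();
            for (Writer w : writers.values()) {
                w.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><c><type>a</type><v>1</v></c><c><type>b</type><v>2</v></c></root>";
        Map<String, StringWriter> sinks = new TreeMap<>();
        split(new StringReader(xml), key -> sinks.computeIfAbsent(key, k -> new StringWriter()));
        sinks.forEach((key, out) -> System.out.print(key + ": " + out));
    }
}
```

Only one <c> is held in memory at a time. For real use the callback would open a file per key instead of a StringWriter, and the routing rule would be whatever your data actually calls for.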
I suggest you review the relevant chapters in Harold's free online book.
Bill