It's like several XML documents concatenated into one. What's the best way to parse this document? The document producer is still under construction, hence the concatenated XML. It's guaranteed that this is the only problem left (every other aspect of the XML is valid). So in the meantime I need a practical way to keep parsing the XML. Thanks.
You call it a "document", but as I'm sure you already found out, it isn't an XML document because a well-formed XML document has a single root element. And what you posted doesn't have a single root element. And as you already found out, XML parsers can't deal with it because it isn't a well-formed XML document.
You could conceivably deal with it by wrapping the whole thing into a root element, maybe by just putting <root> at the beginning and </root> at the end. But then it still has those <?xml ... ?> things in the middle, which look like processing instructions but have a name ("xml") which processing instructions aren't allowed to have, so the parsers will complain about that too.
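For what it's worth, here is a minimal sketch of that wrapping idea, assuming the whole input fits in memory as an already-decoded String and that the text `<?xml ... ?>` never occurs inside a comment or CDATA section. The class name `ConcatenatedXml`, the wrapper element `<root>`, and the sample `<person>`/`<name>` elements are made up for illustration; only the JAXP calls are standard. It strips every embedded XML declaration first, so the parser no longer complains about them, and then adds the single root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ConcatenatedXml {

    // Removes every XML declaration (<?xml ... ?>) and wraps what is left
    // in a synthetic <root> element. The \s after "xml" keeps real
    // processing instructions such as <?xml-stylesheet ... ?> intact.
    static String wrap(String concatenated) {
        String body = concatenated.replaceAll("<\\?xml\\s[^?]*\\?>", "");
        return "<root>" + body + "</root>";
    }

    // Parses the wrapped text with the standard JAXP DOM parser.
    static Document parse(String concatenated) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = wrap(concatenated).getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input: two documents glued together.
        String input =
                "<?xml version=\"1.0\"?><person><name>A</name></person>"
              + "<?xml version=\"1.0\"?><person><name>B</name></person>";

        Document doc = parse(input);
        NodeList people = doc.getDocumentElement().getElementsByTagName("person");
        System.out.println("person elements found: " + people.getLength());
    }
}
```

Each child of the synthetic root is then one of the original documents. This is a stopgap only: if the concatenated documents declare different encodings, the regular expression approach is not enough.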
But you said
Right now the document producer is still under construction
So my suggestion would be to wait until the document producer is working right. Trying to write code which parses ill-formed XML is a waste of time if you know that eventually you won't need it.
Let's say the document producer won't be fixed until next month, while the XML is used in day-to-day operation (not crucial; it's a convenience for the users, and you know how users are when things aren't "convenient"). Is it possible to parse the document into multiple person objects? I'd rather not use the "additional root" technique. And if you ask why I don't fix it myself: the other team is responsible for maintaining it and they're assigned to another job right now, so the big boss told me to somehow find a workaround. I'm thinking of something like this: