I have a requirement to parse an XML file of about 125,000 lines (an extract from a database) into a text file (specifically, an .ntriples file). While parsing the XML, I have to take the node names, attribute names, attribute values and CDATA content, translate them into meaningful URIs, and write them to the text file. Consider the sample below:
<student name="john" age="22" subject="geography">John is good singer</student>
<student name="jai" age="22" subject="java">Jai is a good dancer</student>
.... and similarly for many other nodes and attributes. I have to parse this and write it to a text file like the one below:
<http://www.coderanch.com/student> <www.xmlschema#typeOf> <http://www.coderanch.com/Students>.
<http://www.coderanch.com/student> <www.xmlschema#name> "John".
<http://www.coderanch.com/student> <www.xmlschema#age> "22".
<http://www.coderanch.com/student> <www.xmlschema#subject> "geography".
<http://www.coderanch.com/student> <www.xmlschema#generaldescription> "John is good singer".
.... and so on; the .ntriples file will contain all the information parsed from the XML in the form shown above.
My questions:
1. Which parser should I use, DOM or SAX? I have written one or two programs with each. I think that with 10,000 nodes, iterating through a node list using DOM will take a long time, and it will be difficult to code since the XML also contains many CDATA sections. Also, the application need not be super fast, as it will only be run as a batch job.
2. How do I do the lookup quickly? Say I hit a <student> node; I then need to know that the corresponding URI of that node is <http://www.coderanch.com/student>. There can be around 100 such URI mappings for nodes and attributes. Should I load the node-to-URI mapping using java.util.Properties, or keep it in a constants file?
3. Which FileWriter should I use? The .ntriples file does not need to be encoded.
Since the task appears to be to transform an XML document into a text document, my first instinct would be to write an XSL transformation. I might think twice about that if the business logic turned out to be complex, but I do try to avoid writing transformations with low-level tools like SAX or DOM.
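As a rough sketch of what that could look like, the snippet below runs a small XSLT 1.0 stylesheet through the JDK's built-in javax.xml.transform API. The stylesheet, the class name XsltToNTriples and the <students> wrapper element are my own assumptions for illustration; the URIs are copied from the sample in the question. It does not escape quotes or backslashes inside literals, which real data would probably need.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

final class XsltToNTriples {
    // Hypothetical stylesheet: emits one triple per attribute, plus one for the text content.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='text()'/>" // suppress stray text outside <student>
      + "<xsl:template match='student'>"
      + "<xsl:for-each select='@*'>"
      + "<xsl:text>&lt;http://www.coderanch.com/student&gt; &lt;www.xmlschema#</xsl:text>"
      + "<xsl:value-of select='name()'/>"
      + "<xsl:text>&gt; \"</xsl:text>"
      + "<xsl:value-of select='.'/>"
      + "<xsl:text>\".&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "<xsl:text>&lt;http://www.coderanch.com/student&gt; "
      + "&lt;www.xmlschema#generaldescription&gt; \"</xsl:text>"
      + "<xsl:value-of select='.'/>" // string value includes CDATA content
      + "<xsl:text>\".&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the generated text.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<students>"
            + "<student name=\"john\" age=\"22\" subject=\"geography\">John is good singer</student>"
            + "</students>";
        System.out.print(transform(xml));
    }
}
```

For a real file you would pass StreamSource and StreamResult objects built from java.io.File instead of strings. Note that the order in which the attribute triples come out is up to the XSLT processor.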
(And by the way, using an XML parser is not the same as writing an XML parser.)
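If you do end up going the SAX route, "using a parser" means letting the JDK's parser do the reading and only supplying a handler. Here is a minimal sketch under some assumptions of mine: the records are flat (no mapped element nested inside another), the name-to-URI mapping lives in a java.util.Properties object, and the key scheme ("student", "@name", "text()") and class name SaxToNTriples are invented for this example.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Properties;

final class SaxToNTriples extends DefaultHandler {
    private final Properties uris; // hypothetical keys: element name, "@attr", "text()"
    private final Writer out;
    private final StringBuilder text = new StringBuilder();
    private String subject;        // URI of the element being read, or null if unmapped

    SaxToNTriples(Properties uris, Writer out) {
        this.uris = uris;
        this.out = out;
    }

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts) {
        subject = uris.getProperty(qName);
        text.setLength(0);
        if (subject == null) return;
        for (int i = 0; i < atts.getLength(); i++) {
            String predicate = uris.getProperty("@" + atts.getQName(i));
            if (predicate != null) emit(subject, predicate, atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // CDATA sections arrive here as well
    }

    @Override
    public void endElement(String ns, String local, String qName) {
        String predicate = uris.getProperty("text()");
        String value = text.toString().trim();
        if (subject != null && predicate != null && !value.isEmpty()) {
            emit(subject, predicate, value);
        }
        subject = null;
    }

    // Writes one triple, escaping backslashes, quotes and line breaks in the literal.
    private void emit(String s, String p, String literal) {
        String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"")
                                .replace("\n", "\\n").replace("\r", "\\r");
        try {
            out.write("<" + s + "> <" + p + "> \"" + escaped + "\".\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses an XML string with the JDK's SAX parser and returns the triples as text.
    static String convert(String xml, Properties uris) {
        StringWriter out = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxToNTriples(uris, out));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties uris = new Properties();
        uris.setProperty("student", "http://www.coderanch.com/student");
        uris.setProperty("@name", "www.xmlschema#name");
        uris.setProperty("@age", "www.xmlschema#age");
        uris.setProperty("@subject", "www.xmlschema#subject");
        uris.setProperty("text()", "www.xmlschema#generaldescription");
        System.out.print(convert("<students><student name=\"jai\" age=\"22\" "
            + "subject=\"java\">Jai is a good dancer</student></students>", uris));
    }
}
```

With only about 100 mappings, a Properties lookup per node is a hash lookup and should not be a bottleneck. For the real output file, the StringWriter could be swapped for something like Files.newBufferedWriter(path, StandardCharsets.UTF_8), so that the character encoding is explicit rather than the platform default.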