xml parser design question

John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Hi,
I have a requirement to convert an XML file of ~125,000 lines (a DB extract) into another text file (specifically, an .ntriples file). While parsing the XML, I have to take the node names, attribute names, attribute values and CDATA content, translate them into meaningful URIs, and write them to the text file. Consider the sample below:
<Students>
<student name="john" age="22" subject="geography">John is good singer</student>
<student name="jai" age="22" subject="java">Jai is a good dancer</student>
</Students>

.... and similarly many other nodes and attributes. Now I have to parse this and write it to a text file like the one below:
<http://www.coderanch.com/student> <www.xmlschema#typeOf> <http://www.coderanch.com/Students>.
<http://www.coderanch.com/student> <www.xmlschema#name> "John".
<http://www.coderanch.com/student> <www.xmlschema#age> "22".
<http://www.coderanch.com/student> <www.xmlschema#subject> "geography".
<http://www.coderanch.com/student> <www.xmlschema#generaldescription> "John is good singer".

.... similarly, this .ntriples file will contain all the information parsed from the XML, in the form shown above.

My Questions ->
1. Which parser should I use, DOM or SAX? I have written one or two programs with each. I think that with 10,000 nodes, iterating through a node list with DOM will take a long time and be difficult to code, since the XML also contains many CDATA sections. I should add that the application need not be super fast, as it will only be run as a batch job.

2. How do I do the lookup quickly? Say I hit a <student> node and need to know that its corresponding URI is <http://www.coderanch.com/student>. There can be around 100 such URI mappings for nodes and attributes. Which should I use: load the node-to-URI mapping with java.util.Properties, or keep it in a constants file?

3. Which FileWriter should I use? The .ntriples file need not be encoded.
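
For illustration only, here is a minimal sketch of how the three questions could fit together if SAX were chosen: a handler that looks names up in a java.util.Properties mapping and writes one line per attribute and per text body. The class name, the "_text" key for element content and the mapping entries are assumptions made up for this sketch; the URIs are the placeholders from the sample above, and the typeOf line is left out to keep it short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NTriplesSketch {

    /** Streams the XML once; names without a mapping entry are skipped. */
    static class TripleHandler extends DefaultHandler {
        private final Properties uris;  // hypothetical mapping: node/attribute name -> URI
        private final Writer out;
        private final StringBuilder text = new StringBuilder();

        TripleHandler(Properties uris, Writer out) {
            this.uris = uris;
            this.out = out;
        }

        @Override
        public void startElement(String ns, String local, String qName, Attributes atts)
                throws SAXException {
            text.setLength(0);
            String subject = uris.getProperty(qName);
            if (subject == null) return;
            for (int i = 0; i < atts.getLength(); i++) {
                String predicate = uris.getProperty(atts.getQName(i));
                if (predicate != null) write(subject, predicate, atts.getValue(i));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node; CDATA content arrives here too.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String ns, String local, String qName) throws SAXException {
            String subject = uris.getProperty(qName);
            String predicate = uris.getProperty("_text");  // assumed key for element content
            String body = text.toString().trim();
            text.setLength(0);
            if (subject != null && predicate != null && !body.isEmpty()) {
                write(subject, predicate, body);
            }
        }

        private void write(String s, String p, String literal) throws SAXException {
            // Escape the characters that would break a quoted literal.
            String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"")
                                    .replace("\n", "\\n").replace("\r", "\\r");
            try {
                out.write("<" + s + "> <" + p + "> \"" + escaped + "\".\n");
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    /** Converts an XML string; a real batch job would pass a file and a file writer instead. */
    static String convert(String xml, Properties uris) {
        StringWriter out = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new TripleHandler(uris, out));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

With about 100 entries, a Properties lookup is a hash lookup, so speed should not be a concern either way; the main difference from a constants file is that the mapping can change without recompiling. For real file output, a BufferedWriter obtained from Files.newBufferedWriter(path, StandardCharsets.UTF_8) could be passed in place of the StringWriter, which avoids depending on the platform default charset.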

Thanks,
John
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18985
    

Since the task appears to be to transform an XML document into a text document, my first instinct would be to write an XSL transformation. I might think twice about that if the business logic turned out to be complex, but I do try to avoid writing transformations with low-level tools like SAX or DOM.

(And by the way using an XML parser is not the same as writing an XML parser.)
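
To make the XSLT suggestion concrete, here is a rough sketch using the javax.xml.transform API that ships with the JDK. The stylesheet is hypothetical and only handles the <student> element from the sample; the URIs are the placeholders from the question, and the class and method names are made up for illustration. It does not escape quotes inside values, which a real stylesheet would need to handle.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

    // Hypothetical XSLT 1.0 stylesheet: one output line per attribute, plus one for the text body.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//student'>"
        +   "<xsl:for-each select='@*'>"
        +     "<xsl:text>&lt;http://www.coderanch.com/student&gt; &lt;www.xmlschema#</xsl:text>"
        +     "<xsl:value-of select='name()'/>"
        +     "<xsl:text>&gt; \"</xsl:text>"
        +     "<xsl:value-of select='.'/>"
        +     "<xsl:text>\".&#10;</xsl:text>"
        +   "</xsl:for-each>"
        +   "<xsl:text>&lt;http://www.coderanch.com/student&gt; "
        +     "&lt;www.xmlschema#generaldescription&gt; \"</xsl:text>"
        +   "<xsl:value-of select='normalize-space(.)'/>"
        +   "<xsl:text>\".&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the text output. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a batch job the stylesheet would more likely live in its own .xsl file, and the StreamSource and StreamResult would wrap the input and output files, so the mapping can be edited without touching the Java code.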
Wim Vanni
Ranch Hand

Joined: Apr 06, 2011
Posts: 96

I think there are many arguments in favor of one or the other (see for example here) but I do agree that an XSLT transformation seems a logical solution.

Cheers,
Wim
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Thanks, Paul and Wim, for your replies.
 
 