One big XML file or lots of small ones

Kim Mirtens
Greenhorn

Joined: Dec 17, 2006
Posts: 21
My program uses a lot of static information to do calculations. To make this information more readable to humans I'm migrating from CSV to XML. The CSV files are all separate files for one kind of information.

I've already converted one CSV file to XML and written a test class to retrieve the necessary information. My test class uses DOM and XPath to look up the correct value it needs to do the calculation. I used DOM because I need very frequent access to these values and thought in-memory access would speed things up. Is DOM the right way to go, or should I use SAX?

Now I also want to convert my other CSV files and put them in the same XML file. This would make it more manageable because I would have one file instead of 15. I reckon the total size of this one XML file will be around 2.5 MB (it's all static information so this size won't change). Can I still use DOM in this case? Will XPath be a lot slower when I'm looking through a big DOM tree? Is it better to create 15 separate XML files and load them as 15 different DOM trees? Will SAX be faster?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12821
XPath is kind of slow for repeated lookups. What I do with a DOM of test questions is to create a HashMap where the significant Element references are stored with the question ID as the key. All of the HashMaps are created when the DOM is initially parsed.

This leaves the DOM intact but speeds up the access.
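A minimal sketch of that indexing idea, for readers who want to see it in code. The element name `question` and the attribute name `id` are assumptions made for illustration; the thread does not show the actual document layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomIndex {
    // Walk the DOM once and store each matching Element under its id attribute.
    // tagName and idAttr are hypothetical names chosen by the caller.
    static Map<String, Element> indexById(Document doc, String tagName, String idAttr) {
        Map<String, Element> index = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String id = e.getAttribute(idAttr); // empty string when the attribute is absent
            if (!id.isEmpty()) {
                index.put(id, e);
            }
        }
        return index;
    }

    // Parse an XML string into a DOM Document (a file or stream would work the same way).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<quiz><question id=\"q1\">First</question>"
                   + "<question id=\"q2\">Second</question></quiz>";
        Map<String, Element> byId = indexById(parse(xml), "question", "id");
        // Each lookup is now a hash lookup rather than an XPath evaluation.
        System.out.println(byId.get("q2").getTextContent()); // prints "Second"
    }
}
```

The map holds references into the existing tree, so the DOM itself is not copied or changed.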

Bill
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18882

Note also that the choice between DOM and SAX is a choice about how you will get the data from external storage into memory, not how you will store the data in memory.

If you don't need the Document that a DOM parser produces, and you just want to produce your own data structure in memory, then you could use a SAX parser. But if you already have something working, I wouldn't rewrite your code on the grounds that your program might start up faster. At this point I would say you have better things to do.
Kim Mirtens
Greenhorn

Joined: Dec 17, 2006
Posts: 21
Maybe my exact working method will be clearer with my source code. I'm really a newbie when it comes to XML, and I'm quite uncertain about this code. This is just a test class to access my XML document.

My xml document looks like this:

(of course with a lot more data elements)

My java class to access the data looks like this:


Is this a good way to handle things? Will this give a good performance when I repetitively look up things? What is your advice/suggestions?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12821
I worry when I see code like this:


because it does not provide for the case when getFirstChild() returns null, which is what you would get if the Element is empty. Also, if the NodeList is empty, item(0) returns null and the next method call on it throws a NullPointerException.
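One way to guard against both cases is a small helper like the sketch below. The original snippet is not shown in the thread, so the element names `data` and `value` and the helper names are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SafeDom {
    // Text of the first <tag> descendant of parent, or fallback when the
    // element is missing, empty, or does not start with a text node.
    static String firstText(Element parent, String tag, String fallback) {
        NodeList list = parent.getElementsByTagName(tag);
        if (list.getLength() == 0) {
            return fallback;                 // list.item(0) would be null here
        }
        Node first = list.item(0).getFirstChild();
        if (first == null) {
            return fallback;                 // empty element, e.g. <value/>
        }
        String text = first.getNodeValue();  // null when first is not a text node
        return text == null ? fallback : text.trim();
    }

    // Convenience wrapper (hypothetical): parse a string and query its root element.
    static String lookup(String xml, String tag, String fallback) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return firstText(root, tag, fallback);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup("<data><value>42</value></data>", "value", "n/a")); // prints "42"
        System.out.println(lookup("<data><value/></data>", "value", "n/a"));          // prints "n/a"
        System.out.println(lookup("<data/>", "value", "n/a"));                        // prints "n/a"
    }
}
```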

How is the performance of your existing code? Have you tested it with a realistic load?

Bill
Kim Mirtens
Greenhorn

Joined: Dec 17, 2006
Posts: 21
Originally posted by William Brogden:

How is the performance of your existing code? Have you tested it with a realistic load?

Bill


I only wrote this test class. To actually implement it and test it under a realistic load I still need to do some major recoding. I really can't judge performance.
Being a test class it is indeed not error safe. I was more interested in what you thought of my implementation of XML.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12821
Having considered what you appear to be trying to do, I conclude that working with the DOM is a bad idea. If this were my problem I would use SAX to parse, and create a POJO (Plain Old Java Object) for each <data> element.

With a collection of POJOs in hand, you are free to make all sorts of Lists and Maps of references to the POJOs, sorted and indexed in various ways that will facilitate the lookup and interpolation. This is MUCH faster than XPath and other DOM operations, and less memory intensive.
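A rough sketch of what this could look like. Only the `<data>` element name comes from the thread; the `x` and `y` attributes, the `DataPoint` class, and the choice of linear interpolation over a TreeMap are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DataTable {
    // Hypothetical POJO: one instance per <data> element.
    static final class DataPoint {
        final double x;
        final double y;
        DataPoint(double x, double y) { this.x = x; this.y = y; }
    }

    // Sorted index so the neighbours of any x can be found quickly.
    private final TreeMap<Double, DataPoint> byX = new TreeMap<>();

    // Parse with SAX and build POJOs directly; no DOM tree is kept in memory.
    static DataTable load(String xml) throws Exception {
        DataTable table = new DataTable();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("data".equals(qName)) {
                    // Assumes both attributes are present and numeric.
                    double x = Double.parseDouble(atts.getValue("x"));
                    double y = Double.parseDouble(atts.getValue("y"));
                    table.byX.put(x, new DataPoint(x, y));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return table;
    }

    // Linear interpolation between the two nearest stored points.
    double interpolate(double x) {
        Map.Entry<Double, DataPoint> lo = byX.floorEntry(x);
        Map.Entry<Double, DataPoint> hi = byX.ceilingEntry(x);
        if (lo == null || hi == null) {
            throw new IllegalArgumentException("x outside table range: " + x);
        }
        DataPoint a = lo.getValue();
        DataPoint b = hi.getValue();
        if (a.x == b.x) {
            return a.y; // exact hit on a stored point
        }
        return a.y + (b.y - a.y) * (x - a.x) / (b.x - a.x);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><data x=\"0\" y=\"10\"/><data x=\"10\" y=\"20\"/></table>";
        System.out.println(load(xml).interpolate(5.0)); // prints 15.0
    }
}
```

The same POJOs could be referenced from several maps at once, each keyed the way a particular lookup needs.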

Make your POJO serializable and you can write out the whole collection as a serialized list object, but I bet you will find the SAX parsing is pretty fast.
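For the serialization suggestion, a minimal round-trip sketch using standard Java serialization. The `DataPoint` class and its fields are hypothetical, and the example writes to a byte array to stay self-contained; a FileOutputStream and FileInputStream would take its place for an on-disk cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class PojoCache {
    // Hypothetical POJO made serializable so a whole list can be written at once.
    static final class DataPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        final double x;
        final double y;
        DataPoint(double x, double y) { this.x = x; this.y = y; }
    }

    // Serialize the whole collection as a single list object.
    static byte[] write(List<DataPoint> points) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(points));
        }
        return bytes.toByteArray();
    }

    // Read the list back; the cast is unchecked because the stream is untyped.
    @SuppressWarnings("unchecked")
    static List<DataPoint> read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<DataPoint>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<DataPoint> points = new ArrayList<>();
        points.add(new DataPoint(1.0, 2.5));
        List<DataPoint> copy = read(write(points));
        System.out.println(copy.size() + " point(s) restored, y = " + copy.get(0).y);
    }
}
```

Whether this beats re-parsing with SAX at startup is something to measure rather than assume.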

Bill
 