One big XML file or lots of small ones

 
Greenhorn
My program uses a lot of static information to do calculations. To make this information more readable to humans, I'm migrating from CSV to XML. Each CSV file is a separate file holding one kind of information.

I've already converted one CSV file to XML and written a test class to retrieve the necessary information. My test class uses DOM and XPath to look up the value it needs for the calculation. I used DOM because I need very frequent access to these values and thought in-memory access would speed things up. Is DOM the right way to go, or should I use SAX?

Now I also want to convert my other CSV files and put them in the same XML file. This would make things more manageable, because I would have one file instead of 15. I reckon the total size of this one XML file will be around 2.5 MB (it's all static information, so this size won't change). Can I still use DOM in this case? Will XPath be a lot slower when I'm searching through a big DOM tree? Is it better to create 15 separate XML files and load them as 15 different DOM trees? Would SAX be faster?
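The DOM-plus-XPath lookup being described might look roughly like this minimal sketch. The poster's actual XML layout isn't shown, so the `<table>`/`<data>`/`<value>` structure and the `id` attribute are assumptions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathLookupDemo {
    // Stand-in for the converted CSV data; the real layout is unknown
    static final String XML =
          "<table>"
        + "<data id='A'><value>1.5</value></data>"
        + "<data id='B'><value>2.5</value></data>"
        + "</table>";

    static Document parse(String xml) throws Exception {
        // Parse once into an in-memory DOM tree
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Every lookup re-evaluates an XPath expression against the tree,
        // which is the part that gets slow when repeated many times
        String value = xpath.evaluate("/table/data[@id='B']/value", doc);
        System.out.println(value); // prints 2.5
    }
}
```

The tree stays in memory, but each `evaluate` call still has to search it.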
 
Author and all-around good cowpoke
XPath is kind of slow for repeated lookups. What I do with a DOM of test questions is to create a HashMap where the significant Element references are stored with the question ID as the key. All of the HashMaps are created when the DOM is initially parsed.

This leaves the DOM intact but speeds up the access.
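Indexing Element references in a HashMap right after the initial parse could be sketched like this (the `<data>` element name and the `id` attribute are assumed, since the actual document isn't shown):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomIndexDemo {
    // Built once, right after parsing; later lookups never touch XPath
    static Map<String, Element> buildIndex(Document doc) {
        Map<String, Element> index = new HashMap<>();
        NodeList items = doc.getElementsByTagName("data");
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            index.put(e.getAttribute("id"), e); // key on the id attribute
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><data id='A'><value>1.5</value></data>"
                   + "<data id='B'><value>2.5</value></data></table>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Element> index = buildIndex(doc);
        // O(1) hash lookup instead of an XPath scan of the tree
        System.out.println(index.get("B").getTextContent());
    }
}
```

The map holds references into the live DOM, so the tree itself is untouched.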

Bill
 
Sheriff
Note also that the choice between DOM and SAX is a choice about how you will get the data from external storage into memory, not how you will store the data in memory.

If you don't need the Document that a DOM parser produces, and you just want to produce your own data structure in memory, then you could use a SAX parser. But if you already have something working, I wouldn't rewrite your code on the grounds that your program might start up faster. At this point I would say you have better things to do.
 
Kim Mirtens
Greenhorn
Maybe my exact approach will be clearer with my source code. I'm a newbie when it comes to XML, so I'm uncertain about this code. This is just a test class to access my XML document.

My XML document looks like this:

(of course with a lot more data elements)

My Java class to access the data looks like this:


Is this a good way to handle things? Will this give good performance when I repeatedly look things up? What are your suggestions?
 
William Brogden
Author and all-around good cowpoke
I worry when I see code like this:


because it does not provide for the case where getFirstChild() returns null, which is what you get if the Element is empty. Also, if the NodeList is empty, you will get a NullPointerException.
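A defensive version of that kind of text extraction, with both null cases handled, might look like this (`firstText` is a hypothetical helper, not the poster's code):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SafeText {
    // Defensive replacement for list.item(0).getFirstChild().getNodeValue():
    // returns a fallback instead of throwing NullPointerException when the
    // NodeList is empty or the element has no text child
    static String firstText(NodeList list, String fallback) {
        if (list == null || list.getLength() == 0) {
            return fallback;          // no matching elements at all
        }
        Node first = list.item(0).getFirstChild();
        if (first == null) {
            return fallback;          // empty element, e.g. <value/>
        }
        String text = first.getNodeValue();
        return text == null ? fallback : text.trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><value>42</value><empty/></root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(firstText(doc.getElementsByTagName("value"), "?"));   // 42
        System.out.println(firstText(doc.getElementsByTagName("empty"), "?"));   // ?
        System.out.println(firstText(doc.getElementsByTagName("missing"), "?")); // ?
    }
}
```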

How is the performance of your existing code? Have you tested it with a realistic load?

Bill
 
Kim Mirtens
Greenhorn

Originally posted by William Brogden:
How is the performance of your existing code? Have you tested it with a realistic load?

I only wrote this test class. To actually implement it and test it under a realistic load, I still need to do some major recoding, so I can't judge performance yet.
Being a test class, it is indeed not error-safe. I was more interested in what you think of my use of XML.
 
William Brogden
Author and all-around good cowpoke
Having considered what you appear to be trying to do, I conclude that working with the DOM is a bad idea. If this were my problem, I would use SAX to parse, and create a POJO (Plain Ole Java Object) for each <data> element.

With a collection of POJOs in hand, you are free to make all sorts of lists and Maps of references to the POJOs, sorted and indexed in various ways that will facilitate the lookup and interpolation. MUCH faster than XPath and other DOM operations, and less memory-intensive.

Make your POJO serializable and you can write out the whole collection as a serialized list object, but I bet you will find the SAX parsing is pretty fast.
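A SAX-to-POJO conversion along those lines could be sketched as follows (the `<data>`/`<value>` layout and the `DataRow` POJO are assumptions, since the real document isn't shown):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxToPojoDemo {
    // Hypothetical POJO, one per <data> element
    static class DataRow {
        final String id;
        final double value;
        DataRow(String id, double value) { this.id = id; this.value = value; }
    }

    static List<DataRow> parse(String xml) throws Exception {
        List<DataRow> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            String currentId;
            StringBuilder text = new StringBuilder();

            @Override public void startElement(String uri, String local,
                                               String qName, Attributes atts) {
                if ("data".equals(qName)) currentId = atts.getValue("id");
                text.setLength(0);  // start collecting this element's text
            }
            @Override public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }
            @Override public void endElement(String uri, String local,
                                             String qName) {
                if ("value".equals(qName)) {
                    rows.add(new DataRow(currentId,
                            Double.parseDouble(text.toString().trim())));
                }
            }
        };
        // SAX streams the document; no DOM tree is ever built
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        List<DataRow> rows = parse(
              "<table><data id='A'><value>1.5</value></data>"
            + "<data id='B'><value>2.5</value></data></table>");
        System.out.println(rows.size() + " rows; B=" + rows.get(1).value);
    }
}
```

Once the rows exist as POJOs, building a `HashMap<String, DataRow>` keyed on `id` (or any other index) is straightforward.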

Bill
 