I'm new to XML, so please pardon my open-ended questions. I'm using the DOM approach to parse a large XML file (over 15MB) using the following code:
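(The original code isn't shown in the post; a typical DOM parse in Java looks roughly like the sketch below. The file name and element name are hypothetical, but the key point is that `parse()` builds the entire tree in memory before returning.)

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomParseExample {
    public static void main(String[] args) throws Exception {
        // Stand-in for the 15MB file; in real code you would pass a File or InputStream.
        String xml = "<catalog><item>a</item><item>b</item></catalog>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() materializes the whole document tree in the heap at once.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList items = doc.getElementsByTagName("item");
        System.out.println(items.getLength());
    }
}
```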
This was working fine for a while, though it seemed to use a lot of RAM: I had to raise the VM heap to 256MB to avoid a java.lang.OutOfMemoryError. I'm now using a different VM, and I'm getting the same error even on smaller files that used to work fine.
I'm trying to decide between two approaches to resolving this: 1) tune garbage collection so memory is reclaimed immediately after each top-level element has been stored in the database, or 2) switch from DOM to an approach that doesn't parse the entire file into memory first. I'm leaning towards option 2 (though I've never used it before and could use some pointers on where to learn about it), as my XML file could grow much larger than 15MB.
Which approach would you recommend? If #2, do you have suggestions on where I could learn this approach quickly? (Searching on "XML" yields n! hits...)
Hi, if the XML file you are parsing is very large, the best option is neither plain DOM nor plain SAX. A better option is SAX extensions: SAX combined with filters, which let you chain processing stages without ever holding the whole document in memory.
You can read Brett McLaughlin's XML tip at IBM developerWorks.
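As a minimal sketch of the filter idea: an `org.xml.sax.helpers.XMLFilterImpl` sits between the parser and your handler and can rewrite or drop events on the fly. The `<comment>`-skipping filter below is a hypothetical example, not from the article:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterExample {
    // Hypothetical filter: suppresses <comment> subtrees before they reach the handler.
    static class SkipCommentsFilter extends XMLFilterImpl {
        private int depth = 0;
        SkipCommentsFilter(XMLReader parent) { super(parent); }
        @Override public void startElement(String uri, String local, String qName,
                                           Attributes atts) throws SAXException {
            if (depth > 0 || "comment".equals(qName)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }
        @Override public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }
        @Override public void characters(char[] ch, int start, int len) throws SAXException {
            if (depth == 0) super.characters(ch, start, len);
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipCommentsFilter filter = new SkipCommentsFilter(reader);
        StringBuilder seen = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                seen.append(q).append(' ');  // record which elements got through
            }
        });
        filter.parse(new InputSource(new StringReader(
                "<doc><item>x</item><comment>skip</comment><item>y</item></doc>")));
        System.out.println(seen.toString().trim()); // doc item item
    }
}
```

Filters compose: you can chain several `XMLFilterImpl` stages, each doing one transformation, while the stream stays constant-memory.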
Hope it helps..
SCJP 1.4, SCDJWS, SCJA
I can do ALL things through CHRIST who strengthens me.
The memory used by a DOM tree is much larger than the source document: every element becomes a Java object, and all text is held as two-byte Unicode chars. More frequent GC won't help, because the whole tree stays reachable while you work on it. SAX-style processing is the only feasible way to go. Bill
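A sketch of the SAX-style approach Bill describes: buffer only one record at a time and flush it (in your case, insert it into the database) when its end tag arrives, so heap usage stays flat regardless of file size. The `record` element name and the inline XML are assumptions for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamExample {
    public static void main(String[] args) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private int records = 0;
            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                text.setLength(0);  // start collecting a fresh element's text
            }
            @Override public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }
            @Override public void endElement(String uri, String local, String qName) {
                if ("record".equals(qName)) {   // hypothetical top-level element
                    records++;                   // real code would insert into the DB here
                    System.out.println("record " + records + ": " + text);
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(
                "<data><record>alpha</record><record>beta</record></data>")), handler);
    }
}
```

Because the handler discards each record after storing it, nothing but the current record is live in memory, so the file can be arbitrarily large.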