Parsing large file using DOM

Mike Southgate
Ranch Hand

Joined: Jul 18, 2003
Posts: 183
I'm new to XML, so please pardon my open-ended questions. I'm using the DOM approach to parse a large XML file (over 15MB) using the following code:

This was working fine for a while, though it seemed to use a lot of RAM: I had to set the VM heap to 256MB to stop it failing with a java.lang.OutOfMemoryError. I'm now using a different VM and I get the same error even on smaller files that used to work fine.

I'm trying to decide between 2 approaches to resolving this:
1) address the garbage collection to clean up immediately after each top level element has been stored in the database, or
2) switch from the DOM approach to another approach that doesn't parse the entire file first. I'm leaning towards this approach (though I've never used it before and could use some pointers on where to learn about it) as my XML file could get much larger than 15MB.

Which approach would you recommend? If #2, do you have any suggestions on where I could learn this approach quickly (searching on XML yields n! hits...)?


ms
SCJP, SCJD

Arun Prasath
Ranch Hand

Joined: Sep 17, 2003
Posts: 192
If the XML file you are parsing is very large, the best option is neither plain DOM nor plain SAX.
The best option is the SAX extensions, which use SAX together with filters.

You can read Brett McLaughlin's XML tip at IBM developerWorks here.
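
To give a rough idea of what a SAX filter looks like, here is a minimal sketch using the standard `org.xml.sax.helpers.XMLFilterImpl` class. It is not taken from the developerWorks tip; the class names `SkipFilterDemo` and `SkipFilter`, the helper `textWithout`, and the element name `skip` are all made up for illustration. The filter sits between the parser and your handler and silently drops every event inside the elements you are not interested in, so the handler never sees them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SkipFilterDemo {

    /** Hypothetical filter: drops every event inside elements named skipName. */
    static class SkipFilter extends XMLFilterImpl {
        private final String skipName;
        private int depth; // greater than 0 while inside a skipped element

        SkipFilter(XMLReader parent, String skipName) {
            super(parent);
            this.skipName = skipName;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || skipName.equals(qName)) {
                depth++; // entering (or nesting deeper inside) a skipped subtree
                return;
            }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (depth > 0) {
                depth--; // leaving one level of the skipped subtree
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Returns the character data of the document, minus the skipped elements. */
    static String textWithout(String xml, String skipName) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipFilter filter = new SkipFilter(reader, skipName);
        StringBuilder out = new StringBuilder();
        // The downstream handler only receives what the filter lets through.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textWithout("<a>x<skip>y<b>z</b></skip>w</a>", "skip"));
    }
}
```

Filters can be chained (each one takes the previous one as its parent), so each filter can stay small and do one job.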

Hope it helps..

SCJP 1.4, SCDJWS, SCJA
I can do ALL things through CHRIST who strengthens me.

William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13036
The memory used in creating a DOM is much larger than the source document: every element gets turned into a Java object, and the text is held as 16-bit Unicode chars. Frequent GC is not going to help, because the whole tree stays reachable until parsing is finished.
SAX style processing is the only feasible way to go.
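
As a rough sketch of what SAX-style processing looks like (an illustration only, not the original poster's code): the parser calls your handler as it reads, and you keep just the data for the element you are currently in. The names `SaxRecordDemo`, `RecordHandler`, `parseNames` and the `records`/`record`/`name` elements are invented for this example; in a real program the `endElement` branch is where you would store each top-level element in the database.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordDemo {

    /** Hypothetical handler: collects the text of each <name> element as it streams past. */
    static class RecordHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inName;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if ("name".equals(qName)) {
                inName = true;
                text.setLength(0); // start a fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so append rather than assign.
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                inName = false;
                // A real program would write the record to the database here
                // instead of keeping it in a list.
                names.add(text.toString());
            }
        }
    }

    static List<String> parseNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordHandler handler = new RecordHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                + "<record><name>a</name></record>"
                + "<record><name>b</name></record>"
                + "</records>";
        System.out.println(parseNames(xml));
    }
}
```

For a real file you would pass a `File` or `InputStream` to `parser.parse(...)` instead of a string; memory use then depends on the size of one record, not on the size of the whole document.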