How to process large XML files (bigger than 1GB) in Java?

shuyi zhou
Greenhorn

Joined: Dec 25, 2007
Posts: 2
Dear All,

I have to process some large XML files (bigger than 1GB per file) in Java code. Which approach would you suggest?

SAX?
StAX?

Or any other better way?
Walter Bernstein
Ranch Hand

Joined: Dec 19, 2007
Posts: 57
StAX is a bit more developer-friendly.

But before using SAX/StAX, try dom4j with the XPP parser. Maybe it can handle your data, but that depends on what you do with the file...
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12792
I doubt very much that any DOM-oriented parser will be able to handle a file bigger than 1GB, and there may not be any reason to hold the entire thing in memory at one time. You left out the essential information: what has to be done to the data in this XML file?

If it is just record-by-record processing, then event-oriented (SAX or StAX) parsing will be the way to go. For record-by-record processing, an existing "pipeline" toolkit may also be applicable.
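To make the event-oriented approach concrete, here is a minimal SAX sketch. It is only an illustration under assumptions: the element name `record` and the attribute `amount` are made up, since the thread never describes the actual file layout. The point is that the handler sees one element at a time and keeps only running totals, so memory use does not grow with file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordCounter {

    /** Counts hypothetical <record> elements and sums their "amount" attribute. */
    static class RecordHandler extends DefaultHandler {
        long count = 0;
        long total = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // The default factory is not namespace-aware, so match on qName.
            if ("record".equals(qName)) {
                count++;
                String amount = attrs.getValue("amount");
                if (amount != null) {
                    total += Long.parseLong(amount);
                }
            }
        }
    }

    /** Streams the input through the handler; no document tree is built. */
    static RecordHandler process(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordHandler handler = new RecordHandler();
        parser.parse(in, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in; a real run would pass a FileInputStream.
        String xml = "<records><record amount=\"10\"/><record amount=\"32\"/></records>";
        RecordHandler h = process(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(h.count + " records, total " + h.total);
    }
}
```

For a real file you would hand `process` a buffered `FileInputStream` instead of the in-memory sample.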

So - more detail on what has to be done to the data please.

Bill
Raghavan Muthu
Ranch Hand

Joined: Apr 20, 2006
Posts: 3344

That's very true! It surely depends on what you intend to do with the XML data once it has been parsed.


Walter Bernstein
Ranch Hand

Joined: Dec 19, 2007
Posts: 57
Originally posted by William Brogden:
I doubt very much that any DOM oriented parser will be able to handle a file bigger than 1GB and there may not be any reason to handle the entire thing in memory at one time.

It worked for me with a 1.2GB file, just check it. dom4j is DOM-oriented, but not a real DOM parser.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by William Brogden:
I doubt very much that any DOM oriented parser will be able to handle a file bigger than 1GB


That would very much depend on the parser and the file structure, wouldn't it?

If, for example, the file contained one and the same tag again and again, a DOM oriented parser that interned the tag names would likely have no memory problem at all.

Let alone the trivial case of an XML file containing 99% white space...


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12792
Originally posted by Walter Bernstein:
It worked for me with a 1.2GB file, just check it. dom4j is DOM-oriented, but not a real DOM parser.


That really surprised me - how much memory did you have to give the JVM?

Bill
Walter Bernstein
Ranch Hand

Joined: Dec 19, 2007
Posts: 57
Originally posted by William Brogden:


That really surprised me - how much memory did you have to give the JVM?

Bill


750MB
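
For anyone repeating the experiment: the heap ceiling is normally set with the standard `-Xmx` launcher option (something like `-Xmx750m`; the exact command line used above is not given in the thread). A small sketch for checking what the running JVM actually got, with `HeapCheck` being just an illustrative class name:

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

The reported value is often a little below the `-Xmx` figure, because some JVMs exclude part of the heap from what `maxMemory()` reports.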
Raees Uzhunnan
Ranch Hand

Joined: Aug 15, 2002
Posts: 126
William Brogden is right: DOM takes a lot of memory. Allocating that much memory as small objects such as nodes and elements also has an impact on performance and garbage collection.

StAX works for us since it parses on demand, so I only need to worry about exactly the data I want to see. Check it out.
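
A rough sketch of what that on-demand style can look like with the standard `javax.xml.stream` API. The element name `title` and the class and method names are assumptions for illustration only; this is not the code from our project. The application pulls events and stops to read text only at the elements it cares about:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTitleReader {

    /** Pulls events one at a time and hands each hypothetical <title> text to the callback. */
    static void forEachTitle(Reader source, Consumer<String> callback) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag of a text-only element.
                    callback.accept(reader.getElementText());
                }
                // Every other event is skipped without being stored anywhere.
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in; a real run would wrap a file in a buffered Reader.
        String xml = "<books><book><title>A</title><price>1</price></book>"
                + "<book><title>B</title></book></books>";
        forEachTitle(new StringReader(xml), title -> System.out.println(title));
    }
}
```

Passing a callback rather than collecting results into a list keeps memory flat even when the file holds millions of matching elements.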

Thanks
Raees


Lolke Dijkstra
Greenhorn

Joined: Nov 20, 2012
Posts: 2
Hi,

Have a look here: http://java.dzone.com/articles/conveniently-processing-large

You may also want to have a look at LDX+ framework for processing Big Data XML in Java. It also utilizes SAX, but uses code generation to generate the JavaBeans access to the schema complexTypes. It deals with large datasets by allowing the application programmer to configure what parts to process at runtime. It also deals with memory issues like containers.

We've got an evaluation version available for anyone who is interested in checking it out: http://xml2java.net/downloads.html. General information can be found at: http://xml2java.net

Cheers,
Lolke
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7893

Lolke Dijkstra wrote:Have a look here: http://java.dzone.com/articles/conveniently-processing-large

You do realise that you answered a thread that is 5 years old? I suspect shuyi has left the building...

However, for my two-penn'orth, I think the first question I'd be asking myself is: how did my app get into such a state that I'm having to deal with 1GB XML files in the first place?

Winston

PS: Nice surname BTW. I bet that gets you a few interviews.


Lolke Dijkstra
Greenhorn

Joined: Nov 20, 2012
Posts: 2
Winston Gutkowski wrote:
Lolke Dijkstra wrote:Have a look here: http://java.dzone.com/articles/conveniently-processing-large

You do realise that you answered a thread that is 5 years old? I suspect shuyi has left the building...

However, for my two-penn'orth, I think the first question I'd be asking myself is: how did my app get into such a state that I'm having to deal with 1GB XML files in the first place?

Winston

PS: Nice surname BTW. I bet that gets you a few interviews.


Haha! Well, nice first name ;-)
You're probably right. That does not take away the fact that I have been involved in a project (banking) where end-of-day reporting did involve parsing multi-GB messages. I find the approach that I outline here the most convenient method: http://xml2java.net/xml-java-data-mapping-big-data-article.html
Cheers,
Lolke
 