
Large XML

Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
Hi,
I am trying to process large XML files (50 MB, sometimes 75 MB) with deeply nested nodes. As a result, parsing (which I am doing in Tibco Business Works) takes a long time (more than 5 minutes). Tibco uses a DOM parser internally to parse the document.
What approaches can we take to optimize this? Converting the file data to a byte array and then processing it? Or changing the parser by writing the code in Java?


MH
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
75 MB is actually very small. 3-4 GB would be considered large. Anyway, you could process the data using a SAX-based application.

Take note that DOM and SAX are APIs; they are not parsers. So it's not about "changing the parser." The parser will most likely stay the same. What will change is the application that processes the data as the parser delivers it.
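To illustrate the point about SAX being an API on top of the parser, here is a minimal sketch using the standard JAXP classes. The parser is obtained from the same factory mechanism either way; what changes is that the application supplies a handler and receives callbacks instead of a finished tree. The class name, method name, and sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementCounter {

    /** Counts start tags by name; no in-memory document tree is built. */
    static Map<String, Integer> countElements(InputStream in) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Called by the parser once for every start tag it reads.
                counts.merge(qName, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, only for demonstration.
        String xml = "<orders><order><item/><item/></order><order><item/></order></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in));
    }
}
```

Memory use stays roughly constant with document size here, because only the counts are kept, not the nodes.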
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
What exactly are you trying to do with this file?

1. Extract only a few data items?
2. Perform complex queries?
3. Write a modified XML document?
etc etc

Bill
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
hi,
Extract all sub-items, transform each of them to flat-file format, consolidate the entire data, and write it to a flat file. This is what I am planning to do. Transforming and writing to the flat file will be done by the Tibco (BW) tool. So the entire process is to:
1) Parse the XML.
2) Transform each sub-item to flat-file format.
3) Write the entire sub-item data to a flat file.
Steps 2 and 3 are also taking a lot of time, but they cannot be modified. So I am looking at the parsing side, to see whether it is possible to reduce the time there.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
Short answer - probably not.

Since parsing time is the slowest step in practically every application treating XML this way, you may be very sure that a LOT of thought has gone into optimizing parsers.

Turning off any validation may help.
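For a Java-based parser, switching validation off might look like the sketch below. `setValidating(false)` is standard JAXP (and is already the default). The `load-external-dtd` feature is specific to Xerces-derived parsers, including the one bundled with the JDK, so it is set inside a try/catch in case another parser is on the classpath. Whether Tibco BW exposes equivalent settings is not something this sketch can answer. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NonValidatingParse {

    /** Builds a SAX parser with DTD validation and external DTD loading disabled. */
    static SAXParser newFastParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // no DTD validation (the JAXP default)
        try {
            // Xerces-specific: do not fetch the external DTD at all.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (Exception e) {
            // This parser does not recognise the feature; continue without it.
        }
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newFastParser();
        // A do-nothing handler: this only exercises the parser configuration.
        parser.parse(new InputSource(new StringReader("<a><b/></a>")), new DefaultHandler());
        System.out.println("validating: " + parser.isValidating());
    }
}
```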

Do you have any control over the way the XML is produced?

Bill
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
No. We don't have control over how the XML is produced. One possibility is to replace Tibco BW with a Java component.
chetan dhumane
Ranch Hand

Joined: Jan 07, 2009
Posts: 629

Then you have to replace BW with Java code that does the same thing.
That is the only solution.


http://www.androcid.com/
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
I suggest you take a look at the ServingXML toolkit. This is an open-source "pipeline"-style processor which has been around for a while. The page I cited leads to extensive examples of conversions such as you need.

Bill
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
1) Parse the XML.
2) Transform each subitem to flat file format.
3) Write entire subitems data to flat file.


Arjun, it would be beneficial to think of these as a single process. The creation of the files will be part of the parsing process. In other words as the parser is reading the XML-based data, it is writing to the files. When the parsing is completed the files have been created. A well-written SAX-based application should be able to process 75 MB in less than 60 seconds.
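A rough sketch of that single-pass idea with the standard JAXP SAX API follows. The element names (`item`, with simple child elements such as `id` and `name`) and the pipe-delimited output are assumptions made for illustration, since the real document structure and flat-file layout are not given in this thread. It also assumes each `item` has only flat, text-only children; deeper nesting would need a small stack.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlToFlatFile {

    /** Streams XML from 'in' and writes one '|'-delimited line per <item> to 'out'. */
    static void convert(Reader in, Writer out) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private final List<String> fields = new ArrayList<>();
            private boolean inItem = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("item".equals(qName)) {   // hypothetical record element
                    inItem = true;
                    fields.clear();
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so accumulate.
                if (inItem) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {
                if (!inItem) return;
                if ("item".equals(qName)) {
                    try {
                        // The record is complete: write it out immediately.
                        out.write(String.join("|", fields));
                        out.write('\n');
                    } catch (IOException e) {
                        throw new SAXException(e);
                    }
                    inItem = false;
                } else {
                    fields.add(text.toString().trim());
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
        out.flush();
    }
}
```

For a real file, `in` could be a `BufferedReader` over the XML file and `out` a `BufferedWriter` over the target flat file, so that only one record is held in memory at a time.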

If you need some help with writing the SAX application, check out the following web page for more information. Good luck!

http://www.retrievalsystems.com/
Arjun Shastry
Ranch Hand

Joined: Mar 13, 2003
Posts: 1874
Thanks, all, for the help. I will definitely look into the above suggestions.
T Dahl
Ranch Hand

Joined: Oct 07, 2010
Posts: 35
Did you consider XSLT as an alternative to Java for this application?
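If XSLT is an option, a minimal sketch using the JDK's `javax.xml.transform` API might look like this. The stylesheet, the element names (`item`, `id`, `name`), and the pipe-delimited layout are assumptions for illustration only. One caveat: XSLT 1.0 processors, such as the one bundled with the JDK, generally build the whole source tree in memory, so this may not be any lighter than the DOM approach for large inputs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltFlatFile {

    // Hypothetical stylesheet: one "id|name" line per <item> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//item'>"
      + "<xsl:value-of select='id'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the flat-file text. */
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(
            "<items><item><id>1</id><name>Widget</name></item></items>"));
    }
}
```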
 