
Large XML

 
Arjun Shastry
Ranch Hand
Posts: 1893
Hi,
I am trying to process a large XML file (50 MB, sometimes 75 MB) with deeply nested nodes. Parsing, which I am doing in Tibco Business Works, takes a long time (more than 5 minutes). Tibco uses a DOM parser internally to parse the document.
What approaches can we take to optimize this? Converting the file data to a byte array and then processing it? Or changing the parser by writing the code in Java?
 
Jimmy Clark
Ranch Hand
Posts: 2187
75MB is actually very small. 3-4 GB would be considered large. Anyway, you could process the data using a SAX-based application.

Take note that DOM and SAX are APIs; they are not parsers. So it's not about "changing the parser." The parser will most likely stay the same. What will change is the application that processes the data immediately after parsing occurs.
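
To illustrate the API-versus-parser distinction, here is a minimal Java sketch, assuming the standard JAXP classes bundled with the JDK. It counts the elements of the same document once through the DOM API and once through the SAX API; in a stock JDK both factories are typically backed by the same underlying parser implementation. The class and method names (DomVersusSax, countWithDom, countWithSax) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {

    // DOM API: the whole document is built as an in-memory tree first.
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // "*" matches every element in the document, including the root.
        return doc.getElementsByTagName("*").getLength();
    }

    // SAX API: the application receives callbacks while the parser reads;
    // no tree is kept in memory.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c><d/></c></a>";
        System.out.println(countWithDom(xml)); // prints 4
        System.out.println(countWithSax(xml)); // prints 4
    }
}
```

Both methods give the same answer; the difference is that the DOM version holds the entire tree in memory, which is what becomes expensive for a 50-75 MB file.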
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
What exactly are you trying to do with this file?

1. Extract only a few data items?
2. Perform complex queries?
3. Write a modified XML document?
etc etc

Bill
 
Arjun Shastry
Ranch Hand
Posts: 1893
Hi,
Extract all subitems, transform each of them to flat file format, consolidate the entire data, and write it to a flat file. This is what I am planning to do. Transforming and writing to the flat file will be done by the Tibco (BW) tool. So the entire process is to:
1) Parse the XML.
2) Transform each subitem to flat file format.
3) Write the entire subitems data to a flat file.
Steps 2 and 3 are also taking a lot of time, but they cannot be modified. So I am looking at the parsing side, to see if it's possible to reduce the time there.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
Short answer - probably not.

Since parsing time is the slowest step in practically every application treating XML this way, you may be very sure that a LOT of thought has gone into optimizing parsers.

Turning off any validation may help.
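
If the parsing were moved into Java, turning validation off could look like the sketch below. It assumes the JAXP implementation shipped with the JDK; the load-external-dtd feature is specific to Xerces-derived parsers, so the code tolerates parsers that do not recognize it. The class name NonValidatingParser is hypothetical, and whether Tibco BW exposes an equivalent setting is not something this sketch can answer.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

class NonValidatingParser {

    static SAXParser create() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();

        // No DTD validation. This is already the JAXP default,
        // but stating it makes the intent explicit.
        factory.setValidating(false);

        try {
            // Xerces-specific feature: do not fetch an external DTD at all.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
        } catch (ParserConfigurationException | SAXException e) {
            // Assumption: a parser that does not know this feature
            // can simply run without it.
        }
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(create().isValidating()); // prints false
    }
}
```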

Do you have any control over the way the XML is produced?

Bill
 
Arjun Shastry
Ranch Hand
Posts: 1893
No. We don't have control over how the XML is produced. One possibility is to replace Tibco BW with a Java component.
 
chetan dhumane
Ranch Hand
Posts: 641
Then you have to replace BW with Java code that does the same thing.
That is the only solution.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
I suggest you take a look at the ServingXML toolkit. This is an open source "pipeline" style processor which has been around for a while. The page I cited leads to extensive examples of conversions such as you need.

Bill
 
Jimmy Clark
Ranch Hand
Posts: 2187
1) Parse the XML.
2) Transform each subitem to flat file format.
3) Write the entire subitems data to a flat file.


Arjun, it would be beneficial to think of these as a single process. The creation of the files will be part of the parsing process. In other words, as the parser is reading the XML-based data, it is writing to the files. When the parsing is completed, the files have been created. A well-written SAX-based application should be able to process 75 MB in less than 60 seconds.

If you need some help with writing the SAX application, check out the following web page for more information. Good luck!

http://www.retrievalsystems.com/
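
The "single process" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the element names (items, item, id, name), the pipe-delimited output format, and the class names XmlToFlatFile and FlatFileHandler are all invented here, since the thread does not show the real XML or the real flat file layout. It also assumes each item has only simple, non-nested child elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlToFlatFile {

    // Writes one pipe-delimited line per <item> while the parser is reading.
    static class FlatFileHandler extends DefaultHandler {
        private final Writer out;
        private final StringBuilder text = new StringBuilder();   // current element text
        private final StringBuilder record = new StringBuilder(); // current output line
        private boolean inItem = false;

        FlatFileHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (qName.equals("item")) {
                inItem = true;
                record.setLength(0);
            }
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (qName.equals("item")) {
                try {
                    // The record is complete: write it out immediately.
                    out.write(record.toString());
                    out.write('\n');
                } catch (IOException e) {
                    throw new SAXException(e);
                }
                inItem = false;
            } else if (inItem) {
                if (record.length() > 0) {
                    record.append('|');
                }
                record.append(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    static void convert(InputSource in, Writer out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new FlatFileHandler(out));
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item><id>1</id><name>bolt</name></item>"
                   + "<item><id>2</id><name>nut</name></item></items>";
        StringWriter out = new StringWriter();
        convert(new InputSource(new StringReader(xml)), out);
        System.out.print(out); // prints "1|bolt" and "2|nut" on separate lines
    }
}
```

For a real 75 MB input, the StringReader and StringWriter would be replaced by a buffered file reader and writer. Memory use then stays roughly constant, because only one record is held at a time.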
 
Arjun Shastry
Ranch Hand
Posts: 1893
Thanks all for the help. I will definitely look into the suggestions above.
 
T Dahl
Ranch Hand
Posts: 35
Did you consider XSLT as an alternative to Java for this application?
 