
Is DOM still slow compared to SAX?

 
Ranch Hand
Posts: 116
All,

Wondering if DOM is still slow and a memory hog compared to SAX? I would like to use DOM because I need to parse multiple attributes within a particular node of a big (~100 MB) XML file. If I am still stuck with SAX, can anybody suggest a generic SAX solution that could parse XML like this:

<?xml version="1.0" standalone="yes"?>
<ABC sup_id="" RecordCount="">
<TransactionList>
<Transaction DataType="" ProgramNumber="" ProgramPhase="">
<IdentityList>
<Identity Type="" Title="" FirstName="" MiddleName="" LastName="" Suffix="" Gender="" DateOfBirth="" Address1="" Address2="" City="" State="" Zip="" ZipExt="" UndeliverableStatus="" />
</IdentityList>
<Contact Type="" phone_area_code="" phone_number="" phone_status="" PhoneContactTime="" Email="" Email_Status_Cd="" />
<Qualification SignatureFlag="" IndexNumber="" >
<AgeVerification AgeVerifiedFlag="" VerifiedSource="" MatchType="" AgeVerifiedType="" BatchUnderage="" />
<ID FirstName="" MiddleName="" LastName="" Address1="" Address2="" City="" State="" Zip="" DateOfBirth="">
<ExceptionList>
<Exception ExceptionReason="" />
</ExceptionList>
</ID>
</Qualification>
<PreferenceList>
<Preference Type="" Code="" Value="" />
</PreferenceList>
<OfferList>
<Offer OfferNum="" OfferType="" OrderNumber="">
<ContinuityList>
<Continuity Quantity="" Number="" />
</ContinuityList>
<SurveyDataList>
<SurveyData Question="" ResponseCode="" ResponseID="" FreeForm="" />
</SurveyDataList>
</Offer>
</OfferList>
</Transaction>
</TransactionList>
</ABC>

One thing I would like to mention: the node name will be passed as a command-line argument, to select the node whose attributes should be parsed from the XML file.

Your help with code samples, hints, ideas, or suggestions would be greatly appreciated.

Thanks!
 
Ranch Hand
Posts: 775

Originally posted by Tariq Ahsan:
Wondering if DOM is still slow and a memory hog compared to SAX? I would like to use DOM because I need to parse multiple attributes within a particular node of a big (~100 MB) XML file.



It is a common misconception that DOM has to be big and slow, mostly because, historically, it was big and slow. Now it depends on what you are doing, because a DOM parser can be lazy about loading data.

If you want to read an entire file and actually traverse all the elements in it, then for a 100 MB document you don't want to use DOM *unless* you know you really want a full tree representation of that document in memory. DOM is OK if you just want to dig out a few pieces of the tree, but SAX is better if you want to scan through the entire document.

StAX is another alternative that sits somewhere between the two. It is good for scanning through parts of the tree, and it makes it easier than SAX to skip things you don't want to process. The downside to StAX is that you don't get things like validation and entity resolution without some extra effort, but that is also why it can be faster than SAX or DOM when you don't care about validation.
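
To make the StAX option concrete, here is a minimal sketch using the standard `javax.xml.stream` API. The class name `StaxAttributeScan`, the method `scan`, and the pipe delimiter are my own choices for illustration, not anything from the original question. It pulls events one at a time and only looks at start tags whose name matches the requested element; treat it as a starting point rather than a tuned solution.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxAttributeScan {

    /** Returns one pipe-delimited line of attribute values per element named elementName. */
    static List<String> scan(Reader xml, String elementName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: no DTD processing is wanted, so DTD support is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Pull the next event; everything except matching start tags is ignored.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    StringBuilder line = new StringBuilder();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        if (i > 0) {
                            line.append('|');
                        }
                        line.append(reader.getAttributeValue(i));
                    }
                    lines.add(line.toString());
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real file; a FileReader would be used for the 100 MB document.
        String xml = "<ABC><TransactionList><Transaction><PreferenceList>"
                + "<Preference Type=\"T\" Code=\"C\" Value=\"V\"/>"
                + "</PreferenceList></Transaction></TransactionList></ABC>";
        for (String line : scan(new StringReader(xml), "Preference")) {
            System.out.println(line);
        }
    }
}
```

Only one event is held at a time, so memory use should stay roughly flat regardless of file size. Attribute values that contain the delimiter are not escaped here.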
 
Tariq Ahsan
Ranch Hand
Posts: 116
Thanks, Reid, for your response. In terms of performance and memory usage, do you think it makes sense to use DOM instead of SAX for parsing a large XML file when I am just going to search for a particular node, parse its attributes, and dump them into a delimited text file? I am inclined to use DOM because of its ease of handling elements.
 
Reid M. Pinchback
Ranch Hand
Posts: 775
I would definitely give it a try. Just set up a unit test with a small bit of DOM code and a big file that you think fairly represents, or at least approximates, the kind of document you'll work with. Add a pause in the code after you've done your processing so the Java process doesn't exit, and check the memory footprint before and after. I am sure about the lazy loading. What I'm not sure about is what happens if you dive into a place late in the document: does the tree get built only for the nodes you try to examine, or for all the data scanned from the start up to the point you care about? I'm pretty sure it is the former, but you'll want to test to make sure.
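
A rough sketch of that kind of test, using only the JAXP classes that ship with the JDK. The class name `DomFootprint` and its helper methods are made up for this example, and the heap numbers are only approximate because `System.gc()` is just a hint to the JVM; a profiler or an OS-level memory reading will be more trustworthy.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomFootprint {

    /** Rough used-heap reading; System.gc() is only a request, so the value is approximate. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Parses the stream into a DOM tree, prints the heap growth, and returns the match count. */
    static int parseAndCount(InputStream in, String elementName) throws Exception {
        long before = usedHeapBytes();
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        int count = doc.getElementsByTagName(elementName).getLength();
        long after = usedHeapBytes();
        System.out.printf("%d <%s> elements, approx. heap growth %d KB%n",
                count, elementName, (after - before) / 1024);
        // Use the document after measuring so the tree stays reachable during the reading.
        if (doc.getDocumentElement() == null) {
            throw new IllegalStateException("empty document");
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java DomFootprint <file.xml> <elementName>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            parseAndCount(in, args[1]);
        }
        // The pause suggested above, so the process can be inspected before it exits.
        System.out.println("Press Enter to exit");
        System.in.read();
    }
}
```

Running it against a file of realistic size, for example `java DomFootprint big.xml Identity`, gives a first impression of whether the whole tree ends up in memory.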
 
Author and all-around good cowpoke
Posts: 13078
Which implementation is "lazy loading" a DOM?

I'd bet money that the Java JAXP implementation does not. The parser has to completely parse the document before returning.

Bill
 
Ranch Hand
Posts: 45

Originally posted by Tariq Ahsan:
Thanks, Reid, for your response. In terms of performance and memory usage, do you think it makes sense to use DOM instead of SAX for parsing a large XML file when I am just going to search for a particular node, parse its attributes, and dump them into a delimited text file? I am inclined to use DOM because of its ease of handling elements.



SAX is a better option if you are searching for just a particular node and parsing its attributes. Unlike DOM, SAX bypasses the creation of a tree-based object model of your information, and hence is faster.
[ January 20, 2006: Message edited by: Sara James ]
 
Tariq Ahsan
Ranch Hand
Posts: 116
Thank you all for your suggestions. Just wondering if you have some sample code, ideas, or suggestions for how I could pass the node name as an argument and get the attributes belonging to it.

Thanks
 
Sara Tracy
Ranch Hand
Posts: 45

Originally posted by Tariq Ahsan:
Thank you all for your suggestions. Just wondering if you have some sample code, ideas, or suggestions for how I could pass the node name as an argument and get the attributes belonging to it.

Thanks




You could refer to the SAX chapter in Beginning XML (Wrox); it has some easy examples to work with.

Check this site. The code for the SAX chapter is in the chapter 12 folder. Download the code and work with it.
http://www.wrox.com/WileyCDA/WroxTitle/productCd-0764570773,descCd-download_code.html
[ January 20, 2006: Message edited by: Sara James ]
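
For the node-name-as-argument question, a SAX handler along these lines might be a reasonable starting point. This is only a sketch: the class name `NodeAttributeDumper`, the `dump` helper, and the pipe delimiter are assumptions made for illustration, and attribute values that contain the delimiter are not escaped.

```java
import java.io.File;
import java.io.PrintWriter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NodeAttributeDumper extends DefaultHandler {

    private final String targetName;   // element name supplied on the command line
    private final PrintWriter out;     // destination for the delimited lines
    private final char delimiter;

    NodeAttributeDumper(String targetName, PrintWriter out, char delimiter) {
        this.targetName = targetName;
        this.out = out;
        this.delimiter = delimiter;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The sample file uses no namespaces, so the qualified name is compared directly.
        if (!qName.equals(targetName)) {
            return;
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            line.append(atts.getValue(i));
        }
        out.println(line);
    }

    /** Streams the source once and writes one delimited line per matching element. */
    static void dump(InputSource source, String targetName, PrintWriter out) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new NodeAttributeDumper(targetName, out, '|'));
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java NodeAttributeDumper <file.xml> <elementName>");
            return;
        }
        InputSource source = new InputSource(new File(args[0]).toURI().toASCIIString());
        dump(source, args[1], new PrintWriter(System.out));
    }
}
```

Invoked as `java NodeAttributeDumper big.xml Identity`, it would print one line per `Identity` element. Nothing is kept between elements, so memory use should not grow with the file size.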
 
Reid M. Pinchback
Ranch Hand
Posts: 775

Originally posted by William Brogden:
Which implementation is "lazy loading" a DOM?

I'd bet money that the Java JAXP implementation does not. The parser has to completely parse the document before returning.

Bill



You are likely correct for JDK distributions containing Crimson. For the latest 1.5 updates, Sun switched to a recent Xerces release as the underlying processor. That may change in the future; Sun sounds pretty gung-ho about migrating much of the core code and reference implementations to StAX over time.

I don't have the links handy, but there have been a few magazine and web articles over the last couple of years about the performance impact of lazy DOM parsers. Mostly the articles were motivated by curiosity about pull parsers, with the DOM vs. SAX numbers listed for comparison. Later DOM parsers behave a bit more like pull parsers when they are lazy, so who-is-better is less of the hard-and-fast rule it used to be, and more a matter of what you are using and doing.

From my own performance tuning efforts, I found that the choice of parsing technique often mattered less than the initialization phase itself. Caching things like configured SAX handlers (e.g. Digester) or factories (e.g. JAXB), so that you don't pay a per-document setup cost, far outweighed concerns over the choice of deserialization technique. That is particularly the case for large volumes of small documents; it is not so clear-cut for small volumes of large documents.
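
As an illustration of the caching point, here is a small sketch that creates the JAXP factory once and keeps one parser per thread instead of building both for every document. The class name `ReusableSaxParser` and the method `countElements` are hypothetical, and the sketch assumes the underlying parser tolerates being reused sequentially after a parse completes, which is how the JDK's bundled implementation behaves as far as I know.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReusableSaxParser {

    // Factory lookup and configuration happen once, not once per document.
    private static final SAXParserFactory FACTORY = SAXParserFactory.newInstance();

    // One parser per thread, since a SAXParser must not be shared across threads.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            // The factory is not guaranteed to be thread-safe, so creation is serialized.
            synchronized (FACTORY) {
                return FACTORY.newSAXParser();
            }
        } catch (Exception e) {
            throw new IllegalStateException("cannot create SAX parser", e);
        }
    });

    /** Counts the elements in a small document using the cached per-thread parser. */
    static int countElements(String xml) throws Exception {
        int[] count = {0};
        PARSER.get().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Many small documents share the same parser instance on this thread.
        for (int i = 0; i < 3; i++) {
            System.out.println(countElements("<a><b/><c/></a>"));
        }
    }
}
```

Whether this pays off depends on the workload; as noted above, the gain is most likely with many small documents and much less visible with a few very large ones.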
 
Ranch Hand
Posts: 2108
As already mentioned, for such a large volume it helps to consider any patterns in the data. Does the data you need appear near the end of the XML, at the beginning, or throughout? Also, does your processing move up and down the document rather than only forward? That would suggest using DOM.

Do you instantiate many instances of this class at one time? If so, using DOM would create a huge number of objects several times over.

It will help to do actual volume testing: check the performance of both DOM and SAX with realistically large data.
 