what is SAX (event-based) parsing ?

alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
What is "SAX (event-based) parsing"?

Say I have an XML document:



Now, what does "SAX (event-based) parsing" mean here?

Please explain.

Thank you.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

You would have to write some methods with names like startElement(), endElement(), and characters(). The SAX parser would examine the document and then call the methods you provided in this order:

startElement("note");
startElement("to");
characters("Tove");
endElement("to");
startElement("from");
characters("Jani");
endElement("from");
...
endElement("note");

Besides the name of the element, your startElement() method would also get the element's attributes as an argument (an Attributes object in the Java SAX API). There are other methods you can implement as well to be notified of processing instructions and other XML features. Your implementations of these methods would be free to do whatever they wanted with the information.
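As a rough illustration of the callback sequence described above, here is a minimal sketch using the SAX API that ships with the JDK. The sample document is an assumption (the XML from the original question was not preserved), and the class name SaxEventTrace is hypothetical. The handler simply records each call the parser makes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    // Assumed sample document; the XML in the original question was lost.
    static final String SAMPLE = "<note><to>Tove</to><from>Jani</from></note>";

    /** Parses the XML and returns one entry per callback the parser made. */
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                events.add("startElement(" + qName + ")");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks;
                // whitespace-only chunks are skipped here for readability.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters(" + text + ")");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement(" + qName + ")");
            }
        };
        // The parser reads the input once, front to back, calling the handler as it goes.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace(SAMPLE).forEach(System.out::println);
    }
}
```

Running main prints the recorded calls in document order, starting with startElement(note) and ending with endElement(note).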

The other kind of parser is a DOM parser. This kind runs through the whole document and builds a tree-like data structure to represent it; then you examine the tree to learn about the document. A DOM parser uses lots more memory to hold that tree. A SAX parser is generally faster, but harder for some people to understand.

I'm going to move this to our "XML and Related Technologies" forum to continue the discussion.


alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
Thanks for your explanation.

There are two types of parser that I know of: SAX-based parsers and DOM-based parsers.

I don't actually understand the difference between the two.

You said,


You would have to write some methods with names like startElement(), endElement(), and characters(). The SAX parser would examine the document and then call the methods you provided in this order:

startElement("note");
startElement("to");
characters("Tove");
endElement("to");
startElement("from");
characters("Jani");
endElement("from");
...
endElement("note");




So that means every SAX parser vendor includes those startElement(), endElement(), ... methods in its API, so that the coder can use them to parse the way you described.

That's very nice. So this is the secret of SAX parsing. But why is it called "event-based"? That seems like a misleading term. Please explain.









The other kind of parser is a DOM parser. This kind runs through the whole document and builds a tree-like data structure to represent it; then you examine the tree to learn about the document. A DOM parser uses lots more memory to hold that tree. A SAX parser is generally faster, but harder for some people to understand.


Now I am confused about the DOM parser.

Again, let's take my XML:




How will a DOM parser parse this? Is it not the same way you have already shown for a SAX parser? Do DOM parsers not support startElement(), endElement(), characters() and similar methods in their API?


How are they different from SAX?

Please show an example of DOM parsing with my sample XML, similar to the one you showed for a SAX parser.

Examples are a very good tool for understanding.

Thank you.
Dave Lenton
Ranch Hand

Joined: Jan 20, 2005
Posts: 1241
Originally posted by alfred jones:
but why it is called "event based" ? this is misleading term.
It comes from the way in which these events are caused. The SAX parser will look at the start of an XML file/stream and then work its way through it. Each time it meets something it has been looking for (the start of an element, an attribute, some text and so on), it will fire the method appropriate for that particular event. The idea is that the parser moves through the XML only once, and only from start to end. Once the end of the XML is reached, the parsing stops.

How a DOM parser will parse this ?
Actually most (all?) DOM parsers use SAX somewhere along the line. What the DOM parser does is use SAX to move through the XML and create a tree-like structure of objects as it goes along. The fact that it uses SAX is irrelevant to the user of the DOM, because the user is only concerned with the tree-like structure that is produced at the end of the parsing process.

SAX is most commonly used if the XML is only to be traversed once, for example if some XML is to be "scanned" to extract some information. DOM is used more if the tree needs to be traversed several times in different directions. It is better if someone wants to represent XML as a tree, move some of the nodes around and then do some processing based upon the new XML tree. DOM is also more useful if you need to know the relationship between two given nodes. While DOM has these advantages, it's often more memory hungry than SAX, as DOM needs to hold information about the entire tree in memory, while SAX only has information about the current part being processed.


There will be glitches in my transition from being a saloon bar sage to a world statesman. - Tony Banks
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279
I am very confused by your responses.


It comes from the way in which these events are caused.


What do you mean by "events" here? Do you mean the calls to startElement(), endElement() and so on? Do you call these events?

However, every parser API (SAX or DOM) has to have some methods to call, so are all of them event-based?


The SAX parser will look at the start of an XML file/stream and then work its way through it. Each time it meets a certain thing its been looking for (the start of an element, an attribute, some text and so on) it will fire the method appropriate for this particular event.


Please explain the meaning of "event" here.



The idea is that the parser moves through the XML only once, and only from start to end. Once the end of the XML is reached, the parsing stops.


Does that mean the SAX parser cannot go backwards, from end to start?



Actually most (all?) DOM parsers use SAX somewhere along the line. What the DOM parser does is use SAX to move through the XML and create a tree-like structure of objects as it goes along.


What?
"create a tree-like structure of objects as it goes along"... but XML itself is a tree-like structure; it has a tree-like hierarchy, nodes, children and so on. So what's the big deal? How can this be a differentiating factor from SAX?

It's the very same XML being used by both SAX and DOM.
So you mean SAX does not "create a tree-like structure of objects as it goes along"?

This is confusing.




The fact that it uses SAX is irrelevant to the user of the DOM because the user is only concerned with the tree-like structure which is produced at the end of the parsing process.


Again, you are saying "tree-like structure".

What's the deal with the "tree-like structure" for SAX and DOM?


Thank you.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

At this point it's really impossible to answer all your individual questions -- we'd be arguing over the meaning of many individual words. But I will tell you what SAX and DOM parsers are again.

A SAX parser goes through your XML file and calls methods that you supply at various points during the process. These method calls are "events". So a SAX parser turns an XML file into a series of method calls.

A DOM parser parses the entire file, creating many Java objects to represent the contents of the file. When you use a DOM parser, you call a single "parse()" method; the return value of this parse() method will be a big tree of those Java objects. You must then search through that tree of objects yourself to find the information you need about the file. A DOM parser turns an XML file into a Java data structure that you can then examine.
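To make the "single parse() call returns a tree" description concrete, here is a minimal sketch using the DOM API bundled with the JDK. The sample XML is an assumption (the original poster's document was not preserved), and the class and helper names (DomLookup, textOf) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomLookup {
    // Assumed sample document; the XML in the original question was lost.
    static final String SAMPLE = "<note><to>Tove</to><from>Jani</from></note>";

    /** One call to parse() builds the whole tree and hands it back. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the text of the first element with the given tag name, or null if absent. */
    static String textOf(Document doc, String tagName) {
        Node first = doc.getElementsByTagName(tagName).item(0);
        return first == null ? null : first.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // The tree stays in memory, so it can be queried in any order, any number of times.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(textOf(doc, "from"));
        System.out.println(textOf(doc, "to"));
    }
}
```

Note that the lookups happen after parsing has finished and in a different order from the document, which is something a SAX handler cannot do without storing the data itself.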
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279

At this point it's really impossible to answer all your individual questions -- we'd be arguing over the meaning of many individual words. But I will tell you what SAX and DOM parsers are again.

A SAX parser goes through your XML file and calls methods that you supply at various points during the process. These method calls are "events". So a SAX parser turns an XML file into a series of method calls.

A DOM parser parses the entire file, creating many Java objects to represent the contents of the file. When you use a DOM parser, you call a single "parse()" method; the return value of this parse() method will be a big tree of those Java objects. You must then search through that tree of objects yourself to find the information you need about the file. A DOM parser turns an XML file into a Java data structure that you can then examine.



Thank you Ernest, this is a beautiful explanation.

I have no enmity with SAX and DOM. I have understood SAX, but DOM has been difficult to understand.


You said,


you call a single "parse()" method; the return value of this parse() method will be a big tree of those Java objects.


A big tree of Java objects!

It's hard to picture physically.

Do you mean the tree consists of Java objects? I can think of these objects as the fruits of the tree, and the "nodes" as the seeds of those fruits.
So at any time I can get hold of any fruit and extract the seed, i.e. the node, out of it.


Is it something like this?

Well, this gives more freedom when parsing, because I can jump to any node at any time.

However, I would very much appreciate it if you could provide an example of DOM parsing.

The creation of a "big tree of those Java objects" is a rather hard concept.

Thank you.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

Yes, that's exactly what the term "tree" means. I'm not just making it up, though -- it's a pretty standard computer-science term. One object is the "root" of the tree. It has references to some other objects which are "branches" or "internal nodes." Those branches can refer to other branches or to "leaf nodes", the ends of the tree that you were calling "fruits." And indeed, you can browse around in the tree and "pick the fruits."
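The root, branch and leaf structure can be seen by walking a parsed document recursively. The following is a minimal sketch, assuming the same made-up sample XML as before (the original document was not preserved); DomTreeDump and dump are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeDump {
    // Assumed sample document; the XML in the original question was lost.
    static final String SAMPLE = "<note><to>Tove</to><from>Jani</from></note>";

    /** Parses the XML and returns an indented outline of the resulting tree. */
    static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    /** Appends this node, then recurses into each child one level deeper. */
    private static void walk(Node node, int depth, StringBuilder out) {
        // Text nodes are the leaves here; show their content in quotes.
        String label = node.getNodeType() == Node.TEXT_NODE
                ? "\"" + node.getNodeValue() + "\""
                : node.getNodeName();
        out.append("  ".repeat(depth)).append(label).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump(SAMPLE));
    }
}
```

For the sample above, the outline shows note at the root, to and from as its branches, and the quoted text values as leaves.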

As I said, that's the good part of using a DOM parser. The bad part is that this tree can take up a lot of memory for a big document, and building it can be slow. A SAX parser doesn't use all that memory, and it's generally faster.
alfred jones
Ranch Hand

Joined: Apr 19, 2005
Posts: 279

Yes, that's exactly what the term "tree" means. I'm not just making it up, though -- it's a pretty standard computer-science term. One object is the "root" of the tree. It has references to some other objects which are "branches" or "internal nodes." Those branches can refer to other branches or to "leaf nodes", the ends of the tree that you were calling "fruits." And indeed, you can browse around in the tree and "pick the fruits."

As I said, that's the good part of using a DOM parser. The bad part is that this tree can take up a lot of memory for a big document, and building it can be slow. A SAX parser doesn't use all that memory, and it's generally faster.




OK.

So the differences are:
1) SAX does not build a tree structure, but DOM does.


2) SAX parses serially: it goes through the document step by step and calls the appropriate methods to deliver the text.

With DOM, however, you can jump to any node (which is a Java object in the tree), go straight to a particular location and get the text out of it.

3) SAX consumes less memory. DOM consumes more memory and parses a bit more slowly.

But why does DOM consume so much memory? Is it just because it builds a tree of Java objects?





Please, please tell me the names of two commercial products (parsers): one that parses the SAX way and one that parses the DOM way.
Dave Lenton
Ranch Hand

Joined: Jan 20, 2005
Posts: 1241
Originally posted by alfred jones:
but why DOM consumes high memory ? is it just because of formation of tree comprising java objects ?
Imagine an XML document containing 2,000 nodes. When DOM is used to parse this document, there will be at least 2,000 objects stored in memory for the resulting DOM tree. There may even be more (depending on whether you counted text, attributes and so on in your original count). This is because each element, each attribute, each bit of text (which are nodes in themselves) and so on needs a separate object in memory. SAX, on the other hand, will only ever have in memory the objects to do with the current part of the document being processed. Although this may consist of several objects, it's unlikely to be as many as 2,000.
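The difference in what each approach keeps in memory can be sketched as follows. This is only an illustration under assumptions: the sample XML is made up (the original was not preserved), NodeCount and its methods are hypothetical names, and counting objects is a rough proxy for memory use, not a measurement.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NodeCount {
    // Assumed sample document; the XML in the original question was lost.
    static final String SAMPLE = "<note><to>Tove</to><from>Jani</from></note>";

    /** DOM: every element, attribute and text run is an object held in the tree. */
    static int countDomNodes(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return count(root);
    }

    private static int count(Node node) {
        int total = 1; // this node itself
        if (node.getAttributes() != null) {
            total += node.getAttributes().getLength(); // attributes are not child nodes
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            total += count(children.item(i));
        }
        return total;
    }

    /** SAX: the handler keeps only a counter; nothing about earlier elements is retained. */
    static int countElementsWithSax(String xml) throws Exception {
        int[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                elements[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return elements[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM nodes held in memory: " + countDomNodes(SAMPLE));
        System.out.println("Elements seen by SAX: " + countElementsWithSax(SAMPLE));
    }
}
```

For the small sample, the DOM tree holds five node objects (three elements and two text nodes), while the SAX handler's own state is a single integer no matter how large the document grows.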
 