SAX parsing question

 
Ranch Hand
Posts: 426
Hi All,

I am starting to learn about parsing XML with SAX. I have done this with a DOM parser, but I would like to do it with SAX.
I have a sample XML file like this. It's basically a simple course with a list of students.



Now, I have this SAX handler. The thing I notice is that I need to use some flag variables just to print the course id.
My XML has two kinds of course tag: one is the root element, while the others refer to each student's course.


Is there better logic for doing this, or is this really how parsing with SAX is done? Thanks.
 
Marshal
Posts: 27999
Yes, that's right. Think of it as if the document is written on a long piece of tape, and you are only looking at one part of the tape at any time. And you can't go back to see anything you looked at before, you can only go forward.

So you're going to have to keep track of where you are in the tape, and keep notes on important bits of the document as you see them. That's what you did.
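
For readers following along, here is a minimal sketch of that note-keeping idea. The original XML and handler are not shown in this thread, so the document layout (a root `course` element with an `id` attribute, and a nested `course` element under each `student`) and the class name `CourseIdSaxSketch` are assumptions for illustration only. Instead of one boolean flag per tag, the handler tracks nesting depth, which is one common way to tell the root `course` apart from the nested ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CourseIdSaxSketch {

    /** Remembers the id of the root course element by tracking nesting depth. */
    static class CourseHandler extends DefaultHandler {
        private int depth = 0;   // number of elements currently open
        String rootCourseId;     // the "note" we keep as the tape goes by

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
            // Only the root element is at depth 1, so nested course tags are ignored.
            if (depth == 1 && "course".equals(qName)) {
                rootCourseId = atts.getValue("id"); // assumed attribute name
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    static String rootCourseId(String xml) throws Exception {
        CourseHandler handler = new CourseHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.rootCourseId;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not the one from the original post.
        String xml = "<course id=\"C101\"><student><name>Ann</name>"
                   + "<course>Math</course></student></course>";
        System.out.println(rootCourseId(xml)); // prints C101
    }
}
```

A depth counter (or a stack of element names) is still state that you maintain yourself, so it does not remove the bookkeeping. It just tends to scale better than a separate flag for every tag.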
 
Ranch Hand
Posts: 2187

Paul wrote: "And you can't go back to see anything you looked at before, you can only go forward."



Just to clarify Paul's statement: you can't go backwards in the actual XML document. However, if you save the specific information you are interested in, you can certainly refer to it at any later point, i.e. while you are still reading the document.

This is not to be considered parsing logic, though. A SAX parser does the actual parsing and calls your data-processing application. In other words, your application receives callbacks from the parser, e.g. Apache Xerces, while it is parsing the document. Xerces is the parser, not your application.
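
To make the callback relationship concrete, here is a small sketch that records the order in which the parser calls the handler. The class name `SaxCallbackSketch` and the sample document are made up for illustration. The code uses whatever parser the JDK's `SAXParserFactory` returns, which may or may not be Xerces in your setup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCallbackSketch {

    /** Returns the sequence of callbacks the parser made while reading the XML. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        // The application only supplies these methods; the parser decides when to call them.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        // prints [startDocument, start:course, start:student, end:student, end:course, endDocument]
        System.out.println(events("<course><student/></course>"));
    }
}
```

Nothing in `events` walks the document itself. The handler is passive, which is the sense in which the application is a consumer of parsing events rather than a parser.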

As an aside, the design of a particular markup language plays a significant role in how easy or difficult it is to write a processing application for documents marked up in that language. In the example above, two different types of elements share the same name, "course".

This kind of ambiguity should be avoided when designing a markup language, especially if the root element is involved.
 
Mark Reyes
Ranch Hand
Posts: 426

Paul wrote: "Yes, that's right. Think of it as if the document is written ..."



Hi Paul,

Thanks for this info. I think I now understand how SAX works.
On another note, I looked at another parser API, the StAX cursor API.
Based on my reading of the API, it seems that the only way to print the course ID
is again to use flag variables.
Is this correct? Kindly skim my code and comment.



Jimmy wrote: "As an aside, the design of a particular markup language plays a significant role in how easy or difficult it is to write a processing application for documents marked up in that language. In the example above, two different types of elements share the same name, "course"."



Hi Jimmy, thanks for this info. I will keep it in mind when I design a real XML file. In the meantime, for the purpose of learning, I complicated my XML a bit so that I could try different logic in my parsing application and understand how DOM/SAX/StAX parsing differs.
 
Jimmy Clark
Ranch Hand
Posts: 2187
Sounds good, Mark. The DOM API depends upon an in-memory object model of the document, hence the name Document Object Model. Under the covers, a DOM implementation still has to read the document sequentially to build that model, and some implementations use a SAX parser to do so.
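
For comparison, here is a minimal DOM sketch of the same task. It assumes, as a guess about the sample file that is not shown in this thread, that the course id is an `id` attribute on the root element. The class name `CourseIdDomSketch` is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CourseIdDomSketch {

    /** Loads the whole document into memory, then reads the root element's id. */
    static String rootCourseId(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // The tree is fully built, so the root is directly addressable: no flags needed.
        // getAttribute returns an empty string if the attribute is absent.
        return doc.getDocumentElement().getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<course id=\"C101\"><student><course>Math</course></student></course>";
        System.out.println(rootCourseId(xml)); // prints C101
    }
}
```

The trade-off is memory: the whole tree exists before the first lookup, whereas a streaming API only ever holds what you chose to save.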

The SAX API uses a streaming paradigm to flow through the document. However, the algorithms and patterns of how this works are a bit complex for some programmers. It follows an event-based flow using an Observer/Notification pattern.

The StAX API covers the same streaming ground as SAX with a different style of algorithm, i.e. a more common iterative style: your code pulls the next event from the parser in a loop instead of receiving callbacks. It is a separate API (javax.xml.stream) and does not require a SAX parser underneath. It is advertised as a "friendlier" and/or "easier" API. However, this is debatable and depends upon the particular programmer's experience. In my opinion, sticking everything in a loop is not friendly and leads to a more difficult implementation. Again, everything mostly depends upon the design of the markup language itself.
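
Since Mark's StAX code is not shown in this thread, here is a rough sketch of what the cursor-API version of the same task can look like. The document layout (root `course` with an `id` attribute, nested `course` under each `student`) and the class name `CourseIdStaxSketch` are assumptions. As with SAX, some state is still needed to tell the two `course` elements apart. Here it is a depth counter rather than a set of flags.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class CourseIdStaxSketch {

    /** Pulls events from the cursor and keeps the id of the depth-1 course element. */
    static String rootCourseId(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int depth = 0;     // number of elements currently open
        String id = null;
        try {
            // The application drives the loop; the parser only advances when asked.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 1 && "course".equals(reader.getLocalName())) {
                        // A null namespace means the namespace is not checked.
                        id = reader.getAttributeValue(null, "id"); // assumed attribute name
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return id;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<course id=\"C101\"><student><course>Math</course></student></course>";
        System.out.println(rootCourseId(xml)); // prints C101
    }
}
```

Whether this reads better than the SAX handler is a matter of taste. The bookkeeping is the same, and only the direction of control differs.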

Hope this helps!
 