SAX parsing question

Mark Reyes
Ranch Hand

Joined: Jul 09, 2007
Posts: 426
Hi All,

I am starting to learn about parsing XML with SAX. I have done this with a DOM parser, but I would like to do it with SAX.
I have a sample XML like this. It's basically a simple course with a list of students.



Now, I have this SAX handler. But I notice that I need to use some flag variables just to print the course ID.
My XML has two kinds of course tags: one is the root element, while the others refer to each student's course.


Is there better logic for doing this, or is this really how parsing with SAX is done? Thanks


Sean Clark ---> I love this place!!!
Me ------> I definitely love this place!!!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18135

Yes, that's right. Think of it as if the document is written on a long piece of tape, and you are only looking at one part of the tape at any time. And you can't go back to see anything you looked at before, you can only go forward.

So you're going to have to keep track of where you are in the tape, and keep notes on important bits of the document as you see them. That's what you did.
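Paul's "keep track of where you are on the tape" advice can be made concrete without ad hoc boolean flags by keeping a stack of the currently open element names. The sketch below is one way to do it, not Mark's original handler (which is not shown in the thread). The XML layout it assumes (a root `<course id="...">` containing `<student>` elements, each with a nested `<course id="...">`) and the class name `CourseStackHandler` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical layout: <course id="..."><student><name/><course id="..."/></student>...</course>
public class CourseStackHandler extends DefaultHandler {
    // Names of the currently open elements, innermost on top: "where we are on the tape".
    private final Deque<String> path = new ArrayDeque<>();
    String rootCourseId;
    final List<String> studentCourseIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("course")) {
            if (path.isEmpty()) {
                rootCourseId = atts.getValue("id");        // <course> is the root element
            } else if ("student".equals(path.peek())) {
                studentCourseIds.add(atts.getValue("id")); // <course> directly inside <student>
            }
        }
        path.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop(); // leaving this element, so drop it from the current path
    }

    static CourseStackHandler parse(String xml) {
        CourseStackHandler handler = new CourseStackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<course id='CS101'>"
                + "<student><name>Ann</name><course id='MATH200'/></student>"
                + "<student><name>Ben</name><course id='PHYS110'/></student>"
                + "</course>";
        CourseStackHandler h = parse(xml);
        System.out.println("Course: " + h.rootCourseId);             // Course: CS101
        System.out.println("Student courses: " + h.studentCourseIds); // Student courses: [MATH200, PHYS110]
    }
}
```

The stack answers "which `course` is this?" by looking at the parent element, so the same handler keeps working if the document gains more nesting levels later.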
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
And you can't go back to see anything you looked at before, you can only go forward.


Just to clarify Paul's statement. You can't go backwards in the actual XML-based document. However, if you save specific information that you are interested in, you can certainly refer to this information at any point in time, i.e. while you are reading the document.
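Jimmy's point about saving information can be sketched as a handler that stores small notes (the root course ID, the most recent student name) in fields and uses them later in the parse. This again assumes the hypothetical layout of a root `<course id="...">` with `<student>` children; `RosterHandler` and `readEnrollments` are illustrative names. It also buffers `characters()` output in a `StringBuilder`, because a SAX parser is allowed to deliver the text of one element in several chunks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical layout: <course id="..."><student><name>..</name><course id="..."/></student>...</course>
public class RosterHandler extends DefaultHandler {
    private int depth;                                      // current nesting level (root = 1)
    private String courseId;                                // note saved from the root <course>
    private String studentName;                             // note saved from the latest <name>
    private final StringBuilder text = new StringBuilder(); // characters() may arrive in chunks
    private final List<String> enrollments = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 1 && qName.equals("course")) {
            courseId = atts.getValue("id");
        }
        text.setLength(0); // start collecting text for this element afresh
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            studentName = text.toString().trim();
        } else if (qName.equals("student")) {
            // The root start tag went by long ago, but the saved note is still available.
            enrollments.add(studentName + " is in " + courseId);
        }
        depth--;
    }

    static List<String> readEnrollments(String xml) {
        RosterHandler handler = new RosterHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.enrollments;
    }

    public static void main(String[] args) {
        String xml = "<course id='CS101'>"
                + "<student><name>Ann</name><course id='MATH200'/></student>"
                + "<student><name>Ben</name><course id='PHYS110'/></student>"
                + "</course>";
        System.out.println(readEnrollments(xml)); // [Ann is in CS101, Ben is in CS101]
    }
}
```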

This is not to be considered parsing logic though. A SAX-based parser is doing the actual "parsing" and is calling your SAX-based data processing application. In other words, your application receives callbacks from the parser, e.g. Apache Xerces, while it is parsing the document. Xerces is the parser, not your application.
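To see that the parser, not your code, drives the process, a handler can simply record each callback as it arrives. This minimal sketch uses whatever SAX parser `SAXParserFactory.newInstance()` supplies (in OpenJDK that is a built-in, repackaged Xerces); the class name `CallbackTrace` is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Records the order in which the parser calls back into the application.
public class CallbackTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    static List<String> trace(String xml) {
        CallbackTrace handler = new CallbackTrace();
        try {
            // The parser reads the input and invokes the handler methods above.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<course><student/></course>"));
        // [startDocument, start course, start student, end student, end course, endDocument]
    }
}
```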

As an aside, the design of a particular markup language plays a significant role in how "easy" or how "difficult" it will be to write a processing application for documents marked up with that language. In the example above, two different types of elements use the same name, i.e. "course".

This kind of ambiguity should be avoided when designing a markup language, especially if the root element is involved.
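To illustrate this design point: if the per-student element had a distinct name, a handler could dispatch on the element name alone, with no stack, depth counter, or flags. The element name `enrolledCourse` is a hypothetical renaming for illustration, not something from the thread, as is the class name `DistinctNamesHandler`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical redesign: <course id="..."><student><enrolledCourse id="..."/></student>...</course>
public class DistinctNamesHandler extends DefaultHandler {
    String rootCourseId;
    final List<String> enrolledCourseIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Each name now means exactly one thing, so no context is needed.
        switch (qName) {
            case "course":         // only ever the root element in this design
                rootCourseId = atts.getValue("id");
                break;
            case "enrolledCourse": // hypothetical name for the per-student course
                enrolledCourseIds.add(atts.getValue("id"));
                break;
            default:
                break;
        }
    }

    static DistinctNamesHandler parse(String xml) {
        DistinctNamesHandler handler = new DistinctNamesHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        DistinctNamesHandler h = parse("<course id='CS101'>"
                + "<student><enrolledCourse id='MATH200'/></student></course>");
        System.out.println(h.rootCourseId + " " + h.enrolledCourseIds); // CS101 [MATH200]
    }
}
```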
Mark Reyes
Ranch Hand

Joined: Jul 09, 2007
Posts: 426
Yes, that's right. Think of it as if the document is written


Hi Paul,

Thanks for the info. I think I now understand how SAX works.
On another note, I looked at another parser API, the StAX cursor API.
Based on my reading of the API, it seems that the only way to print the course ID
is to use flag variables as well.
Is this correct? Kindly skim my code and comment.
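With the StAX cursor API (`XMLStreamReader`), flags are not the only option. Because your code pulls events rather than receiving them, it can be organized as ordinary methods, one per element, so the Java call stack records where you are in the document. This sketch assumes the same hypothetical layout (a root `<course id="...">` with `<student>` children that each contain a `<course id="...">`); `StaxCursorCourses` and `readStudent` are illustrative names, not Mark's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical layout: <course id="..."><student><name/><course id="..."/></student>...</course>
public class StaxCursorCourses {
    String rootCourseId;
    final List<String> studentCourseIds = new ArrayList<>();

    static StaxCursorCourses read(String xml) {
        StaxCursorCourses result = new StaxCursorCourses();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                r.nextTag(); // move the cursor to the root start tag
                result.rootCourseId = r.getAttributeValue(null, "id");
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals("student")) {
                        readStudent(r, result); // being inside a student = being in this method
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }

    // Called with the cursor on <student>; consumes events up to the matching </student>.
    private static void readStudent(XMLStreamReader r, StaxCursorCourses result)
            throws XMLStreamException {
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("course")) {
                result.studentCourseIds.add(r.getAttributeValue(null, "id"));
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("student")) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        StaxCursorCourses c = read("<course id='CS101'>"
                + "<student><name>Ann</name><course id='MATH200'/></student></course>");
        System.out.println(c.rootCourseId + " " + c.studentCourseIds); // CS101 [MATH200]
    }
}
```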



Aside, the design of a particular markup language plays a significant role in how "easy" or how "difficult" it will be to write a processing application for documents marked up with the language. In the example above, there are two different types of elements which are using the same name, e.g. "course".


Hi Jimmy, thanks for the info. I will keep this in mind when I design a real XML file. In the meantime, for the purpose of learning, I deliberately complicated my XML a bit so that I could try different logic in my parsing application and understand how DOM, SAX, and StAX parsing differ.
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
Sounds good, Mark. The DOM API depends upon an in-memory object model of the document, hence the name Document Object Model. DOM implementations also depend upon the SAX API to create the object model under the covers.
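For comparison, with DOM the whole tree is in memory, so the root course ID is simply an attribute of the document element, and each student's course can be found by navigating the tree. This is a minimal sketch using the standard JAXP `DocumentBuilderFactory`, assuming the same hypothetical layout as above; `DomCourses` and its method names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical layout: <course id="..."><student><course id="..."/></student>...</course>
public class DomCourses {
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // The tree is in memory, so "the root course" is just the document element.
    static String rootCourseId(Document doc) {
        return doc.getDocumentElement().getAttribute("id");
    }

    // Random access: visit each <student> and look inside it, no flags needed.
    static List<String> studentCourseIds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            NodeList courses = student.getElementsByTagName("course");
            if (courses.getLength() > 0) {
                ids.add(((Element) courses.item(0)).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = load("<course id='CS101'>"
                + "<student><course id='MATH200'/></student></course>");
        System.out.println(rootCourseId(doc) + " " + studentCourseIds(doc)); // CS101 [MATH200]
    }
}
```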

The SAX API uses a streaming paradigm to flow through the document. However, the algorithms and patterns of how this works are a bit complex for some programmers. It follows an event-based flow using an Observer/Notification pattern.

The StAX API is basically an abstraction of the SAX API using a different style of algorithm, i.e. a more common iterative style. StAX is also dependent upon SAX under the covers. This is advertised as a "friendlier" and/or "easier" API. However, this is debatable and depends upon the particular programmer's experience. In my opinion, sticking everything in a for loop is not friendly and leads to a more difficult implementation. Again, everything mostly depends upon the design of the markup language itself.
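The loop-based style mentioned here is most visible in StAX's event iterator API (`XMLEventReader`), where the document is consumed as a sequence of event objects in a single loop. In this sketch, assuming the same hypothetical layout, a depth counter stands in for the flags; `StaxEventCourses` and the "root"/"student" labels are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical layout: <course id="..."><student><course id="..."/></student>...</course>
public class StaxEventCourses {
    private static final QName ID = new QName("id");

    static List<String> courseIds(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            int depth = 0; // nesting level of the current element (root = 1)
            try {
                while (events.hasNext()) {
                    XMLEvent event = events.nextEvent();
                    if (event.isStartElement()) {
                        depth++;
                        StartElement start = event.asStartElement();
                        if (start.getName().getLocalPart().equals("course")) {
                            Attribute id = start.getAttributeByName(ID);
                            // In this hypothetical layout, any non-root <course> is a student's.
                            String label = depth == 1 ? "root " : "student ";
                            found.add(label + (id == null ? "?" : id.getValue()));
                        }
                    } else if (event.isEndElement()) {
                        depth--;
                    }
                }
            } finally {
                events.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(courseIds("<course id='CS101'>"
                + "<student><course id='MATH200'/></student></course>"));
        // [root CS101, student MATH200]
    }
}
```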

Hope this helps!
 