GeeCON Prague 2014*
SAX Parsing in JDK 1.6_14 and multiple lines in an element's value

Jim Atharris
Greenhorn

Joined: Jan 07, 2008
Posts: 28
Hello,

We're using a SAX parser and currently have a class that extends org.xml.sax.helpers.DefaultHandler.

We've overridden startElement(), endElement(), and characters().

In our characters() method, the current code (a lot of it) assumes that an invocation of characters() means the element's entire value (the "Hello" in <world>Hello</world>) is complete, and then goes through a large if/else-if/else-if/... statement with try/catch blocks. Then endElement() is invoked and more if/else-if/... statements with try/catch blocks are executed.

I've read that characters() may be invoked multiple times if the element's value contains multiple lines, and that the characters() method should really just buffer what it receives. Only once endElement() is called is the element's value complete.
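The buffering approach described above can be sketched roughly as follows. This is only a minimal illustration, not the code from our project: the class names BufferingHandlerDemo and BufferingHandler and the parse() helper are made up for the example, and it assumes leaf elements that contain only text (no mixed content). Note that a SAX parser is allowed to split the text at any point, for example around an entity reference, not just at line breaks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferingHandlerDemo {

    /** Hypothetical handler: collects text in characters(), acts on it in endElement(). */
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        final List<String> results = new ArrayList<String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start a fresh buffer for each element (assumes text-only leaf elements)
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times for one element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Only at this point is the element's value known to be complete.
            results.add(qName + "=" + text);
            text.setLength(0);
        }
    }

    /** Hypothetical helper: parses an XML string and returns "name=value" entries. */
    static List<String> parse(String xml) {
        try {
            BufferingHandler handler = new BufferingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<world>Hello\nthere &amp; everyone</world>"));
    }
}
```

With this shape, the existing if/else-if logic would move out of characters() and into endElement(), where it can work on the buffered string.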

Because there is a lot of code, my question is whether the SAX parser's default behavior can be overridden so that characters() is called only once, regardless of whether the element's value contains multiple lines.

We're not running Java (on Windows/Unix) with any special options other than "java -cp . MyParser my_data.xml"

Thanks,
Jim
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12791
    
The behavior of characters() is hard-coded into the SAX API.

You could use the StAX parser approach instead. See the javax.xml.stream package.

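As I understand it, when the StAX parser hands you a Characters event, all of the element text is available in that event, provided the parser is set to coalesce adjacent character data (the XMLInputFactory.IS_COALESCING property).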
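A rough sketch of the StAX route, for illustration only. The class name StaxTextDemo and the readText() helper are invented for this example. It sets XMLInputFactory.IS_COALESCING so that adjacent character data is delivered as one event, and uses XMLStreamReader.getElementText(), which returns the complete text of a text-only element (it throws if the element contains child elements).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxTextDemo {

    /** Hypothetical helper: returns the full text of the first element with the given name, or null. */
    static String readText(String xml, String elementName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        // Reads up to the matching end tag and returns all of the text.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readText("<world>Hello\nthere &amp; everyone</world>", "world"));
    }
}
```

Because StAX is a pull API, the calling code decides when to read the text, so there is no callback that can fire with a partial value.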

If you wrote a lot of code assuming behavior of characters() that does not correspond to the API, regard it as a learning experience.

Bill
 