
SAXParser not calling characters()

steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
I've got a basic class extending DefaultHandler that is installed as the handler for the XMLReader I got from a SAXParser. All the methods I put in get called except characters, even ignorableWhitespace, which seems like it might be. The method at the moment just prints the length value, but it never gets called. I've added an @Override annotation which is not flagged as an error (but it is if I misspell the method name, so it's the right method). This is being called from a JSP in Tomcat if that makes a difference. As far as I can tell, there are no errors being stifled. Any ideas on how I can figure out what's happening?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

steve claflin wrote:The method at the moment just prints the length value, but it never gets called.


So, it's not called, but nevertheless it does something? I must be misinterpreting at least half of that statement. Could you clarify it?

(And let me move this to the XML forum.)
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

Dumb question, but is there character data in the XML document? I.e., something that's not an element tag or an attribute?


[Jess in Action][AskingGoodQuestions]
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
Clarifications:

There is character data - it's a fairly simple xml file with a structure like:
<root>
<child>
<grandchild1>character content</grandchild1>
<grandchild2>character content</grandchild2>
</child>
</root>

The various start and end methods (e.g., startElement) get called, but characters never seems to get called (and therefore the output it's supposed to generate never appears). My comment before should have been that it's supposed to print the length value.
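
For reference, here is a minimal sketch of the kind of setup described above: a DefaultHandler subclass installed on the XMLReader obtained from a SAXParser, with characters() collecting the text it is given. The names CharactersDemo, TextCollector and collectText are made up for illustration, and this is not the code from the thread. It parses without validation, so it only shows the baseline behaviour one would normally expect.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; not the original poster's code.
class CharactersDemo {

    // Collects whatever the parser reports through characters().
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the XML string without validation and returns the collected text.
    static String collectText(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            TextCollector handler = new TextCollector();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><child>"
                + "<grandchild1>character content</grandchild1>"
                + "<grandchild2>character content</grandchild2>"
                + "</child></root>";
        System.out.println(collectText(xml));
    }
}
```

Note that a parser may deliver one text node in several characters() calls, which is why the sketch appends to a buffer instead of assuming one call per element.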
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
I think we will need to see the source code you have created for the characters() method.

"being called from a JSP" - exactly where does the JSP get the XML document and how does it pass it.

Do you get all the startElement and endElement calls all the way to the last element?

Bill
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
It looks like I get all expected calls to start/endElement and start/endDocument.

Here's the relevant code:



The DemoErrorHandler methods all throw SAXException, even for warning, and I don't get any exceptions shown for the exc variable.
It's also worth noting that in the past when this code was in a Java main class as opposed to a JSP, I did get the expected results, so I'm wondering if there is some issue with the parser that Tomcat is using.
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
Well, I believe that I found the answer, but it raises as much of a question as it solves.

I added back in the ignorableWhitespace method that I had removed from my code listing (it was something that wasn't in the original code I had received), and modified it to print the content in yet another color (gray). Lo and behold, the missing data appeared as ignorable whitespace (i.e., I saw the non-whitespace tag text node content printed in gray).
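
The debugging step described above can be sketched as a handler that overrides both callbacks and records which one delivered each piece of text. The names CallbackTrace, TracingHandler and trace are hypothetical, and the sketch parses without validation, so on its own it will not reproduce the behaviour reported in this thread; it only shows how to tell the two callbacks apart.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler; not the code from the thread.
class CallbackTrace {

    // Records each non-blank chunk of text together with the callback that delivered it.
    static class TracingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            record("characters", ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            record("ignorableWhitespace", ch, start, length);
        }

        private void record(String callback, char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add(callback + ": " + text);
            }
        }
    }

    // Parses the XML string and returns the recorded events in document order.
    static List<String> trace(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TracingHandler handler = new TracingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        trace("<root><child><grandchild1>character content</grandchild1></child></root>")
                .forEach(System.out::println);
    }
}
```

If non-blank text ever shows up tagged with ignorableWhitespace, that points at the parser or schema configuration and not at the handler.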

Taking a closer look at the xml and its schema, the root and first child levels were the only two defined elements. The content of the child level was defined as a sequence of xs:any, with processContents="skip".

I would expect the validator to skip the contents, but not the parser.

If I take away the processContents and let it default, then I need to define elements for the grandchildren, either in this schema or in one that has been referenced.

Does this mean that there is no way to parse truly arbitrary content - that any content I don't want to ignore must have been defined somewhere?

And, does it also mean that I can rely on this behavior - that I can mark elements with processContents="skip" and have the parser ignore them? (that is what the attribute name implies, but I've learned not to rely too heavily on others' ideas about good names for things - as in, ignorableWhitespace is now not such a good name any more).
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
Why are you using any schema at all?

What you have sounds so open-ended that it is not going to catch any errors.

Bill
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
The example is from a book that I've got to teach from, so my first need was to make it work at all. The example was supposed to demonstrate, among other things, setting up a validating parser. I think it was just shortcutting on the part of the author that the schema was so minimally defined (and probably also to avoid a long and complex schema when the students aren't necessarily XML experts). I would assume that this example worked as is at some point, perhaps in an earlier JDK version.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

Sounds like you just need to fix the schema.
Lester Burnham
Rancher

Joined: Oct 14, 2008
Posts: 1337
Shouldn't the code be specifying somewhere against which schema to validate? As it is, it only specifies that the schema language to be used will be XML Schema, but it doesn't actually specify any concrete schema.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

Lester Burnham wrote:Shouldn't the code be specifying somewhere against which schema to validate? As it is, it only specifies that the schema language to be used will be XML Schema, but it doesn't actually specify any concrete schema.


'Sprobably specified in the XML document.
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
The schema is indeed referenced in the xml document.
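
For anyone unfamiliar with the configuration being discussed, here is a rough sketch of a validating SAX parser set up through the standard JAXP schemaLanguage property. It is not the code from the thread. As an assumption made only to keep the example self-contained, the schema is supplied in memory through the JAXP schemaSource property; in the thread the schema is referenced from the XML document, in which case the schemaSource line would be left out. The schema itself and the names ValidatingParseDemo, CountingHandler and countValidationErrors are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the schema and all class names are assumptions.
class ValidatingParseDemo {
    // Standard JAXP property names and the XML Schema namespace URI.
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Invented schema: root holds one or more child elements with string content.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='root'><xs:complexType><xs:sequence>"
        + "<xs:element name='child' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Counts recoverable validation errors instead of throwing on them.
    static class CountingHandler extends DefaultHandler {
        int errors;

        @Override
        public void error(SAXParseException e) {
            errors++;
        }
    }

    // Parses the XML string with schema validation and returns the error count.
    static int countValidationErrors(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // The schema language must be set before the schema source.
            parser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            parser.setProperty(SCHEMA_SOURCE,
                    new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            CountingHandler handler = new CountingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countValidationErrors("<root><child>text</child></root>"));
        System.out.println(countValidationErrors("<root><other/></root>"));
    }
}
```

Counting errors in the handler, instead of throwing as the DemoErrorHandler in the thread does, is just a choice that makes the outcome easy to inspect.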
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
I have to say that the effect on SAX parsing of processContents="skip", which you appear to have documented, is fascinating. Thanks for documenting your problem.

Bill
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
You're welcome!

I did find a solution to my problem - if I replace the processContents="skip" with processContents="lax" it does give me the characters, but does not require me to specify what the children of the tag so marked are, and they don't need to be mentioned in the schema, either. That seems like it could be useful, as I could see situations where I would want to validate down to a certain depth, and leave the child content below that unspecified (almost like using a CDATA section).
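
To make the difference between the processContents settings concrete, here is a small sketch that validates the sample document from earlier in the thread against a schema where only root and child are declared and the content of child is a sequence of xs:any. The schema is reconstructed from the description above and is an assumption, not the book's actual schema; the names ProcessContentsDemo, schemaWith and isValid are hypothetical. It uses the javax.xml.validation API, not the SAX parser from the thread, so it shows what the validator accepts and says nothing about which SAX callback receives the text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch; the schema is a guess based on the thread's description.
class ProcessContentsDemo {

    static final String SAMPLE =
          "<root><child>"
        + "<grandchild1>character content</grandchild1>"
        + "<grandchild2>character content</grandchild2>"
        + "</child></root>";

    // Builds a schema that declares only root and child; child holds xs:any.
    static String schemaWith(String processContents) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='root'><xs:complexType><xs:sequence>"
             + "<xs:element name='child'><xs:complexType><xs:sequence>"
             + "<xs:any processContents='" + processContents + "'"
             + " minOccurs='0' maxOccurs='unbounded'/>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    // Returns true if the document validates under the given processContents mode.
    static boolean isValid(String processContents, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(
                    new StreamSource(new StringReader(schemaWith(processContents))));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The default validator behaviour is to throw on the first error.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"skip", "lax", "strict"}) {
            System.out.println(mode + ": " + isValid(mode, SAMPLE));
        }
    }
}
```

With "skip" and "lax" the undeclared grandchildren are accepted; with "strict" (the default when the attribute is omitted) the validator requires a declaration for each of them, which matches what was observed when processContents was removed.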
Blaise Doughan
Greenhorn

Joined: Aug 25, 2010
Posts: 8
This is a bug. It can be solved by upgrading to the latest version of the JDK or by setting the following property:



EDIT: Corrected property name
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
Blaise,

I've got 1.6.0_20, and the latest is only 21. The fixes page for that release doesn't mention anything about this. Also, the XMLReader constant you mentioned doesn't exist for my version. (In fact, the only Google listing for it is this thread.)

I've been trying to find more info on the bug you described - do you have a link or bug id that could shed more light on it?
Blaise Doughan
Greenhorn

Joined: Aug 25, 2010
Posts: 8
We saw that the problem existed in versions 17 to 20. Did you try the property?
steve claflin
Ranch Hand

Joined: Dec 04, 2008
Posts: 54
The constant XMLReader.REPORT_IGNORED_ELEMENT_CONTENT_WHITESPACE_FEATURE is not recognized by the compiler.
Blaise Doughan
Greenhorn

Joined: Aug 25, 2010
Posts: 8
In the previous post I had entered the wrong property name. It should have been:


 