Why is this happening?

Tom Sullivan

Ranch Hand

Posts: 72

posted 17 years ago

I am getting unexpected values from an XML parse using DOM. Can someone tell me why this is happening?

Here is the XML node I'm parsing:

That is an MS Excel 2003 doc converted to XML. Here is the test code to parse just this node:

Here is the output:

ROOT: Workbook
NODE NAME: DocumentProperties
List size = 11
Node: 0 Name: #text
Node: 1 Name: LastAuthor
Node: 2 Name: #text
Node: 3 Name: LastPrinted
Node: 4 Name: #text
Node: 5 Name: Created
Node: 6 Name: #text
Node: 7 Name: LastSaved
Node: 8 Name: #text
Node: 9 Name: Version
Node: 10 Name: #text

The first problem is that I expect 5 nodes, not 11. Where is "#text" coming from?

I also do not understand why I have to say:

Node nextNode = root.getFirstChild();
Element docNode = (Element) nextNode.getNextSibling(); // docNode should be the first child

"DocumentProperties" should be the first child, but if I rely on that, I again get a node named "#text".

Thanks.

Tom

Ulf Dittmer

Rancher

Posts: 43081

posted 17 years ago

If you examine the contents of the "#text" nodes, you'll see that they represent the white space between the elements (which is significant in XML).
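To make that concrete, here is a minimal sketch that prints what the first child holds and then keeps only the element children by checking the node type. The XML string is a made-up stand-in for the DocumentProperties node (the values "Tom" and "11" are invented), and the class and method names are just for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildrenDemo {
    // Hypothetical stand-in for the DocumentProperties node, not the poster's actual XML.
    static final String XML =
        "<DocumentProperties>\n" +
        "  <LastAuthor>Tom</LastAuthor>\n" +
        "  <Version>11</Version>\n" +
        "</DocumentProperties>";

    // Parses the string with a default (non-validating) DOM parser and returns the root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // Returns only the children that are elements, skipping #text and other node types.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(XML);
        System.out.println("All children: " + root.getChildNodes().getLength());

        // The first child is the line break and indentation before <LastAuthor>.
        Node first = root.getFirstChild();
        System.out.println("First child: " + first.getNodeName()
                + ", whitespace only: " + first.getNodeValue().trim().isEmpty());

        for (Element e : elementChildren(root)) {
            System.out.println("Element: " + e.getNodeName());
        }
    }
}
```

Filtering on Node.ELEMENT_NODE works with or without a DTD, so it is usually the simplest way to avoid depending on getFirstChild() or getNextSibling() returning an element.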

If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).

In the future, please use a more descriptive subject line. "Why is this happening?" says nothing about the problem at hand.
[ August 04, 2006: Message edited by: Ulf Dittmer ]

Tom Sullivan

Ranch Hand

Posts: 72

posted 17 years ago

I thought I did that. My parse() function looks like this:

Paul Clapham

Marshal

Posts: 28226

posted 17 years ago

Originally posted by Ulf Dittmer:
If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).

The API documentation for the method says this:

Note that only whitespace which is directly contained within element content that has an element only content model (see XML Rec 3.2.1) will be eliminated. Due to reliance on the content model this setting requires the parser to be in validating mode.

I believe that jargon about "element only content model" means the document has to be described by a DTD or a schema, at least.
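A rough sketch of the difference, assuming the parser bundled with the JDK: the same made-up document is parsed once without a DTD and once with an internal DTD in validating mode, with setIgnoringElementContentWhitespace(true) in both cases. The element names and the DTD are invented for the demonstration, and I have only tried to reason about the DTD case here, not about validation against an XML Schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreWhitespaceDemo {
    // Hypothetical document body, indented the way a pretty-printed file would be.
    static final String BODY =
        "<props>\n" +
        "  <author>Tom</author>\n" +
        "  <version>11</version>\n" +
        "</props>";

    // Hypothetical internal DTD giving <props> an element-only content model.
    static final String DTD =
        "<!DOCTYPE props [\n" +
        "  <!ELEMENT props (author, version)>\n" +
        "  <!ELEMENT author (#PCDATA)>\n" +
        "  <!ELEMENT version (#PCDATA)>\n" +
        "]>\n";

    // Parses xml with whitespace-ignoring requested and returns the root's child count.
    static int countRootChildren(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // No DTD: the parser cannot tell which whitespace is ignorable, so the #text nodes stay.
        System.out.println("Without DTD: " + countRootChildren(BODY, false));

        // DTD plus validating mode: whitespace inside <props> is dropped.
        System.out.println("With DTD, validating: " + countRootChildren(DTD + BODY, true));
    }
}
```

Without a DTD the setting is accepted silently and has no visible effect, which would explain why calling it in parse() did not change the output.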

Tom Sullivan

Ranch Hand

Posts: 72

posted 17 years ago

Thanks very much for the help. I'll go get the MS schema and see if that solves the problem.