
Why is this happening?

Tom Sullivan
Ranch Hand

Joined: Dec 20, 2005
Posts: 72
I am getting unexpected values from an XML parse using DOM. Can someone tell me why this is happening?

Here is the XML node I'm parsing:



That is an MS Excel 2003 doc converted to XML. Here is the test code to parse just this node:



Here is the output:

ROOT: Workbook
NODE NAME: DocumentProperties
List size = 11
Node: 0 Name: #text
Node: 1 Name: LastAuthor
Node: 2 Name: #text
Node: 3 Name: LastPrinted
Node: 4 Name: #text
Node: 5 Name: Created
Node: 6 Name: #text
Node: 7 Name: LastSaved
Node: 8 Name: #text
Node: 9 Name: Version
Node: 10 Name: #text

The first problem is that I expect 5 nodes, not 11. Where is "#text" coming from?

I also do not understand why I have to say:

Node nextNode = root.getFirstChild();
Element docNode = (Element) nextNode.getNextSibling(); // docNode should be first child

"DocumentProperties" should be the first child, but if I rely on that, I get a node name of "#text" again.

Thanks.

Tom
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39549
    
If you examine the contents of the "#text" nodes, you'll see that they represent the white space between the elements (which is significant in XML).
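
As a rough illustration of the point above (this is not the original poster's code; the sample XML, the class name and the helper names are all made up for this sketch), you can check the node type and keep only the element children, which skips the whitespace-only "#text" nodes:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodesDemo {

    // Hypothetical sample standing in for the DocumentProperties node from the post.
    static final String SAMPLE =
        "<DocumentProperties>\n"
      + "  <LastAuthor>tom</LastAuthor>\n"
      + "  <Version>11.0</Version>\n"
      + "</DocumentProperties>";

    // Parses the string with a default (non-validating) JAXP DOM parser.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Returns only the children that are elements, skipping text and comment nodes.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);

        // Every child, including the whitespace-only text nodes between elements.
        NodeList all = root.getChildNodes();
        for (int i = 0; i < all.getLength(); i++) {
            Node n = all.item(i);
            boolean blankText = n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
            System.out.println("Node: " + i + " Name: " + n.getNodeName()
                + (blankText ? " (whitespace only)" : ""));
        }

        // Only the element children.
        for (Element e : childElements(root)) {
            System.out.println("Element: " + e.getNodeName());
        }
    }
}
```

In this sample the root has five child nodes but only two of them are elements, so childElements(root).get(0) gives the first real child without relying on getNextSibling().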

If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).

In the future, please use a more descriptive subject line. "Why is this happening?" says nothing about the problem at hand.
[ August 04, 2006: Message edited by: Ulf Dittmer ]

Tom Sullivan
Ranch Hand

Joined: Dec 20, 2005
Posts: 72
I thought I did that. My parse() function looks like this:

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18141
    

Originally posted by Ulf Dittmer:
If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).
The API documentation for the method says this:
Note that only whitespace which is directly contained within element content that has an element only content model (see XML Rec 3.2.1) will be eliminated. Due to reliance on the content model this setting requires the parser to be in validating mode.
I believe that jargon about "element only content model" means the document has to be described by a DTD or a schema, at least.
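
To make that concrete, here is a minimal sketch that assumes the document carries a DTD. The sample XML, its internal DTD, the class name and the method name are invented for this example; the real Excel export would need its own DTD or schema. With validation turned on and a content model that allows only elements, the parser can tell which whitespace is ignorable:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IgnoreWhitespaceDemo {

    // Hypothetical document: the internal DTD gives the root an element-only content model.
    static final String SAMPLE_WITH_DTD =
        "<!DOCTYPE DocumentProperties [\n"
      + "  <!ELEMENT DocumentProperties (LastAuthor, Version)>\n"
      + "  <!ELEMENT LastAuthor (#PCDATA)>\n"
      + "  <!ELEMENT Version (#PCDATA)>\n"
      + "]>\n"
      + "<DocumentProperties>\n"
      + "  <LastAuthor>tom</LastAuthor>\n"
      + "  <Version>11.0</Version>\n"
      + "</DocumentProperties>";

    // Parses in validating mode and returns how many child nodes the root element has.
    static int countRootChildren(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the whitespace setting relies on the content model
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);

            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail loudly on validation errors instead of letting them go to stderr.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });

            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Whitespace kept:    "
            + countRootChildren(SAMPLE_WITH_DTD, false));
        System.out.println("Whitespace ignored: "
            + countRootChildren(SAMPLE_WITH_DTD, true));
    }
}
```

If no DTD is available, filtering the child nodes by node type in your own code is the simpler route.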
Tom Sullivan
Ranch Hand

Joined: Dec 20, 2005
Posts: 72
Thanks very much for the help. I'll go get the MS schema and see if that solves the problem.
 