Why is this happening?

 
Ranch Hand
Posts: 72
I am getting unexpected values from an XML parse using DOM. Can someone tell me why this is happening?

Here is the XML node I'm parsing:



That is an MS Excel 2003 doc converted to XML. Here is the test code to parse just this node:



Here is the output:

ROOT: Workbook
NODE NAME: DocumentProperties
List size = 11
Node: 0 Name: #text
Node: 1 Name: LastAuthor
Node: 2 Name: #text
Node: 3 Name: LastPrinted
Node: 4 Name: #text
Node: 5 Name: Created
Node: 6 Name: #text
Node: 7 Name: LastSaved
Node: 8 Name: #text
Node: 9 Name: Version
Node: 10 Name: #text

The first problem is that I expect 5 nodes, not 11. Where is "#text" coming from?

I also do not understand why I have to say:

Node nextNode = root.getFirstChild();
Element docNode = (Element) nextNode.getNextSibling(); // docNode should be the first child

"DocumentProperties" should be the first child. But if I rely on that, I again get a node name of: "#text" as well.

Thanks.

Tom
 
Rancher
Posts: 43081
If you examine the contents of the "#text" nodes, you'll see that they represent the white space between the elements (which is significant in XML).
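To make that concrete, here is a minimal sketch that prints what the "#text" nodes hold and then steps over them by checking the node type. The XML string is a made-up stand-in for the original document (only the element names come from the output above), and childElements and parseRoot are hypothetical helper names, not part of the DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementChildren {

    // Invented sample; the indentation and line breaks become #text nodes.
    public static final String SAMPLE =
            "<Workbook>\n"
          + "  <DocumentProperties>\n"
          + "    <LastAuthor>someone</LastAuthor>\n"
          + "    <Version>11</Version>\n"
          + "  </DocumentProperties>\n"
          + "</Workbook>";

    // Hypothetical helper: parse a string and return the document element.
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Hypothetical helper: keep only the children that are elements.
    public static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);

        // The first child is the whitespace between <Workbook> and <DocumentProperties>.
        Node first = root.getFirstChild();
        System.out.println(first.getNodeName()
                + " is whitespace only: " + first.getNodeValue().trim().isEmpty());

        // Skipping non-element nodes gives the element the original code was after.
        Element docProps = childElements(root).get(0);
        System.out.println("First element child: " + docProps.getNodeName());
        for (Element e : childElements(docProps)) {
            System.out.println("  " + e.getNodeName());
        }
    }
}
```

Filtering on getNodeType() works whether or not the document has a DTD, so it avoids relying on getFirstChild() or getNextSibling() landing on an element.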

If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).

In the future, please use a more descriptive subject line. "Why is this happening?" says nothing about the problem at hand.
[ August 04, 2006: Message edited by: Ulf Dittmer ]
 
Tom Sullivan
Ranch Hand
Posts: 72
I thought I did that. My parse() function looks like this:

 
Marshal
Posts: 28226

Originally posted by Ulf Dittmer:
If you use JAXP for parsing, have a look at the DocumentBuilderFactory.setIgnoringElementContentWhitespace method, which suppresses the creation of nodes containing only white space (if I understand this page correctly).

The API documentation for the method says this:

Note that only whitespace which is directly contained within element content that has an element only content model (see XML Rec 3.2.1) will be eliminated. Due to reliance on the content model this setting requires the parser to be in validating mode.

I believe that jargon about "element only content model" means the document has to be described by a DTD or a schema, at least.
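A small sketch of what that seems to mean in practice, assuming the JDK's bundled parser: the same markup is parsed twice with setIgnoringElementContentWhitespace(true), once with no DTD and once with an internal DTD in validating mode. The document, the DTD, and the countRootChildren helper are all invented for illustration; this is not the Excel XML from the original post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreWhitespaceDemo {

    // Invented sample markup with whitespace between the child elements.
    public static final String BODY =
            "<root>\n"
          + "  <a>1</a>\n"
          + "  <b>2</b>\n"
          + "</root>";

    // Invented internal DTD giving <root> an element-only content model.
    public static final String DTD =
            "<!DOCTYPE root [\n"
          + "<!ELEMENT root (a, b)>\n"
          + "<!ELEMENT a (#PCDATA)>\n"
          + "<!ELEMENT b (#PCDATA)>\n"
          + "]>\n";

    // Hypothetical helper: parse with whitespace-ignoring requested and
    // report how many child nodes the root element ends up with.
    public static int countRootChildren(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // No DTD: the parser cannot tell the whitespace is ignorable, so it stays.
        System.out.println("without DTD: " + countRootChildren(BODY, false));
        // DTD plus validating mode: whitespace-only text nodes are dropped.
        System.out.println("with DTD:    " + countRootChildren(DTD + BODY, true));
    }
}
```

With no DTD the setting has nothing to work from, so the whitespace text nodes remain even though the flag is set; with the DTD in place only the two element children should be left.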
 
Tom Sullivan
Ranch Hand
Posts: 72
Thanks very much for the help. I'll go get the MS schema and see if that solves the problem.
 