About parsing with DOM ...

Juhan Voolaid
Ranch Hand

Joined: Nov 18, 2003
Posts: 179
Hi

I'm trying to parse an XML file with DOM. So far everything seems OK, but I get a bunch of errors on my console saying that my elements and attributes are not declared in my XML file:

The "10" on the final row is what I was looking for ... you will see it in my code.

First my XML file, which has no DTD or other accompanying file, just the one XML file:

Notice there are 10 "score" elements with id from 0 to 9.

Here is my Java code:


So how do I get rid of those errors?

[ August 12, 2004: Message edited by: Juhan Voolaid ]
[ August 14, 2004: Message edited by: Juhan Voolaid ]
Vladas Razas
Ranch Hand

Joined: Dec 02, 2003
Posts: 385
You are using a validating parser, but you didn't provide a DTD or XML schema to validate your document against. I think this is the problem. Turn validation off or write a schema.
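
To illustrate the "turn validation off" option, here is a minimal sketch using the standard JAXP API. The class name, the helper method and the inline XML are assumptions made for illustration; the poster's actual file and code are not shown in the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NonValidatingParse {
    // Parses XML text with DTD validation switched off, so a document that
    // has no DTD is only checked for well-formedness.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // false is also the JAXP default
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the poster's scores file.
        String xml = "<scores><score id=\"0\"><name>Jux</name><points>10</points></score></scores>";
        Document doc = parse(xml);
        System.out.println(doc.getElementsByTagName("score").getLength()); // prints 1
    }
}
```

If the original code called setValidating(true), removing that call should have the same effect, since validation is off unless requested.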

best regards
Juhan Voolaid
Ranch Hand

Joined: Nov 18, 2003
Posts: 179
Gee ... I didn't know it was that simple, but here is another problem with this example. I thought it was because of those declaration errors, but it is not.

With the same example, in the function getInfo(Element scores):
The element "scores" is the root element, and I make a NodeList of "score" elements.
Each "score" has child nodes called "name" and "points".
I don't know why I can't get access to "name" and "points", but the element "score" works fine.
I'll show you what I mean:

But


I don't know what is wrong
Vladas Razas
Ranch Hand

Joined: Dec 02, 2003
Posts: 385
There are two problems. The first is that your sample code gets the name and value of the first node, so that would be "Name" and "Jux" (from your XML sample above). For the second node, use item(1).

The second problem is tougher. XML was meant for documents, and DOM is not the easiest thing to work with (even the Sun site recommends JDOM, unless you want all of DOM's flexibility). Let's take an XML sample:

<dog> <name/> </dog>

You would think that DOM would create an element dog with a child node name. But it will create 5 elements:
<dog>
<#text> </#text>
<name/>
<#text> </#text>
</dog>

The parser does this so that you don't lose your whitespace characters. One possible solution is to write something that removes all <#text> nodes that consist only of whitespace. The second way is to get a factory that creates a parser which does that for you: look at DocumentBuilderFactory.setIgnoringElementContentWhitespace(). But there is a catch: for this to work (see the Javadoc) you have to use a validating parser, and for that you again need a DTD or schema. You may also want to look at DocumentBuilderFactory.setIgnoringComments(), in case a user wants to write comments in your XML.

Well, Sun recommends using DOM only if you want to deal with all this. Otherwise they say you can use JDOM for simplicity, but that would add 1-2 MB to your runtime.

best regards

P.S. I wrote my own whitespace remover. I haven't tried the validating parser way yet.

Here is my remover. Give it the document root node.
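
The remover's code is not reproduced here. A minimal sketch of what such a remover could look like (an illustration, not the poster's original; all class and method names are assumptions) is:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class WhitespaceRemover {
    // Recursively removes every text node that contains only whitespace.
    static void removeWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                removeWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Parses the XML, cleans the tree and reports how many children the root has left.
    static int childCountAfterCleanup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node root = doc.getDocumentElement();
            removeWhitespaceNodes(root);
            return root.getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childCountAfterCleanup("<dog> <name/> </dog>")); // prints 1
    }
}
```

Note that this sketch also drops whitespace-only text inside leaf elements such as <name> </name>, which may or may not be what you want.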

Vladas Razas
Ranch Hand

Joined: Dec 02, 2003
Posts: 385
Sorry,

<dog>
<#text> </#text>
<name/>
<#text> </#text>
</dog>

is not 5 elements, but you get the idea.
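
To make the count concrete, here is a small sketch that lists the child nodes the default JAXP parser creates for the dog sample. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DogChildren {
    // Returns the node names of the root element's direct children.
    static List<String> childNodeNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Two whitespace text nodes surround the single element child.
        System.out.println(childNodeNames("<dog> <name/> </dog>")); // prints [#text, name, #text]
    }
}
```

So dog ends up with three child nodes, of which only one is an element.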
Tom Passin
author
Ranch Hand

Joined: Aug 08, 2004
Posts: 30
You don't need to write any code. DOM includes the method normalize(), which combines all the child text nodes of an element into a single text node. Then your attempts to access those elements will work as expected.
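
As a rough illustration of what normalize() does, the sketch below builds a small tree by hand (the element names and the class are assumptions for the example) and counts children before and after the call. Adjacent text nodes are merged, but a text node that holds only whitespace is not empty, so it stays in the tree.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // Returns {child count before normalize(), child count after normalize()}.
    static int[] childCounts() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element dog = doc.createElement("dog");
            doc.appendChild(dog);
            dog.appendChild(doc.createTextNode(" "));   // whitespace text node
            dog.appendChild(doc.createTextNode("\n"));  // adjacent text node, will be merged
            dog.appendChild(doc.createElement("name"));
            dog.appendChild(doc.createTextNode(" "));   // trailing whitespace text node
            int before = dog.getChildNodes().getLength();
            dog.normalize();
            int after = dog.getChildNodes().getLength();
            return new int[] { before, after };
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = childCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // prints 4 -> 3
    }
}
```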


Author of Explorer's Guide to the Semantic Web
Vladas Razas
Ranch Hand

Joined: Dec 02, 2003
Posts: 385
Does it remove whitespace?
Vladas Razas
Ranch Hand

Joined: Dec 02, 2003
Posts: 385
I've tried normalize(). It didn't work for me. I didn't fully understand from the Javadoc what it does. I understood that it joins adjacent #text nodes, but does it eliminate them completely (those that consist of whitespace only)? I tried it on my XML and still got #text between elements, like:

#text = "\n "
<elem1/>
#text = "\n "

Can you explain?

Thanks!
Tom Passin
author
Ranch Hand

Joined: Aug 08, 2004
Posts: 30
The parser has no way to know whether whitespace between elements is significant or not, unless there is a DTD or schema to tell it (and the parser is instructed to use it). Therefore in most cases the whitespace nodes are kept. Some parsers have parameters that can change how that kind of whitespace is handled, so if you are interested, read up on the docs for your parser.
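
For the JAXP parser that ships with Java, the combination mentioned earlier in the thread (a validating parser plus DocumentBuilderFactory.setIgnoringElementContentWhitespace()) can be sketched roughly as follows. The inline DTD, the class name and the helper method are assumptions made so the example is self-contained; behaviour may differ with other parsers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class IgnoreWhitespaceWithDtd {
    // Parses with DTD validation on and asks the parser to drop whitespace
    // that the DTD marks as ignorable (whitespace in element-only content).
    static int rootChildCount(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* warnings are ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or validate XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document with an internal DTD saying dog contains only a name element.
        String xml = "<!DOCTYPE dog [\n"
                + "  <!ELEMENT dog (name)>\n"
                + "  <!ELEMENT name EMPTY>\n"
                + "]>\n"
                + "<dog> <name/> </dog>";
        System.out.println(rootChildCount(xml));
    }
}
```

The DTD is what tells the parser that dog has element-only content, which is why the whitespace around name can be treated as ignorable.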
Juhan Voolaid
Ranch Hand

Joined: Nov 18, 2003
Posts: 179
I also couldn't get the normalize() method to work, and I still have difficulties parsing my XML file. I also added a DTD to my XML file.
I get the same problems when doing this:

And the problem is simply that after parsing, my info array consists of empty strings.

Here is my XML file "scores.xml" again:

The document type definition "scores.dtd":

And this is how I make the DOM document in Java:


I don't know what I should do. It seems to me that the normalize() method still doesn't work. I think I have problems with those whitespace elements.

Please help.
 