JavaRanch » Java Forums » Engineering » XML and Related Technologies

normalize( ) in DOM

Jayadev Pulaparty
Ranch Hand

Joined: Mar 25, 2002
Posts: 662
The JAXP 1.2 documentation describes the normalize() method of the Node interface as follows:
Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
You can find the complete explanation on this page:
http://java.sun.com/j2se/1.4.1/docs/api/org/w3c/dom/Node.html
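To make the quoted definition concrete, here is a minimal sketch (not from the thread) that builds a tree in memory with two adjacent Text nodes and one zero-length Text node, then calls normalize(). The class NormalizeSketch and the helper describeChildren are hypothetical names; only standard JAXP/DOM calls are used, and it assumes the JDK's built-in DOM implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: shows the two things normalize() is specified to do.
final class NormalizeSketch {

    // Builds <root>"Hello, ""world"<child/>""</root>, normalizes it, returns root.
    static Element buildAndNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, ")); // adjacent Text node #1
        root.appendChild(doc.createTextNode("world"));   // adjacent Text node #2
        root.appendChild(doc.createElement("child"));
        root.appendChild(doc.createTextNode(""));        // zero-length ("empty") Text node
        System.out.println("before: " + describeChildren(root));
        root.normalize();
        System.out.println("after:  " + describeChildren(root));
        return root;
    }

    // Lists each child by node name, plus the quoted value for Text nodes.
    static String describeChildren(Node parent) {
        StringBuilder sb = new StringBuilder("[");
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (i > 0) sb.append(", ");
            sb.append(kid.getNodeName());
            if (kid.getNodeType() == Node.TEXT_NODE) {
                sb.append("=\"").append(kid.getNodeValue()).append('"');
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        buildAndNormalize();
    }
}
```

With the JDK's built-in DOM, the "after" line should read [#text="Hello, world", child]: the two adjacent Text nodes are merged and the zero-length one is dropped.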
Let's look at a simple example:
<root>
<!-- DOM INSERTS AN EMPTY TEXT NODE HERE -->
<child/>
<!-- DOM INSERTS AN EMPTY TEXT NODE HERE -->
</root>
Hence, in the DOM, the document element (root) has three children in total. Will something like rootNode.normalize() get rid of the empty text nodes here, so that root is left with only one child node, <child/>?
I tried it on a simple example, but it doesn't seem to work that way. I'm not sure whether I understand the method's description correctly.
Could anyone please explain?
Thanks.
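A minimal sketch that reproduces the question with the default (non-validating) JAXP parser: it parses the <root> example and counts root's children before and after normalize(). The class WhitespaceRepro and its parseRoot helper are hypothetical. Per the DOM definition quoted above, an "empty" Text node is one of length zero; the nodes between these tags each hold a newline character, so they are whitespace-only rather than empty, and normalize() keeps them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical repro: whitespace-only Text nodes survive normalize().
final class WhitespaceRepro {

    // Parses an XML string with default factory settings and returns the document element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root>\n<child/>\n</root>");
        System.out.println("children before normalize(): " + root.getChildNodes().getLength());
        root.normalize();
        System.out.println("children after normalize():  " + root.getChildNodes().getLength());
        // The first child is a Text node holding "\n": length 1, so it is not "empty".
        Node first = root.getFirstChild();
        System.out.println("first child is text: " + (first.getNodeType() == Node.TEXT_NODE)
                + ", length: " + first.getNodeValue().length());
    }
}
```

Both counts come out as 3, which matches what Jayadev observed.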
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
That is my understanding of normalize() too. What exactly did you try, and what happened?
Bill
Jayadev Pulaparty
Ranch Hand

Joined: Mar 25, 2002
Posts: 662
I called root.normalize() on the root node (the document element) and expected all the empty text nodes to go away. I then called recurseNodes(root), which walks through the tree and prints every node with indentation. The empty text nodes were still there.
=================================================
I'm pasting the code here in case you'd like to see the output.
JAVA CODE:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.File;
import org.w3c.dom.*;

public class test {
    public static int depth;

    // This method prints the attributes of the supplied element
    public static void showAttributes(Node element) {
        NamedNodeMap nnMap = element.getAttributes();
        for (int i = 0; i < nnMap.getLength(); i++) {
            for (int j = 0; j < depth; j++) System.out.print(" ");
            Node indexedNode = nnMap.item(i);
            System.out.println("ATTR::" + indexedNode.getNodeName() + "="
                    + indexedNode.getNodeValue() + "(" + getNodeType(indexedNode) + ")");
        }
    }

    // This method returns the node type as a readable string
    public static String getNodeType(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            default:                               return "UNKNOWN";
        }
    }

    // Recursive routine to print the tree
    public static void recurseNodes(Node node) {
        for (int i = 0; i < depth; i++) System.out.print(" ");
        // print the node information; whitespace-only text nodes are printed without a value
        if (node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().length() == 0) {
            System.out.println(node.getNodeName() + "=" + "(" + getNodeType(node) + ")");
        } else {
            System.out.println(node.getNodeName() + "="
                    + node.getNodeValue() + "(" + getNodeType(node) + ")");
        }
        // print the node attributes
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            showAttributes(node);
        }
        // recurse through the children; each recursive call undoes the depth++ below
        Node child = node.getFirstChild();
        while (child != null) {
            depth++;
            recurseNodes(child);
            child = child.getNextSibling();
        }
        depth--;
    }

    public static void main(String[] args) {
        File docFile = new File("aa.xml");
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            dbf.setExpandEntityReferences(false);
            DocumentBuilder docb = dbf.newDocumentBuilder();
            Document doc = docb.parse(docFile);
            Element root = doc.getDocumentElement();
            depth = 0;
            root.normalize();
            recurseNodes(root);
        } catch (DOMException de) {
            System.out.println("DOM Exception ::" + de.toString());
        } catch (Exception e) {
            System.out.println("EXCEPTION::" + e.toString());
            System.out.println(e.getClass());
        }
    }
}
XML FILE BEING PARSED (aa.xml):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders [
  <!ELEMENT orders (order*)>
  <!ELEMENT order (customerid, status, item*)>
  <!ELEMENT item ANY>
  <!ELEMENT status (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT qty (#PCDATA)>
  <!ATTLIST customerid limit CDATA #REQUIRED>
  <!ATTLIST orders Notation ENTITY #IMPLIED>
  <!ATTLIST item instock CDATA #REQUIRED
                 itemid CDATA #REQUIRED>
  <!ENTITY jaya "WB78">
  <!ENTITY jayaTwo "&elementEntity;">
  <!ELEMENT jaya (#PCDATA)>
  <!ENTITY elementEntity "<jaya>jayadev</jaya>">
  <!NOTATION jaya SYSTEM "jaya.exe">
  <!ENTITY giffile SYSTEM "demo.gif" NDATA jaya>
]>
<?PI-1 this is first processing instruction?>
<orders Notation="giffile">
  <order>
    <?PI-2 this is second processing instruction?>
    <customerid limit="1000">12341</customerid>
    <status>pending</status>
    <item instock="Y" itemid="SA15">
      <![CDATA[hi there]]>
      <name>Silver Show Saddle, 16 inch</name>
      <price>825.00</price>
      <qty>1</qty>
    </item>
    <item instock="N" itemid="C49">
      <![CDATA[hi there]]>
      <name>Premium Cinch</name>
      <price>49.00</price>
      <qty>1</qty>
    </item>
  </order>
  <order>
    <?PI-3 this is third processing instruction?>
    <customerid limit="150">251222</customerid>
    <status>pending</status>
    <item instock="Y" itemid="&jaya;">
      &jayaTwo;
      <!-- &giffile; -->
      <![CDATA[hi there]]>
      <name>Winter Blanket (78 inch) &jaya;</name>
      <price>20</price>
      <qty>10</qty>
    </item>
  </order>
</orders>
Dan Drillich
Ranch Hand

Joined: Jul 09, 2001
Posts: 1167
Jayadev,
Please add the following DD code segment to your main method -

Run it once with the normalize() call and once without; you'll see that the adjacent text nodes are combined.
Now, I'm not sure about stand-alone empty text nodes.
Cheers,
Dan


William Butler Yeats: All life is a preparation for something that probably will never happen. Unless you make it happen.
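As an independent sketch of the merging Dan describes (not his code segment), the following splits one Text node into two adjacent Text nodes with Text.splitText(int) and then lets normalize() merge them back. The class SplitMergeSketch is a hypothetical name; it assumes the JDK's built-in DOM implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demo: create adjacent Text nodes, then let normalize() merge them.
final class SplitMergeSketch {

    // Returns {child count after split, child count after normalize}.
    static int[] splitThenNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element status = doc.createElement("status");
        doc.appendChild(status);
        Text text = doc.createTextNode("pending");
        status.appendChild(text);

        text.splitText(4);       // "pend" + "ing": two adjacent Text nodes under <status>
        int afterSplit = status.getChildNodes().getLength();

        status.normalize();      // merges the adjacent Text nodes back into one
        int afterNormalize = status.getChildNodes().getLength();

        System.out.println("after split: " + afterSplit
                + ", after normalize: " + afterNormalize
                + ", text: " + status.getFirstChild().getNodeValue());
        return new int[] {afterSplit, afterNormalize};
    }

    public static void main(String[] args) {
        splitThenNormalize();
    }
}
```

This exercises only the "no adjacent Text nodes" half of the definition; whitespace-only nodes that sit between elements are untouched, which is exactly Dan's open question.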
Jayadev Pulaparty
Ranch Hand

Joined: Mar 25, 2002
Posts: 662
Thanks, Dan. I ran it and could see the adjacent text nodes being collapsed into a single text node. My only remaining concern is that I also expected the empty text nodes to go away.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
Looking at the DocumentBuilderFactory API, I see a method setIgnoringElementContentWhitespace(boolean flag).
That might be what you want. I recall that earlier parsers had some sort of flag for dropping "ignorable whitespace", but I have not tried the above.
Bill
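A minimal sketch of Bill's suggestion, under one important assumption taken from the JAXP javadoc for setIgnoringElementContentWhitespace: only whitespace directly inside an element whose DTD content model is element-only is dropped, and the setting requires a validating parser. The sketch therefore adds an internal DTD declaring that <root> contains only <child>. Without a DTD (as in the first <root> example), or inside an element declared ANY (like item in the orders file), the whitespace counts as character data and stays. The class name and the error handler are illustrative choices, not from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo of setIgnoringElementContentWhitespace with a validating parser.
final class IgnorableWhitespaceSketch {

    // Internal DTD: <root> may contain exactly one <child>, so the newlines
    // around <child/> are whitespace in element content ("ignorable").
    static final String XML =
            "<!DOCTYPE root [\n"
          + "  <!ELEMENT root (child)>\n"
          + "  <!ELEMENT child EMPTY>\n"
          + "]>\n"
          + "<root>\n<child/>\n</root>";

    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // required: the content model comes from the DTD
            dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Fail loudly on validation errors instead of printing default warnings.
            db.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Element root = db.parse(new InputSource(new StringReader(XML))).getDocumentElement();
            return root.getChildNodes().getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keep whitespace:   " + countRootChildren(false));
        System.out.println("ignore whitespace: " + countRootChildren(true));
    }
}
```

With the JDK's built-in parser the two counts should be 3 and 1; no call to normalize() is involved.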
Jayadev Pulaparty
Ranch Hand

Joined: Mar 25, 2002
Posts: 662
Thanks for the info. I tried that, but the empty wrapper text nodes are still there.
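When there is no DTD, or the whitespace sits in ANY or mixed content, a common workaround (not suggested in the thread) is to remove whitespace-only Text nodes yourself after parsing. A minimal recursive sketch follows; the class WhitespaceStripper and the method stripWhitespaceText are hypothetical names. It uses the same trim() test as recurseNodes above, and it will also remove whitespace that may be significant in mixed content (for example the space in <p><b>x</b> <i>y</i></p>), so apply it only where that is acceptable.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical workaround: strip whitespace-only Text nodes from a DOM subtree.
final class WhitespaceStripper {

    // Recursively removes Text nodes whose content is only whitespace.
    // CDATA sections, comments and processing instructions are left untouched.
    static void stripWhitespaceText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    // Parses an XML string with default (non-validating) settings.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root>\n  <child>\n    <leaf>text</leaf>\n  </child>\n</root>");
        stripWhitespaceText(root);
        Node child = root.getFirstChild();
        System.out.println("root children: " + root.getChildNodes().getLength()
                + ", first child: " + child.getNodeName()
                + ", its children: " + child.getChildNodes().getLength());
    }
}
```

Non-whitespace text such as the "text" inside <leaf> is kept, because only nodes that trim to an empty string are removed.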