This week's book giveaway is in the Java in General forum.
We're giving away four copies of Think Java: How to Think Like a Computer Scientist and have Allen B. Downey & Chris Mayfield on-line!
See this thread for details.
Win a copy of Think Java: How to Think Like a Computer Scientist this week in the Java in General forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

normalize( ) in DOM

 
Jayadev Pulaparty
Ranch Hand
Posts: 662
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
JAXP1.2 says about normalize() method of the Node interface as follows -
Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
You can find the complete explanation in this page -
http://java.sun.com/j2se/1.4.1/docs/api/org/w3c/dom/Node.html
Let us look at a simple example -
<root>
<!-- DOM INSERTS AN EMPTY TEXT NODE HERE -->
<child/>
<!-- DOM INSERTS AN EMPTY TEXT NODE HERE -->
</root>
Hence in DOM, the DocumentElement root node will have 3 children in total. Is something like rootNode.normalize() going to get rid of the empty text nodes here?? and then the root node is going to have only one child node <child/> ??
I tried it on a simple example, but doesn't seem to work that way. I'm not sure if my understanding of the method explanation is proper or not.
Any one please explain.
Thanks.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That is my understanding of normalize - what exactly did you try and what happened?
Bill
 
Jayadev Pulaparty
Ranch Hand
Posts: 662
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I did a root.normalize() on the root node (document element node) and expected all the empty text nodes to go away. I then did a recurseNodes(root) which basically walks thru the tree and prints all the nodes with proper indentations. I could see the empty text() nodes still hanging in there.
=================================================
I'm pasting the code here just in case you are interested in seeing the output...
JAVA CODE .................
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.File;
import org.w3c.dom.*;
public class test{
public static int depth;
// This method prints attributes of the supplied element
public static void showAttributes(Node element)
{
NamedNodeMap nnMap = element.getAttributes();
Node indexedNode;
for(int i=0; i<nnMap.getLength(); i++)
{
for(int j=0; j<depth; j++) System.out.print(" ");
indexedNode = nnMap.item(i);
System.out.println("ATTR::"+indexedNode.getNodeName()+"="+

indexedNode.getNodeValue()+"("+getNodeType(indexedNode)+")");
}
}
// This method returns the node type
public static String getNodeType(Node node)
{
short type;
type = node.getNodeType();
String retNode = "UNKNOWN";
if(type == node.ELEMENT_NODE) retNode="ELEMENT_NODE";
if(type == node.ATTRIBUTE_NODE) retNode="ATTRIBUTE_NODE";
if(type == node.COMMENT_NODE) retNode="COMMENT_NODE";
if(type == node.PROCESSING_INSTRUCTION_NODE)
retNode="PROCESSING_INSTRUCTION_NODE";
if(type == node.ENTITY_REFERENCE_NODE) retNode="ENTITY_REFERENCE_NODE";
if(type == node.ENTITY_NODE) retNode="ENTITY_NODE";
if(type == node.NOTATION_NODE) retNode="NOTATION_NODE";

if(type == node.TEXT_NODE) retNode="TEXT_NODE";
if(type == node.DOCUMENT_NODE) retNode="DOCUMENT_NODE";
if(type == node.DOCUMENT_TYPE_NODE) retNode="DOCUMENT_TYPE_NODE";

if(type == node.CDATA_SECTION_NODE) retNode="CDATA_SECTION_NODE";
if(type == node.DOCUMENT_FRAGMENT_NODE)
retNode="DOCUMENT_FRAGMENT_NODE";
return retNode;
}
// Recursive routine to print the tree
public static void recurseNodes(Node node)
{

for(int i=0;i<depth; i++) System.out.print(" ");
// print the node information
if(node.getNodeType() == node.TEXT_NODE)
{
if(node.getNodeValue().trim().length() > 0){
System.out.println(node.getNodeName()+"="+
node.getNodeValue()+"("+getNodeType(node)+")");
}
else
{

System.out.println(node.getNodeName()+"="+"("+getNodeType(node)+")");
}
}
else
{
System.out.println(node.getNodeName()+"="+
node.getNodeValue()+"("+getNodeType(node)+")");
}
// print the node attributes
if(node.getNodeType() == node.ELEMENT_NODE)
showAttributes(node);
// recurse thru the children
Node firstChild = node.getFirstChild();
while(firstChild!=null)
{
depth ++;
recurseNodes(firstChild);
firstChild = firstChild.getNextSibling();
}
depth--;
}

public static void main (String args[]) {
File docFile = new File("aa.xml");
Document doc = null;
NamedNodeMap nnMap = null;
int i;
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
dbf.setExpandEntityReferences(false);
DocumentBuilder docb = dbf.newDocumentBuilder();
doc = docb.parse(docFile);
Element root = doc.getDocumentElement();
depth=0;
root.normalize();
recurseNodes(root);
}
catch (DOMException de)
{
System.out.println("DOM Exception ::"+de.toString());
}
catch (Exception e)
{
System.out.println("EXCEPTION::"+e.toString());
System.out.println(e.getClass());
}
}
}
XML FILE BEING PARSED............
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE orders [
<!ELEMENT orders (order*)>
<!ELEMENT order (customerid, status, item*)>
<!ELEMENT item ANY>
<!ELEMENT status (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
<!ATTLIST customerid limit CDATA #REQUIRED>
<!ATTLIST orders Notation ENTITY #IMPLIED>
<!ATTLIST item instock CDATA #REQUIRED
itemid CDATA #REQUIRED>
<!ENTITY jaya "WB78">
<!ENTITY jayaTwo "&elementEntity;">
<!ELEMENT jaya (#PCDATA)>
<!ENTITY elementEntity "<jaya>jayadev</jaya>">
<!NOTATION jaya SYSTEM "jaya.exe">
<!ENTITY giffile SYSTEM "demo.gif" NDATA jaya>
]>
<?PI-1 this is first processing instruction?>
<orders Notation="giffile">
<order>
<?PI-2 this is second processing instruction?>
<customerid limit="1000">12341</customerid>
<status>pending</status>
<item instock="Y" itemid="SA15">
<![CDATA[hi there]]>
<name>Silver Show Saddle, 16 inch</name>
<price>825.00</price>
<qty>1</qty>
</item>
<item instock="N" itemid="C49">
<![CDATA[hi there]]>
<name>Premium Cinch</name>
<price>49.00</price>
<qty>1</qty>
</item>
</order>
<order>
<?PI-3 this is third processing instruction?>
<customerid limit="150">251222</customerid>
<status>pending</status>
<item instock="Y" itemid="&jaya;">
&jayaTwo;
<!-- &giffile; -->
<![CDATA[hi there]]>
<name>Winter Blanket (78 inch) &jaya;</name>
<price>20</price>
<qty>10</qty>
</item>
</order>
</orders>
 
Dan Drillich
Ranch Hand
Posts: 1183
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jayadev,
Please add the following DD code segment to your main method -

Please run it once with the normalize method and once without. You'll see that the adjacent text nodes were combined.
Now, I'm not sure about stand-alone empty text nodes.
Cheers,
Dan
 
Jayadev Pulaparty
Ranch Hand
Posts: 662
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Dan. I ran the stuff. I could see the adjacent text nodes being collapsed in to a single text node. But my only other remaining concern is that i also expect the empty text nodes to go away.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Looking at the DocumentBuilderFactory API, I see a method setIgnoringElementContentWhitespace( boolean flag )
That might be what you want. I recall that earlier parsers had some sort of flag for dropping "ignorable whitespace" but I have not messed with the above.
Bill
 
Jayadev Pulaparty
Ranch Hand
Posts: 662
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks for the info. I tried that but the empty wrapper text nodes still hang in there.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic