Parse a XML file by supplied tag name using SAX

Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
I am trying to write a generic application that reads a very large XML file and grabs every occurrence of a specified element (the node name is passed as an argument) along with its attribute names and values. The application then dynamically constructs a SQL insert statement and loads a table through the JDBC API. I have written such an application using DOM, and it works OK with small XML files. But when it comes to a file of 500 MB or more, parsing the document takes forever. A similar application that is not so generic (the tag names are hardcoded) takes a little over a minute to run. I have heard about StAX, but so far I couldn't find any useful information about it.
I would really appreciate it if anyone could show me some code snippets, leads, hints, etc. that explicitly use SAX (it looks like I do not have any choice?) where I can pass the tag name as an argument to the executable. In other words, I am looking for a generic solution.

Thanking you all
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12760
Realize that every element start tag will cause a call to your implementation of the startElement() method in your extension of org.xml.sax.helpers.DefaultHandler. (I'm assuming a current Java standard library.)
At that point you can look up the name of the element in your list of names and decide what to do with it.
This might involve setting a flag that says "grab all the content from now until the endElement() method for this tag is called" and building your insert statement.
Bill
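The approach Bill describes could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the class name TagCollector, the parse() helper and the callback shape are made up for the example. It assumes the target elements do not nest inside each other, and it uses the JDK's SAXParserFactory, which is not namespace-aware by default, so the comparison is done on qName.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reports every element whose name matches a tag
// supplied at run time, together with its attributes and text content.
public class TagCollector extends DefaultHandler {
    private final String targetTag;
    private final BiConsumer<Map<String, String>, String> sink;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // non-null while inside a target element

    public TagCollector(String targetTag, BiConsumer<Map<String, String>, String> sink) {
        this.targetTag = targetTag;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!qName.equals(targetTag)) {
            return;
        }
        // Copy the attributes: the Attributes object is only valid during this call.
        current = new LinkedHashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            current.put(atts.getQName(i), atts.getValue(i));
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // This is the "flag": collect text only while inside a target element.
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equals(targetTag)) {
            // Hand the row over immediately so nothing accumulates in memory.
            sink.accept(current, text.toString().trim());
            current = null;
        }
    }

    public static void parse(InputStream in, String tag,
                             BiConsumer<Map<String, String>, String> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new TagCollector(tag, sink));
    }

    // Usage: java TagCollector <file.xml> <tagName>
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            parse(in, args[1], (atts, body) -> System.out.println(atts + " " + body));
        }
    }
}
```

Because each match is passed to the callback as soon as its end tag is seen, memory use should stay flat regardless of file size. The callback is where a JDBC insert could go. Note that the collected text also includes the text of any child elements.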
Sara Tracy
Ranch Hand

Joined: Jan 06, 2006
Posts: 45
I found this example on the Wrox website.

XML file
=========
<?xml version="1.0"?>
<!DOCTYPE train [
  <!ELEMENT train (car*)>
  <!ELEMENT car (color, weight, length, occupants)>
  <!ATTLIST car type CDATA #IMPLIED>
  <!ELEMENT color (#PCDATA)>
  <!ELEMENT weight (#PCDATA)>
  <!ELEMENT length (#PCDATA)>
  <!ELEMENT occupants (#PCDATA)>
]>
<train>
  <car type="Engine">
    <color>Black</color>
    <weight>512 tons</weight>
    <length>60 feet</length>
    <occupants>3</occupants>
  </car>
  <car type="Baggage">
    <color>Green</color>
    <weight>80 tons</weight>
    <length>40 feet</length>
    <occupants>0</occupants>
  </car>
  <car type="Dining">
    <color>Green and Yellow</color>
    <weight>50 tons</weight>
    <length>50 feet</length>
    <occupants>18</occupants>
  </car>
  <car type="Passenger">
    <color>Green and Yellow</color>
    <weight>40 tons</weight>
    <length>60 feet</length>
    <occupants>23</occupants>
  </car>
  <car type="Pullman">
    <color>Green and Yellow</color>
    <weight>50 tons</weight>
    <length>60 feet</length>
    <occupants>23</occupants>
  </car>
  <car type="Caboose">
    <color>Red</color>
    <weight>90 tons</weight>
    <length>30 feet</length>
    <occupants>4</occupants>
  </car>
</train>


TrainReader.java
=================
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class TrainReader extends DefaultHandler
{

    private boolean isColor;
    private String trainCarType = "";
    private StringBuffer trainCarColor = new StringBuffer();
    private Locator trainLocator = null;

    public static void main (String[] args)
        throws Exception
    {
        System.out.println("Running train reader...");
        TrainReader readerObj = new TrainReader();
        readerObj.read(args[0]);
    }

    public void read(String fileName)
        throws Exception
    {
        XMLReader reader =
            XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        reader.setContentHandler (this);
        reader.setErrorHandler (this);

        try
        {
            reader.setFeature("http://xml.org/sax/features/validation", true);
        }
        catch (SAXException e)
        {
            System.err.println("Cannot activate validation");
        }

        try
        {
            reader.parse(fileName);
        }
        catch (SAXException e)
        {
            System.out.println("Parsing stopped : " + e.getMessage());
        }
    }

    public void setDocumentLocator(Locator loc)
    {
        trainLocator = loc;
    }

    public void startDocument()
        throws SAXException
    {
        System.out.println("Start of the train");
    }

    public void endDocument()
        throws SAXException
    {
        System.out.println("End of the train");
    }

    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException
    {
        if (localName.equals("car")) {
            if (atts != null) {
                trainCarType = atts.getValue("type");
            }
        }

        if (localName.equals("color"))
        {
            trainCarColor.setLength(0);
            isColor = true;
        } else
            isColor = false;
    }

    public void characters(char[] ch, int start, int len)
        throws SAXException
    {
        if (isColor)
        {
            trainCarColor.append(ch, start, len);
        }
    }

    public void endElement(String uri, String localName, String qName)
        throws SAXException
    {
        if (isColor)
        {
            System.out.println("The color of the " + trainCarType + " car is " +
                trainCarColor.toString());
            if ((trainCarType.equals("Caboose")) &&
                (!trainCarColor.toString().equals("Red")))
            {
                if (trainLocator != null)
                    throw new SAXException("The caboose is not red at line " +
                        trainLocator.getLineNumber() + ", column " +
                        trainLocator.getColumnNumber() );
                else
                    throw new SAXException("The caboose is not red!");
            }
        }
        isColor = false;
    }

    public void warning (SAXParseException exception)
        throws SAXException {
        System.err.println("[Warning] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
    }

    public void error (SAXParseException exception)
        throws SAXException {
        System.err.println("[Error] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
    }

    public void fatalError (SAXParseException exception)
        throws SAXException {
        System.err.println("[Fatal Error] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
        throw exception;
    }

}
[ January 26, 2006: Message edited by: Sara James ]
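For the other half of the original question, turning the attribute names of a matched element into an insert statement, something like the sketch below might serve as a starting point. Nothing here comes from the thread: the class name InsertBuilder, the identifier pattern, and the assumption that attribute names map one-to-one onto column names are all hypothetical. It builds a parameterized statement so that attribute values are bound through PreparedStatement rather than concatenated into the SQL text.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical helper: builds "INSERT INTO t (a, b) VALUES (?, ?)" from
// a table name and a list of column names taken from XML attribute names.
public class InsertBuilder {
    // Conservative identifier check (an assumption; adjust for your database).
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private static void requireIdentifier(String name) {
        if (!IDENT.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain SQL identifier: " + name);
        }
    }

    public static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        requireIdentifier(table);
        StringJoiner names = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String column : columns) {
            requireIdentifier(column);
            names.add(column);
            marks.add("?");
        }
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    // Binds the values in column order; JDBC parameter indexes start at 1.
    public static void bind(PreparedStatement ps, List<String> values) throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            ps.setString(i + 1, values.get(i));
        }
    }

    public static void main(String[] args) {
        // prints: INSERT INTO car (type, color) VALUES (?, ?)
        System.out.println(buildInsert("car", Arrays.asList("type", "color")));
    }
}
```

For a large load it would probably make sense to prepare the statement once, call bind() for each matched element, and use addBatch()/executeBatch() on the PreparedStatement instead of executing row by row.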