Can you please explain your last comment about considering SAX even if you need the whole document?
I will give it a try. Suppose your XML document represents a collection of books, with the data for each one inside a <book> element. Each opening <book> tag carries some attributes you want to keep, and there are child elements with various bits of data.
We are going to define a Book class where each instance holds all the data inside one <book> element, so the collection of instances represents the usable data from the document.
In your custom SAX event handler you do this:
1. When a startElement event for "book" occurs, create a new Book object, passing the "Attributes" to its constructor, and keep a reference to the new object as your working object.
2. For each subsequent event, keep track of the current element and/or pass the text data you need to keep to some method in the working Book object. (Remember that a single characters() event may contain only part of the data for a Text node, so accumulate the text until the element ends.)
3. When you get the endElement event for "book", that object is complete, so add the reference to some collection.
This avoids creating all the node objects that building a DOM would require, and it lets you skip data you don't need for a particular application.
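Here is a rough sketch of the three steps above, just to make the idea concrete. The names BookSaxDemo, Book, BookHandler, setField and parse, the "isbn" attribute and the <title> child element are all made up for illustration; your document will have its own. It assumes the default (non-namespace-aware) parser from SAXParserFactory, so qName carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BookSaxDemo {

    // Hypothetical data class: one instance per <book> element.
    static class Book {
        final String isbn;   // assumed attribute name
        String title = "";   // assumed child element

        Book(Attributes attrs) {
            // Copy the values now: the Attributes object is only valid
            // for the duration of the startElement call.
            this.isbn = attrs.getValue("isbn");
        }

        // Keep the text of the elements we care about, ignore the rest.
        void setField(String element, String text) {
            if ("title".equals(element)) {
                title = text.trim();
            }
        }
    }

    static class BookHandler extends DefaultHandler {
        final List<Book> books = new ArrayList<>();
        private Book current;                                  // the working object
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("book".equals(qName)) {
                current = new Book(attrs);                     // step 1
            }
            text.setLength(0);                                 // start with an empty buffer
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several calls, so append.
            if (current != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;                                        // outside any <book>
            }
            if ("book".equals(qName)) {
                books.add(current);                            // step 3
                current = null;
            } else {
                current.setField(qName, text.toString());      // step 2
            }
            text.setLength(0);
        }
    }

    // Parses the XML text and returns one Book per <book> element.
    static List<Book> parse(String xml) throws Exception {
        BookHandler handler = new BookHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.books;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book isbn=\"111\"><title>SAX Basics</title><price>9.99</price></book>"
                + "<book isbn=\"222\"><title>DOM Basics</title></book>"
                + "</library>";
        for (Book b : parse(xml)) {
            System.out.println(b.isbn + ": " + b.title);
        }
    }
}
```

Note that the <price> element in the sample is parsed but never stored, which is the "skip data you don't need" part. The StringBuilder is there because of the characters() caveat in step 2.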
Bill