This is the FAQ page for the XML and Related Technologies forum. Contributions are welcome. Also see XmlLinks.

Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?

Here's what the javadocs of that method say: "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks." William Brogden explains:

The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.

I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
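The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name TitleCollector, the element name "title" and the sample document are made up for the example, and it assumes the default (non-namespace-aware) JAXP parser, so element names are matched on qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("title".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each element of interest
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element; append every chunk.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString()); // the text is complete only at this point
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    // Convenience method for the example: parse a string and return the titles.
    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleCollector handler = new TitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>SAX &amp; DOM</title></book>"
                   + "<book><title>XSLT</title></book></books>";
        System.out.println(parse(xml)); // prints [SAX & DOM, XSLT]
    }
}
```

Entity references such as &amp; are a typical place where a parser may split the text into several characters() calls, which is why the handler only reads the buffer in endElement().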


Java Code Examples

Articles and introductions


Specifically about Java

  • XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
  • Xerces is a powerful XML parser that is now part of the JRE.
  • Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.
  • dom4j, JDOM and XOM are tree-based Java APIs that serve as alternatives to the W3C DOM.
  • Xalan and Saxon are XSLT processors.
  • Apache FOP is an XSL-FO processor that can output numerous formats, including PDF, PS, PCL, AFP, Print, AWT and PNG, and to a lesser extent, RTF and TXT.
  • Apache Santuario implements XML Signature and XML Encryption.
  • JAXB is a Java <--> XML binding library.
  • Apache Commons Digester is an XML --> Java mapping library.
  • NekoHTML, HtmlCleaner and TagSoup are libraries that clean up HTML and transform it to XML (thus allowing DOM and SAX to work with them).
  • A list of open source XML Diff and Patch tools


The formerly available IBM XML exams 141 and 142 were retired on 12/31/2012. Online certifications are available at and

These exam questions may help you gauge your XML knowledge, even if the associated exam is no longer available:

