This is the FAQ page for the XML and Related Technologies forum. Contributions are welcome. Also see XmlLinks.
Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?
Here's what the javadocs of that method say: "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks." William Brogden explains:
The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
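The approach described above can be sketched roughly as follows. This is a minimal illustration, not code from the original answer: the class name TitleCollector, the element name title and the sample XML are assumptions made up for the example. It uses the default (non-namespace-aware) JAXP parser, so the element name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append each chunk.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString()); // the text is complete only here
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    public static List<String> parse(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>SAX &amp; DOM</title></book></books>";
        System.out.println(parse(xml)); // prints [SAX & DOM]
    }
}
```

The entity reference in the sample is there because parsers commonly split character data at such points; the handler yields the same assembled string no matter how many chunks it receives.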
Java Code Examples
Articles and introductions
Specifically about Java
- XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
- Xerces is a powerful XML parser that is now part of the JRE.
- Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.
- dom4j, JDOM and XOM are Java alternatives to the W3C DOM API.
- Xalan and Saxon are XSLT processors.
- Apache FOP is an XSL-FO processor that can output numerous formats, including PDF, PS, PCL, AFP, Print, AWT and PNG, and to a lesser extent, RTF and TXT.
- Apache Santuario implements XML Signature and XML Encryption.
- JAXB is a Java <--> XML binding library.
- Apache Commons Digester is an XML --> Java mapping library.
- NekoHTML, HtmlCleaner, jTidy and TagSoup are libraries that clean up HTML and transform it to XML (thus allowing DOM and SAX to work with them).
- A list of open source XML diff and patch tools.