This page: last edited 23 December 2012

XML FAQ   

This is the FAQ page for the XML and Related Technologies forum. Contributions are welcome. Also see XmlLinks.

Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?

Here's what the javadocs of that method say: "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks." William Brogden explains:

The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.

I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
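The approach described above can be sketched roughly as follows. This is only an illustration, not William's actual code: the class name TitleHandler, the element name "title" and the parse() helper are made up for the example, and it assumes the default (non-namespace-aware) parser, so that qName carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <title> element.
class TitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <title> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder(); // new buffer for each element of interest
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            // May be called several times for one element, so append, never assign.
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && text != null) {
            titles.add(text.toString()); // the text is only known to be complete here
            text = null;
        }
    }

    List<String> getTitles() {
        return titles;
    }

    // Convenience method for the example: parses an XML string with the JRE's default SAX parser.
    static List<String> parse(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }
}
```

Calling TitleHandler.parse("<books><title>Tom &amp; Jerry</title></books>") returns a list containing "Tom & Jerry", no matter how many characters() calls the parser chose to make for that element. Entity references like the one in this input are a typical place where a parser may split the text into several chunks.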

Javadoc: org.xml.sax.ContentHandler


Java Code Examples

Various XML-handling code examples can be found on the Example Depot site (see javax.xml.parsers, javax.xml.transform, javax.xml.transform.sax, org.w3c.dom and org.xml.sax at the bottom of the page).


Articles and introductions

General

Specifically about Java



Software
  • XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
  • Xerces is a powerful XML parser that is now part of the JRE.
  • Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.
  • dom4j, JDOM and XOM are alternatives to the W3C DOM API for working with XML documents in Java.
  • Xalan and Saxon are XSLT processors.
  • Apache FOP is an XSL-FO processor that can output numerous formats, including PDF, PS, PCL, AFP, Print, AWT and PNG, and to a lesser extent, RTF and TXT.
  • Apache Santuario implements XML Signature and XML Encryption.
  • JAXB is a Java <--> XML binding library.
  • Apache Commons Digester is an XML --> Java mapping library.
  • NekoHTML, HtmlCleaner, jTidy and TagSoup are libraries that clean up HTML and transform it to XML (thus allowing DOM and SAX to work with it).
  • A list of open source XML Diff and Patch tools.
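Two of the elementary JAXP actions mentioned in the XML Hammer entry, checking for well-formedness and running an XPath search, can also be done with nothing but the parser bundled in the JRE. The following is a minimal sketch; the class name JaxpBasics and its method names are invented for this example and do not come from any of the tools listed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class using whatever JAXP implementation the JRE provides.
class JaxpBasics {

    // Returns true if the XML parses without a fatal error, i.e. it is well-formed.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors; installing it replaces the
        // built-in error handler, which would otherwise also print to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Evaluates an XPath expression against the document and returns the result as a string.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }
}
```

For example, JaxpBasics.isWellFormed("<a><b></a>") returns false because the b element is never closed, and JaxpBasics.evaluate("<books><title>DOM</title><title>SAX</title></books>", "/books/title[2]") returns "SAX". Validation against a DTD or schema needs additional factory configuration and is not covered by this sketch.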



JavaRanch. Copyright © 1998-2013 Paul Wheaton