
Parsing XML elements

Melanie Walsh

Joined: Dec 14, 2004
Posts: 20
I have a class that implements ContentHandler; its characters() method looks for tags and then extracts the content.
It all works well except when it finds a '[' or ']' in the content, in which case it returns just a ']'. Can anyone tell me why? Is this a special character?
This is the code:

I would be very grateful for any suggestions.

Fixed the code tags. UBB tags use [ ] rather than angle brackets as in XML. Now, you really hate those [ and ] brackets, don't you!
[ April 21, 2005: Message edited by: Madhav Lakkapragada ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Sorry, I don't know about the square brackets. There are a handful of characters that choke the parser, but I haven't had trouble with any disappearing.

I want to invite you to look at another potential issue, though. The characters() method is not guaranteed to give you all the contents of an element in one call. You can imagine the parser buffering input somewhere under the covers - it might even be true. If the end of one buffer comes in the middle of an element's text, the parser might call you with the characters it has so far, read the next buffer and call you again with the rest of the characters. I learned this by being burned by a parser in another language that worked in 2048-byte chunks.

My solution has been to have the characters() method append what it gets to a member variable string, and to use the string in the endElement method instead. Seems to work so long as you don't have nested tags in the middle of the text like a bold word in the middle of an HTML paragraph.
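A minimal sketch of that approach, assuming the JAXP SAX parser that ships with the JDK. The names TextCollector and collect are made up for illustration and are not from the original poster's code. If the parser happens to hand over the text around '[' and ']' in separate characters() calls, which would be one explanation for seeing only a ']', the buffer puts the pieces back together before endElement looks at them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates character data and reads it in endElement().
public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start fresh for each element (this is what breaks with nested tags in text).
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of assigning.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text complete.
        if (buffer.length() > 0) {
            values.add(buffer.toString());
        }
        buffer.setLength(0);
    }

    // Convenience method (an assumption for this sketch): parse a string, return the texts.
    public static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        System.out.println(collect("<r><i>a[1]</i><i>]b[</i></r>"));
    }
}
```

The buffer is cleared in startElement, so this only suits elements that contain plain text, as noted above.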

Let me know what you learn on the square brackets!

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Ilja Preuss

Joined: Jul 11, 2001
Posts: 14112
Moving to XML forum...

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Madhav Lakkapragada
Ranch Hand

Joined: Jun 03, 2000
Posts: 5040
Most probably a parser issue, I am led to believe.
I tried a simple example with the standard program from the SAX tutorials and it did print the square brackets as text, without missing or truncating anything.

Try to echo your input using this code and see if it works.
When I ran a sample, I used the J2SE 1.4 parsers.

This source code is from the SAX tutorials (courtesy

This code also illustrates what Stan has suggested in his post.
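For anyone who wants to see how their own parser delivers the text, here is a rough sketch of a chunk-echoing handler. It is not the tutorial's program; ChunkEcho and echo are hypothetical names, and it assumes the JAXP SAX parser bundled with the JDK. It prints every characters() call separately, so a ']' arriving on its own would show up as its own line. How the text is split depends on the parser, so no particular output is promised.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: records and prints each characters() call.
public class ChunkEcho extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Use only the slice [start, start + length); the array may hold other data.
        String chunk = new String(ch, start, length);
        chunks.add(chunk);
        System.out.println("characters(): \"" + chunk + "\"");
    }

    // Convenience method (an assumption for this sketch): parse a string, return the chunks.
    public static List<String> echo(String xml) {
        ChunkEcho handler = new ChunkEcho();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        echo("<a>x[y]z</a>");
    }
}
```

Whatever the split, joining the chunks in order should give back the full text of the element.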

- m

Take a Minute, Donate an Hour, Change a Life