Parsing XML elements

 
Greenhorn
Posts: 20
I have a class that implements ContentHandler; its characters() method looks for tags and then extracts the content.
It all works well except when it finds a '[' or ']' in the content; in that case it returns just a ']'. Can anyone tell me why? Is this a special character?
This is the code:


I would be very grateful for any suggestions.

ps:
Fixed the Code tags. UBB tags use [ ], not angle brackets as in XML. Now, you really hate those [ and ] brackets, don't you!
[ April 21, 2005: Message edited by: Madhav Lakkapragada ]
 
(instanceof Sidekick)
Posts: 8791
Sorry, I don't know about the square brackets. There are a handful of characters that choke the parser, but I haven't had trouble with any disappearing.

I want to invite you to look at another potential issue, though. The characters() method is not guaranteed to give you all the contents of a tag in one shot. You can imagine the parser buffering input somewhere under the covers, and it might even be true. If the end of one buffer comes in the middle of a tag's text, the parser might call you with the characters it has so far, read the next buffer, and call you again with the rest of the characters in the tag. I learned this by being burned by a parser in another language that worked in 2048-byte chunks.

My solution has been to have the characters() method append what it gets to a member variable string, and to use the string in the endElement method instead. Seems to work so long as you don't have nested tags in the middle of the text like a bold word in the middle of an HTML paragraph.
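Here is a minimal sketch of that approach, not the original poster's code. The class name TextCollector, the parse() helper, and the "item" element name are all made up for illustration. It appends whatever characters() delivers to a buffer and only reads the buffer in endElement(). If a parser happens to deliver the text around a '[' or ']' in separate calls, a handler that overwrites its value on each call could end up with just the last piece, which might be what is happening here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <item> element.
public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start with an empty buffer for each element. This is the simple
        // case only: text that precedes a nested child element is discarded.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append, never assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        if ("item".equals(qName)) {
            values.add(buffer.toString());
        }
    }

    public List<String> getValues() {
        return values;
    }

    // Convenience helper (an assumption for this sketch): parse an XML string
    // with the default JAXP SAX parser and return the collected texts.
    public static List<String> parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item>a[1] = b[2]</item><item>x &amp; y</item></list>";
        for (String value : parse(xml)) {
            System.out.println(value);
        }
    }
}
```

Printing inside characters() instead is a quick way to see how your particular parser splits the text.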

Let me know what you learn on the square brackets!
 
author
Posts: 14112
Moving to XML forum...
 
Ranch Hand
Posts: 5040
Most probably a parser issue, I am led to believe.
I tried a simple example with the standard Echo.java program from the SAX tutorials, and it did print the square brackets as text, without missing or truncating them.

Try to echo your input using this code and see if it works.
When I ran a sample, I used the J2SE 1.4 parsers.

This source code is from the SAX tutorials (courtesy java.sun.com).


This code also illustrates what Stan has suggested in his post.
Regards.

- m
 