Is there an API for ignoring XML?

 
Ranch Hand
Posts: 285
I have some code that reads a file. However, the file contains XML/HTML tags, and I would like the program to ignore them. Is there anything already available that does this? Otherwise I guess I need to write something that ignores tags myself...
 
Greenhorn
Posts: 16
I'm not sure whether something already exists that does what you want, but it would be fairly easy to use java.util.regex to write a simple regular expression that removes the tags from the file's contents.
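
For anyone who wants to try the regex route, here is a minimal sketch. The class name TagStripper and the method stripTags are made up for illustration, and the pattern is only one reasonable guess at what "simple" means here: it removes anything that looks like <...>, nothing more.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from any library: strips anything that looks like a tag.
class TagStripper {
    // "<", then any run of characters other than ">", then ">".
    // [^>] also matches newlines, so tags that span lines are removed too.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the input with every <...> span removed. */
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "Hello, world!"
        System.out.println(stripTags("<p>Hello, <b>world</b>!</p>"));
    }
}
```

Be aware of the limits: this does not decode entities such as &amp;amp;, it can be fooled by a ">" inside a comment or attribute value, and a literal "<" in ordinary text followed later by a ">" will swallow everything in between. For well-behaved input it is probably good enough.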
 
Ranch Hand
Posts: 539
An alternative (but slightly more complex) solution...

If you're reading actual XML, it's quite easy to write a SAX handler for this. The basic gist would be "ignore all events except character data events" (in SAX, the characters() callback, which also delivers the contents of CDATA sections). Then, do what you want with the text. The catches:
  • You have to learn SAX, which is pretty simple but still takes time.
  • It won't work with HTML that isn't XML-compliant. XML parsers are uber-strict, of course.

I did this once but I've lost the source, or I'd help ya out. As Jared says, a regexp solution will be easier: ignore everything between < and >. The choice is yours!


    --Tim
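
Since the original source is lost, here is a rough sketch of what the SAX approach described above might look like, using only the JAXP classes that ship with the JDK. The class name TextExtractor and the method extractText are assumptions made for this example, and it reads from a String purely to keep the sample short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps character data and ignores every other SAX event.
class TextExtractor extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // Called for text between tags; may fire several times for one run of text.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Parses well-formed XML and returns only its text content. */
    static String extractText(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextExtractor handler = new TextExtractor();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints "Tim says hi"
        System.out.println(extractText("<note><to>Tim</to> says <b>hi</b></note>"));
    }
}
```

As noted in the catches, the parser throws a SAXParseException on anything that is not well-formed XML (an unclosed tag, for instance), so this will not work on typical sloppy HTML.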
     
    Sheriff
    Posts: 7023
    Moving this to the Intermediate forum...
     