JavaRanch » Java Forums » Java » Beginning Java
Is there an api for ignoring xml?

Anthony Smith
Ranch Hand

Joined: Sep 10, 2001
Posts: 285
I have some code that reads a file. However, this file has XML/HTML tags in it, and I would like the program to ignore that markup. Is there anything like that already available? Otherwise I guess I need to write something that ignores tags...
Jared Sprague

Joined: Jun 16, 2004
Posts: 16
I'm not sure if there is something already out there that will do what you want, but it would be really easy to use java.util.regex to create some simple regular expressions to remove tags from a file.
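As a rough sketch of the regex idea (the class and method names here are made up for illustration, not from any library), something along these lines would drop everything between < and >. It assumes a tag never contains a literal > inside an attribute value, and it leaves entities such as &amp; untouched:

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this example.
class TagStripper {
    // Matches "<", then any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the input with every tag-like span removed.
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: Hello, world!
        System.out.println(stripTags("<p>Hello, <b>world</b>!</p>"));
    }
}
```

To use it on a file, you could read the file into a String (or line by line, if no tag spans a line break) and pass the text through stripTags.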
Tim West
Ranch Hand

Joined: Mar 15, 2004
Posts: 539
An alternative (but slightly more complex) solution...

If you're reading actual XML, it's quite easy to write a SAX handler for this. The basic gist would be "ignore all events except for character-data events" (in SAX, text content arrives through the characters() callback). Then, do what you want with the character data. The catches:
  • You have to learn SAX, which is pretty simple but still takes time.
  • It won't work with HTML that isn't XML-compliant. XML parsers are uber-strict, of course.

I did this once but I've lost the source, or I'd help ya out. As Jared says, a regexp solution will be easier: ignore everything between < and >. The choice is yours!
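For what it's worth, here is a minimal sketch of the SAX approach using the parser that ships with the JDK. The class name TextOnlyHandler and the helper extractText are invented for this example. It overrides only characters() and lets DefaultHandler ignore every other event, and it will reject input that is not well-formed XML, as noted above:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; the name is an assumption for this example.
class TextOnlyHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // The parser may deliver one run of text in several chunks, so append each.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the XML string and returns only its character data.
    static String extractText(String xml) {
        TextOnlyHandler handler = new TextOnlyHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        // Prints: Hello, world!
        System.out.println(extractText("<doc><p>Hello, <b>world</b>!</p></doc>"));
    }
}
```

Unlike the regex version, this also resolves the predefined entities (for example &amp; comes back as &), since the parser does that before calling characters().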

Dirk Schreckmann

Joined: Dec 10, 2001
Posts: 7023
Moving this to the Intermediate forum...
[ June 30, 2004: Message edited by: Dirk Schreckmann ]
