
Library to remove HTML tags and comments

Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
I am looking for a tool that takes HTML as input and returns plain text as output. Basically, I want to remove all HTML tags and comments from my string. Is there an open-source library for that?


My blood is tested +ve for Java.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12756
    
Parsing random HTML is not simple. Consider the JTidy open source toolkit to get HTML into a parsed form.

Bill
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8834
    

Get the Text in an HTML document using the javax.swing.text.html parser.
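
For anyone finding this thread later, here is a minimal sketch of that approach, assuming the input is a String held in memory. The class name HtmlTextExtractor and the method extract are made up for illustration; the JDK classes used are HTMLEditorKit.ParserCallback and ParserDelegator. Text runs are joined with a single space, which is a simplification and will not preserve the original layout.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the thread): collects only the text the
// Swing HTML parser reports through handleText().
class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' '); // assumption: separate text runs with one space
        }
        text.append(data);
    }

    // Comments arrive through handleComment(), which is not overridden here,
    // so they never reach the output.

    static String extract(String html) throws IOException {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in the document,
            // since the input is already decoded characters
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.text.toString();
    }
}
```

Calling HtmlTextExtractor.extract(html) should give back the text with tags and comments dropped. The Swing parser only understands fairly old HTML, so for messy real-world pages it may still be worth cleaning the markup first.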


"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
[How To Ask Questions On JavaRanch]
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Thanks all!!
 