Library to remove HTML tags and comments

Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
I am looking for a tool that takes HTML as input and returns plain text as output. Basically, I want to remove all HTML tags and comments from my string. Is there an open source library for that?


My blood is tested +ve for Java.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
Parsing arbitrary HTML is not simple. Consider the open source JTidy toolkit to get the HTML into a parsed form.

Bill
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8877

You can get the text of an HTML document using the parser in javax.swing.text.html.
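For illustration, here is a minimal sketch of the Swing parser approach. The class name HtmlTextExtractor and the method extractText are made up for this example, and joining text chunks with a single space is an assumption about the output you want. ParserDelegator reports text and comments through separate callbacks, so overriding only handleText leaves tags and comments out of the result.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
public class HtmlTextExtractor {

    // Returns the text content of the given HTML, without tags or comments.
    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separate consecutive text chunks with one space.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
            // handleComment is deliberately not overridden, so comments are dropped.
        };

        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from a String, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><!-- hidden note --><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

This needs nothing beyond the JDK, but the Swing parser only knows older HTML, so for messy real-world pages a dedicated library may still be the better choice.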


"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
Chetan Parekh
Ranch Hand

Joined: Sep 16, 2004
Posts: 3636
Thanks all!!