Library to remove HTML tags and comments

 
Chetan Parekh
Ranch Hand
Posts: 3640
I am looking for a tool that takes HTML as input and returns plain text as output. Basically, I want to remove all HTML tags and comments from my string. Is there an open-source library for that?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
Parsing random HTML is not simple. Consider the JTidy open source toolkit to get HTML into a parsed form.

Bill
 
Joe Ess
Bartender
Posts: 9279
You can get the text out of an HTML document using the parser in the javax.swing.text.html package.
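
As a rough illustration of that approach, here is a minimal sketch that uses ParserDelegator from javax.swing.text.html.parser with an HTMLEditorKit.ParserCallback. The class name HtmlTextExtractor and the method extractText are made up for this example. Only handleText is overridden, so tags are skipped and the comment callback (handleComment) is left as a no-op. How the Swing parser treats whitespace, entities, and script or style content may not match what you expect, so try it on your own input first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
class HtmlTextExtractor {

    // Returns the text chunks reported by the Swing HTML parser,
    // joined with single spaces. Tags and comments are not copied.
    static String extractText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
            // handleComment is deliberately not overridden, so comments are dropped.
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset declarations
            // in the document, since the input is already a decoded String.
            new ParserDelegator().parse(reader, callback, true);
        }
        return out.toString();
    }
}
```

Calling HtmlTextExtractor.extractText(myHtml) gives back a plain String. Joining the chunks with a space is an assumption made here for readability; it can introduce extra spaces around inline tags, so adjust it if exact spacing matters.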
 
Chetan Parekh
Ranch Hand
Posts: 3640
Thanks all!!
 