JavaRanch » Java Forums » Java » Java in General

remove javascript from html web page

asit dhal
Greenhorn

Joined: May 05, 2009
Posts: 13

I need to remove all tags (HTML tags and JavaScript code) from a web page.

Can somebody tell me how to do this?


http://kodeyard.blogspot.com/
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8626

asit dhal wrote: I need to remove all tags (html tags and javascript code) from a web page.

Can somebody tell me how to do this?

I suggest you look at a SAX or DOM parser; Java has implementations of both. SAX is generally easier to use, and I'm pretty sure it will do what you want. However, you may need to convert the HTML to XHTML first, because these parsers expect well-formed XML. For that, there is a utility called JTidy, which I believe has its own SAX-like parser built in; but I've never used it, so I have no idea how easy it is.
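A minimal sketch of the SAX approach, using only the JDK's built-in javax.xml.parsers and org.xml.sax APIs. It assumes the page has already been converted to well-formed XHTML (for example with JTidy), has no DOCTYPE (otherwise the parser may try to fetch the external DTD), and uses no HTML-only named entities such as &nbsp;, which a plain XML parser rejects. The class and method names (StripTags, stripTags, TextOnlyHandler) are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StripTags {

    // SAX callback handler: keeps character data, drops everything inside <script> or <style>.
    static class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int skipDepth = 0; // > 0 while we are inside a script or style element

        private static boolean isSkipped(String name) {
            return name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (isSkipped(qName)) skipDepth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (isSkipped(qName)) skipDepth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Tags never reach this method; only the text between them does.
            if (skipDepth == 0) text.append(ch, start, length);
        }

        String getText() {
            return text.toString();
        }
    }

    // Parses well-formed XHTML and returns its text content with tags and scripts removed.
    static String stripTags(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextOnlyHandler handler = new TextOnlyHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getText();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><script>if (a &lt; b) { alert('x'); }</script></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(page)); // prints: Hello world
    }
}
```

Note that text from adjacent block elements is concatenated without a separator (`<p>a</p><p>b</p>` gives `ab`), so a fuller version would probably append a space or newline in endElement for block-level tags.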

Tip: DON'T consider a regex-based solution if the job requires any "awareness" of the page's structure. Regexes are very powerful, but not well suited to hierarchical logic such as nested tags.
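To illustrate the tip, here is a naive one-line regex "tag stripper" (the RegexPitfall class and the sample page are hypothetical) run on a small page. It removes the tags, but the JavaScript source survives as text, and part of the script is silently deleted because the regex mistakes the `<` comparison for the start of a tag.

```java
import java.util.regex.Pattern;

public class RegexPitfall {
    // Naive approach: delete everything from a '<' up to the next '>'.
    private static final Pattern NAIVE_TAG = Pattern.compile("<[^>]*>");

    static String naiveStrip(String html) {
        return NAIVE_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>Hello</p><script>if (a < b && c > d) alert('hi');</script>";
        // The script body is kept, and "< b && c >" is wrongly treated as a tag.
        System.out.println(naiveStrip(page)); // prints: Helloif (a  d) alert('hi');
    }
}
```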

Winston


Isn't it funny how there's always time and money enough to do it WRONG?