remove javascript from html web page

asit dhal
Greenhorn

Joined: May 05, 2009
Posts: 13

I need to remove all tags (HTML tags and JavaScript code) from a web page.

Can somebody tell me how to do this?


http://kodeyard.blogspot.com/
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 4726

asit dhal wrote: I need to remove all tags (HTML tags and JavaScript code) from a web page.

Can somebody tell me how to do this?

I suggest you look at a SAX or DOM parser; Java ships implementations of both. SAX is generally the easier of the two to use, and I'm pretty sure it will do what you want; however, you may need to convert the HTML to XHTML first. For that there is a utility called JTidy, which I believe has its own SAX-like parser built in, but I've never used it, so I have no idea how easy it is.
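To illustrate the SAX suggestion, here is a minimal sketch using the JDK's built-in parser. It assumes the input is already well-formed XHTML with no DOCTYPE and no HTML-only entities such as &nbsp;; real-world HTML would need a cleanup pass first. The class name TagStripper and the method strip are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects text content, skipping <script> and <style>.
public class TagStripper extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0; // greater than 0 while inside <script> or <style>

    private static boolean isSkipped(String qName) {
        return "script".equalsIgnoreCase(qName) || "style".equalsIgnoreCase(qName);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isSkipped(qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isSkipped(qName)) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Tags never reach this callback; only text nodes do.
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    // Assumption: xhtml is well-formed XML; malformed input throws a SAXException.
    public static String strip(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TagStripper handler = new TagStripper();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><script>alert('x');</script></head>"
                    + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(strip(page)); // prints: Hello world
    }
}
```

Because the parser only hands text nodes to characters(), the tags disappear for free; the depth counter is what keeps the JavaScript source out of the result.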

Tip: DON'T think about a regex-based solution if any "awareness" of the document structure is required. Regexes are very powerful, but not well suited to hierarchical logic.
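As a small illustration of that tip (the class and method names are invented for the example), a naive tag-matching regex removes the script tags themselves but has no idea that the text between them is code:

```java
// Hypothetical demo of why a tag-matching regex is not enough.
public class RegexPitfall {
    // Removes anything that looks like <...>, with no notion of nesting or context.
    static String naiveStrip(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String page = "<script>if (a > b) alert('x');</script><p>Hi</p>";
        System.out.println(naiveStrip(page));
        // prints: if (a > b) alert('x');Hi
    }
}
```

The JavaScript source survives in the output because the pattern cannot know it is inside a script element.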

Winston


 