JavaRanch » Java Forums » Java » Java in General

remove javascript from html web page

asit dhal
Greenhorn

Joined: May 05, 2009
Posts: 13

I need to remove all tags (HTML tags and JavaScript code) from a web page.

Can somebody tell me how to do this?


http://kodeyard.blogspot.com/
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7059
    

asit dhal wrote: I need to remove all tags (HTML tags and JavaScript code) from a web page.

Can somebody tell me how to do this?

I suggest you look at a SAX or DOM parser. Java has implementations of both. SAX is generally easier to use for this kind of job, and I'm pretty sure it will do what you want; however, you may need to convert the HTML to XHTML first, because these parsers expect well-formed XML and most real-world HTML isn't. For that, there is a utility called JTidy, which I believe has its own SAX-like parser built in; but I've never used it, so I have no idea how easy it is.
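Here's a minimal sketch of the SAX approach using the JDK's built-in parser (`javax.xml.parsers.SAXParser` with a `DefaultHandler`). It assumes the page has already been turned into well-formed XHTML (e.g. with JTidy) and has no external DOCTYPE; a real XHTML DOCTYPE may make the parser try to fetch the DTD. The class and method names are just made up for illustration. The handler simply keeps character data and ignores anything inside `<script>` or `<style>`:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StripTags {

    /** SAX handler that keeps character data but drops <script> and <style> contents. */
    static class TextOnlyHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int skipDepth = 0; // greater than 0 while inside <script> or <style>

        // A namespace-unaware parser may report an empty localName, so fall back to qName.
        private static boolean isSkipped(String localName, String qName) {
            String name = localName.isEmpty() ? qName : localName;
            return name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (isSkipped(localName, qName)) {
                skipDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (isSkipped(localName, qName)) {
                skipDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Tags never reach this callback; only the text between them does.
            if (skipDepth == 0) {
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString();
        }
    }

    /** Returns the text content of a well-formed XHTML string, minus scripts and styles. */
    public static String stripTags(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextOnlyHandler handler = new TextOnlyHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getText();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><script>alert('hi');</script></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(page)); // prints: Hello world
    }
}
```

The point of using a parser is that it already knows where each element starts and ends, so "am I inside a script?" is just a counter rather than something you have to work out from the raw characters.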

Tip: DON'T consider a regex-based solution if any "awareness" of the document's structure is required (for example, knowing whether you're inside a script). Regexes are very powerful, but they're not well-suited to hierarchical logic.
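To see why, here's a hypothetical naive regex approach (the class name and input are made up for illustration). The pattern has no idea it's inside a `<script>` element, so the JavaScript is kept, and a comparison inside it gets mistaken for a tag:

```java
public class NaiveRegexStrip {

    /** Deletes anything that looks like a tag: "<", then any non-">" characters, then ">". */
    static String naiveStrip(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String page = "<p>Hi</p><script>if (a < b && c > d) { go(); }</script>";
        // The script body survives, and "< b && c >" is removed as if it were a tag.
        System.out.println(naiveStrip(page)); // prints: Hiif (a  d) { go(); }
    }
}
```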

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
 