trying to remove javascript contents with script tags?

steve labar
Ranch Hand

Joined: Sep 10, 2008
Posts: 55
I'm trying to remove the JavaScript within the script tags in an HTML file. I have no problem removing the script tags together with everything inside them. However, I'd like to leave the script tags in place and remove only the JavaScript inside them. I tried taking group(1), which is the contents, and running replace(group(1), ""), but it did not work consistently. matcher.replaceAll works very well, but the darn script tags are part of my match. I thought putting the script tags in non-capturing groups would help, so that I could then call matcher.replaceAll, but they are still in the match.

Any ideas?
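
A note on why the non-capturing groups did not help: a non-capturing group only stops the text from being numbered as a group; it is still part of the overall match, and replaceAll replaces the overall match. One way around it is to capture the opening and closing tags and put them back in the replacement string. The following is only a minimal sketch under some assumptions: the class and method names are made up for illustration, and the pattern assumes reasonably well-formed markup (no ">" inside attribute values, no "</script>" inside a JavaScript string literal).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not code from the original post.
final class ScriptBodyRegex {

    // Group 1 = opening tag (with any attributes), group 2 = closing tag.
    // The script body between them is matched reluctantly and not captured.
    // DOTALL lets '.' span line breaks; CASE_INSENSITIVE also matches <SCRIPT>.
    private static final Pattern SCRIPT = Pattern.compile(
            "(<script\\b[^>]*>).*?(</script\\s*>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Replaces each whole match with just its two tags, dropping the body.
    static String stripScriptBodies(String html) {
        return SCRIPT.matcher(html).replaceAll("$1$2");
    }
}
```

Calling ScriptBodyRegex.stripScriptBodies(html) handles every script element in the input in one pass, since replaceAll works through all matches.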



Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
I think there is no need to use regular expressions in this case. Unnecessary overhead.



steve labar
Ranch Hand

Joined: Sep 10, 2008
Posts: 55
What if there are multiple scripts in the file? This would only get the first occurrence. So, you think using Java regex is costly time-wise?

Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
The regex engine is pretty heavyweight, so I tend to avoid it whenever possible.

As for your question: yes, it only strips out the first occurrence.

But you can simply loop over it until sb.indexOf("<script") is -1.
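
A rough sketch of that loop, under some assumptions: the class and method names are hypothetical, the tags are assumed to be lowercase (indexOf is case-sensitive), and the markup is assumed to be well-formed. Because the script tags stay in the buffer here, sb.indexOf("<script") on its own would never return -1, so this version carries a search offset forward and stops when nothing is found past it.

```java
// Hypothetical helper, not code from the original thread.
final class ScriptBodyStripper {

    private static final String OPEN = "<script";
    private static final String CLOSE = "</script";

    // Deletes the text between each <script ...> and its </script>,
    // leaving both tags in place.
    static String stripScriptBodies(String html) {
        StringBuilder sb = new StringBuilder(html);
        int from = 0;
        while (true) {
            int open = sb.indexOf(OPEN, from);
            if (open == -1) break;               // no more script elements
            int tagEnd = sb.indexOf(">", open);
            if (tagEnd == -1) break;             // unterminated opening tag
            int bodyStart = tagEnd + 1;
            int close = sb.indexOf(CLOSE, bodyStart);
            if (close == -1) break;              // no closing tag: leave the rest alone
            sb.delete(bodyStart, close);         // drop only the script body
            // The closing tag now begins at bodyStart; resume searching after it.
            from = bodyStart + CLOSE.length();
        }
        return sb.toString();
    }
}
```

If the tags themselves were being removed as well, the simpler loop condition without an offset would be enough.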
 