
Trying to remove the JavaScript inside script tags (but keep the tags)?

 
steve labar
Ranch Hand
Posts: 55
I'm trying to remove the JavaScript within the script tags in an HTML file. I have no problem removing the script tags along with everything inside them. However, I'd like to keep the script tags and remove only the JavaScript inside them. I tried taking group(1), which holds the contents, and calling replace(group(1), ""), but it didn't work consistently. Matcher.replaceAll() works very well, but the darn script tags are part of my match. I thought putting the script tags in non-capturing groups would let me call Matcher.replaceAll() without losing them, but they are still part of the match.

Any ideas?
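A minimal sketch of one regex fix, assuming String.replaceAll is acceptable: put the opening and closing tags in their own capturing groups and write them back with $1 and $3 in the replacement. Non-capturing groups (?:...) still consume text, so whatever they match is part of the overall match and gets replaced; capturing the tags and referencing them in the replacement is the usual way around that. The (?s) flag matters because by default . does not match line breaks, which is one plausible reason a pattern works on some scripts and not others. Note also that String.replace(group(1), "") removes every literal occurrence of that text anywhere in the file, not just the one inside the tag. The class and method names below are made up for illustration.

```java
public class StripScriptBodies {

    // Group 1 captures the opening tag, group 2 the script body, group 3 the closing tag.
    // (?i) makes the match case-insensitive; (?s) lets '.' match line breaks.
    // The reluctant .*? stops at the nearest closing tag, so each block is handled separately.
    private static final String SCRIPT_BLOCK =
            "(?is)(<script\\b[^>]*>)(.*?)(</script\\s*>)";

    /** Keeps every opening and closing script tag but drops what is between them. */
    static String stripScriptBodies(String html) {
        // "$1$3" writes the captured tags back and leaves out group 2 (the body).
        return html.replaceAll(SCRIPT_BLOCK, "$1$3");
    }

    public static void main(String[] args) {
        String html = "<p>a</p><script type=\"text/javascript\">\nalert('x');\n</script>"
                + "<SCRIPT>var y = 1;</SCRIPT>";
        // prints <p>a</p><script type="text/javascript"></script><SCRIPT></SCRIPT>
        System.out.println(stripScriptBodies(html));
    }
}
```

Like any regex over HTML, this assumes reasonably well-formed markup; a literal "</script>" inside a JavaScript string would end the match early.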



 
Sebastian Janisch
Ranch Hand
Posts: 1183
I don't think you need regular expressions in this case; they just add unnecessary overhead.

 
steve labar
Ranch Hand
Posts: 55
What if there are multiple scripts in the file? This would only get the first occurrence. So you think using Java regex is costly time-wise?

 
Sebastian Janisch
Ranch Hand
Posts: 1183
The regex engine is pretty heavyweight, so I tend to avoid it whenever possible.
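If the regex route is kept anyway, one way to trim its cost is to compile the pattern only once. The String.replaceAll Javadoc states that str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl), so calling it repeatedly recompiles the pattern every time. The sketch below (the ScriptBodyStripper class is hypothetical) keeps a single static final Pattern and creates only a cheap Matcher per input. Whether this makes a measurable difference depends on how many files are processed, so it is worth timing on real data.

```java
import java.util.regex.Pattern;

public class ScriptBodyStripper {

    // Compiled once when the class loads, then shared by every call.
    // Pattern instances are immutable and safe to share between threads (Matchers are not).
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "(<script\\b[^>]*>)(.*?)(</script\\s*>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String strip(String html) {
        // A fresh Matcher per call; only the compiled Pattern is reused.
        return SCRIPT_BLOCK.matcher(html).replaceAll("$1$3");
    }

    public static void main(String[] args) {
        String[] pages = {
                "<script>one()</script>",
                "<div></div><script src=\"a.js\"></script>"
        };
        for (String page : pages) {
            System.out.println(strip(page));
        }
    }
}
```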

As for your question, yes, it only strips out the first occurrence.

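But you can simply loop over it. Since you want to keep the tags, start each search just after the previous </script> (sb.indexOf("<script", start)) and stop when it returns -1.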
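A rough non-regex sketch of that loop, assuming lowercase, well-formed, non-nested script tags (the class and method names are hypothetical). Because the tags stay in the buffer, searching from the start each time would keep finding the same opening tag, so the search resumes just past each closing </script>.

```java
public class StripScriptBodiesNoRegex {

    private static final String CLOSE_TAG = "</script>";

    /**
     * Removes the text between each "<script ...>" and the following "</script>",
     * keeping both tags. Assumes lowercase tag names and well-formed, non-nested blocks.
     * Note: "<script" is a plain prefix search, so it would also match e.g. "<scripts".
     */
    static String stripScriptBodies(String html) {
        StringBuilder sb = new StringBuilder(html);
        int from = 0;
        while (true) {
            int open = sb.indexOf("<script", from);
            if (open == -1) {
                break; // no more script blocks
            }
            int openEnd = sb.indexOf(">", open);
            if (openEnd == -1) {
                break; // unterminated opening tag: leave the rest untouched
            }
            int bodyStart = openEnd + 1;
            int close = sb.indexOf(CLOSE_TAG, bodyStart);
            if (close == -1) {
                break; // no closing tag: leave the rest untouched
            }
            sb.delete(bodyStart, close);              // drop the JavaScript, keep both tags
            from = bodyStart + CLOSE_TAG.length();    // the closing tag now starts at bodyStart
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<head><script>var a = 1;</script></head>"
                + "<body><script type=\"text/javascript\">f();</script></body>";
        System.out.println(stripScriptBodies(html));
    }
}
```

Unlike the case-insensitive regex, this version leaves uppercase <SCRIPT> blocks alone; that trade-off is part of keeping it simple.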
 