I'm writing a servlet to parse XML and XSL together, and then do some bits and pieces with the resultant HTML.
I'm sure I'm not far off with the regex, but I can't quite get it to work. (Don't you just hate it when you get to that stage :roll: )
Here's an example of the regex, and a code snippet that it fails to parse even though I think it should (note there's no CR/LF between the end of the first script tag and the start of the second one).
[ May 26, 2004: Message edited by: Richard Hands ]
[ May 27, 2004: Message edited by: Max Habibi ]
Use the Pattern.DOTALL flag instead of Pattern.MULTILINE. The latter causes the start and end anchors (^ and $) to match at line boundaries as well as at the beginning and end of the input. DOTALL allows the dot to match line terminator characters (\r, \n, etc.), which it doesn't normally do.
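To see the difference between the two flags, here is a minimal sketch. The regex in it is a stand-in I made up for illustration (the one from the original post isn't reproduced here), and the class and method names are hypothetical too. The sample input has two script elements back to back with no line break between them, as described above.

```java
import java.util.regex.Pattern;

class ScriptStripper {
    // Hypothetical pattern, not the one from the original post:
    // an opening script tag, anything (reluctantly), then the closing tag.
    static final String SCRIPT_REGEX = "<script\\b[^>]*>.*?</script>";

    // Removes every match of SCRIPT_REGEX, compiled with the given flags.
    static String stripScripts(String html, int flags) {
        return Pattern.compile(SCRIPT_REGEX, flags).matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>a</p><script>\nvar x = 1;\n</script>"
                    + "<script>\nvar y = 2;\n</script><p>b</p>";

        // MULTILINE only changes ^ and $; the dot still stops at '\n',
        // so neither script body matches and the input comes back unchanged.
        System.out.println(stripScripts(html, Pattern.MULTILINE).equals(html)); // prints "true"

        // DOTALL lets the dot cross line terminators, so both scripts go.
        System.out.println(stripScripts(html, Pattern.DOTALL)); // prints "<p>a</p><p>b</p>"
    }
}
```

The reluctant `.*?` matters here: with a greedy `.*` under DOTALL, a single match would run from the first opening tag to the last closing tag and swallow anything in between.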
Basically, I have an abstract class that instantiates an XML transformer, reads the basic data-centric XML from the database, and transforms it with an HTML-centric XSL stylesheet (which stylesheet it uses is determined by various bits of data in the database).
If all you want to do is remove the JS, then do it at transform time. It's far easier to write a template which matches <script /> elements and replaces them with nothing than to transform XML into HTML and then strip out the script tags. Otherwise you are effectively parsing the document twice and exposing your app to more possible sources of error, both in the transform and in the regular expression matching. If you only get the document after it has been transformed and can't change that part of the application, remember that you can make the result of an XSLT transformation valid XHTML, in which case you could quite easily transform it again with a simple XSLT.
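A rough sketch of that second-pass idea, assuming the first transform produced well-formed XHTML: an identity template copies everything through, and an empty template swallows script elements. The class name, method name, and stylesheet string are all made up for illustration; only the standard javax.xml.transform API is used. Matching on local-name() is my choice so the template fires whether or not the elements are in the XHTML namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ScriptRemovingTransform {
    // Hypothetical stylesheet: identity copy plus an empty template for script.
    static final String STRIP_SCRIPTS_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Identity template: copy every attribute and node as-is.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Empty template: script elements (in any namespace) produce no output.
      + "<xsl:template match=\"*[local-name()='script']\"/>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over well-formed XHTML and returns the result.
    static String removeScripts(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_SCRIPTS_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not strip script elements", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<div><p>a</p><script type=\"text/javascript\">var x = 1;</script>"
                     + "<script>var y = 2;</script><p>b</p></div>";
        System.out.println(removeScripts(xhtml));
    }
}
```

If you control the original stylesheet, the same empty template can simply be added there, which avoids the second pass altogether.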
A very interesting point, and one I will go and try out.