You also need to think about comments (both JavaScript and HTML) as well as the contents of string constants in JavaScript, both of which can contain lots of stuff that trips up simple regexps.
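To make that concrete, here is a minimal sketch in Python (my choice of language for illustration; the same thing happens with any regex engine). The two patterns and the sample inputs are made up for this example, not taken from the question.

```python
import re

# Naive attempt to strip JavaScript line comments: everything from "//" to end of line.
NAIVE_LINE_COMMENT = re.compile(r"//.*")

# Naive attempt to pull the bodies of <script> elements out of a page.
NAIVE_SCRIPT = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)

# Pitfall 1: a string constant that happens to contain "//".
source = 'var url = "http://example.com/"; // home page'
stripped = NAIVE_LINE_COMMENT.sub("", source)
# The regex cannot tell it is inside a string, so it cuts the URL in half.
print(repr(stripped))

# Pitfall 2: a script element that has been commented out in the HTML.
page = '<!-- <script>old();</script> -->\n<script>current();</script>'
found = NAIVE_SCRIPT.findall(page)
# The regex has no notion of HTML comments, so the dead script is reported too.
print(found)
```

You can patch each pattern to handle these cases, but every patch tends to expose another one (escaped quotes, block comments, regex literals, and so on).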
(From a theoretical point of view, most programming languages are described by type-1 or type-2 grammars; trying to process them with weaker type-3 tools, such as regular expressions, will cause complications. See the Chomsky hierarchy for more info.)
If this were my problem, I'd use a library like TagSoup or NekoXNI that creates valid XML from HTML, and then use XML APIs to work with the result.
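As a rough illustration of the parser-based approach, here is a sketch that uses Python's standard-library html.parser as a stand-in. It is not TagSoup or NekoXNI (those are Java libraries and produce XML/SAX events), but the idea is the same: let a real parser decide what is a tag, what is a comment, and what is text. The ScriptCollector class and the sample page are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text of every <script> element the parser actually sees."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._chunks = None  # list of text chunks while inside <script>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._chunks = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate and join at the end tag.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.scripts.append("".join(self._chunks))
            self._chunks = None


# A page with one commented-out script and one live script.
page = '<!-- <script>old();</script> -->\n<script>current();</script>'

collector = ScriptCollector()
collector.feed(page)
collector.close()
# The commented-out script reaches the parser as a comment, not as an element,
# so only the live script is collected.
print(collector.scripts)
```

The point is that the parser already knows where comments begin and end, so your own code never has to guess.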