Can somebody tell me how to do this?
I suggest you look at a SAX or DOM parser; Java has implementations of both. The first is generally easier to use, and I'm pretty sure it will do what you want; however, you may need to convert the HTML to XHTML first. For that, there is a utility called JTidy, which I believe has its own SAX-like parser built in; but I've never used it, so I have no idea how easy it is to use.
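To give a rough idea of what the SAX approach looks like, here is a minimal sketch using the parser that ships with the JDK. It assumes the input is already well-formed XHTML (for example after a tidy-up step) and that the goal is to pull out the href of every link; the actual task in the question may differ. The names LinkCollector and collectLinks are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the href of every <a> element in an XHTML string.
class LinkCollector {

    static List<String> collectLinks(String xhtml) throws Exception {
        List<String> hrefs = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // The parser calls startElement once per opening tag, in document order.
        parser.parse(new InputSource(new StringReader(xhtml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attrs.getValue("href");
                    if (href != null) {
                        hrefs.add(href);
                    }
                }
            }
        });
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: well-formed, no DOCTYPE.
        String xhtml = "<html><body><p><a href=\"a.html\">A</a> and "
                     + "<a href=\"b.html\">B</a></p></body></html>";
        System.out.println(collectLinks(xhtml)); // prints [a.html, b.html]
    }
}
```

If you need to know where in the document you are (say, only links inside a particular table), you can also override endElement and keep a stack or counter of the elements currently open.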
Tip: DON'T think about a regex-based solution if there is any "awareness" required. Regexes are very powerful, but not well-suited to hierarchical logic.
Bats fly at night, 'cause they aren't we. And if we tried, we'd hit a tree -- Ogden Nash (or should've been).
Articles by Winston can be found here