It looks like the content of that CDATA section is a fragment of HTML: not a complete HTML document, just a run of HTML tags. So the first step is to get the content of the CDATA section into a string, using your XML parser. The second step is to parse that string with an HTML parser, because the fragment is unlikely to be well-formed XML and an XML parser will reject it. Make sure you choose an HTML parser that can deal with "tag soup".
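
The two steps can be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not a drop-in solution: the sample document, the element name `description`, and the helper names `extract_cdata`, `parse_fragment` and `TagAndTextCollector` are all made up for the example. `html.parser.HTMLParser` is used because it is event-based and does not insist on balanced tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical input: the element name and the markup are invented for this sketch.
# Note the unclosed <b> and <i> and the bare <br>, which are not well-formed XML.
XML_DOC = (
    "<item><description><![CDATA["
    "<p>Hello <b>world<br>unclosed <i>tags</p>"
    "]]></description></item>"
)


class TagAndTextCollector(HTMLParser):
    """Records start tags and text as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def extract_cdata(xml_text, element_name):
    """Step 1: let the XML parser hand back the CDATA content as a plain string."""
    root = ET.fromstring(xml_text)
    node = root.find(element_name)
    return node.text if node is not None and node.text else ""


def parse_fragment(html_fragment):
    """Step 2: feed that string to a lenient HTML parser."""
    collector = TagAndTextCollector()
    collector.feed(html_fragment)
    collector.close()  # flush any buffered text
    return collector.tags, "".join(collector.text)


fragment = extract_cdata(XML_DOC, "description")
tags, text = parse_fragment(fragment)
print(tags)
print(text)
```

The XML parser never sees the markup inside the CDATA section as elements; it only returns it as character data. If you need an actual document tree rather than a stream of events, a dedicated tag-soup parser would be the next thing to look at, and the same two-step split applies.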