There's no natural mapping from an HTML page to XML, so you'll need to code that yourself. I'd approach this using a library like HtmlUnit that makes it easy to access a web site programmatically. It cleans the HTML so it becomes well-formed XML, and then presents a DOM and XPath interface that you can use to extract whichever parts of the page you're interested in.
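Here is a rough sketch of the "extract the parts you want and build your own XML" step. It assumes the page has already been cleaned into well-formed XML, which is what HtmlUnit is for; I believe `asXml()` on an `HtmlPage` gives you that, but check the docs for your version. Everything below uses only the JDK's `javax.xml` APIs, so it runs without HtmlUnit on the classpath. The class name `PageToXml`, the `<links>`/`<link>` output format and the sample XHTML are all made up for illustration. The XPath `//a[@href]` is just an example of picking out one part of the page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PageToXml {

    // Takes well-formed (already cleaned) page markup, selects every <a href="...">
    // with XPath, and builds a custom <links> document from them (hypothetical format).
    public static String extractLinks(String wellFormedPage) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document page = builder.parse(new InputSource(new StringReader(wellFormedPage)));

            // XPath query over the parsed DOM: all anchors that carry an href attribute.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList anchors = (NodeList) xpath.evaluate("//a[@href]", page, XPathConstants.NODESET);

            // Build the output document by hand; this is the "mapping" you have to code yourself.
            Document out = builder.newDocument();
            Element root = out.createElement("links");
            out.appendChild(root);
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                Element link = out.createElement("link");
                link.setAttribute("href", anchor.getAttribute("href"));
                link.setTextContent(anchor.getTextContent().trim());
                root.appendChild(link);
            }

            // Serialize the new document to a string without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(out), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Parser, XPath and transformer failures are all reported the same way here.
            throw new IllegalArgumentException("Could not convert page to XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>See <a href=\"/a\">First</a> and "
                + "<a href=\"/b\">Second</a>.</p></body></html>";
        System.out.println(extractLinks(page));
    }
}
```

With HtmlUnit itself you could probably skip the re-parsing and run the XPath directly against the page object (it has a `getByXPath` method). The JDK-only version just keeps the example self-contained.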