I would consider jWebUnit for making sense of the HTML. It puts a nice API on top of the page, which is easier to work with than handling the markup as XML yourself. Don't be put off that it's billed as a testing tool - using it just to access HTML pages works fine. I think it may also use JTidy underneath, though I haven't checked.