Reading the content of a web page is simple enough with the class java.net.URL: open a stream on the URL and read from it.
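A minimal sketch of reading a page through java.net.URL might look like the following. The class name PageFetcher, the method name fetch, the example.com address and the UTF-8 charset are all assumptions made for illustration; a real page may use a different encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class PageFetcher {

    // Reads everything the URL points to and returns it as one String.
    // Assumes the content is UTF-8; a real page may declare another charset.
    static String fetch(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; replace it with the page you want to read.
        URL url = URI.create("http://www.example.com/").toURL();
        System.out.println(fetch(url));
    }
}
```

The try-with-resources block closes the stream even if reading fails partway through.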
After you've done that, you'll have to locate the part of the HTML page you're interested in. You could do it the simple way, with String.indexOf() for example, but that may not be flexible enough if the layout of the page varies.
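To illustrate the String.indexOf() approach, here is a rough sketch that pulls out the text between two known markers. The class IndexOfExtractor, the method between, and the sample HTML are made up for this example; it only works when the markers appear literally in the page.

```java
// Hypothetical helper class, named here only for illustration.
class IndexOfExtractor {

    // Returns the text between the first occurrence of 'start' and the
    // next occurrence of 'end', or null if either marker is missing.
    static String between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return html.substring(from, to);
    }

    public static void main(String[] args) {
        // Sample input invented for the example.
        String html = "<html><head><title>Hello</title></head></html>";
        System.out.println(between(html, "<title>", "</title>"));
    }
}
```

This breaks as soon as the tag is written differently, for example as <TITLE> or with attributes, which is the inflexibility mentioned above.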
You could try regular expressions, or you could use an HTML parser to walk through the structure of the HTML and find the text you're looking for. Something like http://htmlparser.sourceforge.net/ might be useful for that.
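As a sketch of the regular expression route, the following uses java.util.regex to collect the href values of links. The class LinkFinder, the method findLinks and the pattern itself are assumptions for illustration; the pattern only handles double-quoted href attributes and is not a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class LinkFinder {

    // Matches an <a ...> tag and captures the double-quoted href value.
    // Simplified on purpose: single-quoted or unquoted hrefs are not matched.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href values in the order they appear in the HTML.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input invented for the example.
        String html = "<a href=\"a.html\">A</a> <A class=\"x\" HREF=\"b.html\">B</A>";
        System.out.println(findLinks(html));
    }
}
```

For pages with irregular or nested markup, a real HTML parser is likely to be more robust than a pattern like this.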