Some issues/hints...
* Network code can be vulnerable to thread-timing issues, which can make your tests fail intermittently... once every five or fifty or five hundred runs.
* If you don't control the target website, there tends to be a bit of trial and error in finding the correct URL and HTML formats... tests are a good way of being systematic about this trial and error: if a page breaks your bot, save it as a fixture, add it to your tests, and fix the bot until the tests pass.
* There's a question of how heavyweight to make the test framework... if scale weren't an issue, we might test our cnn.com headline-scraping bots by simulating the whole cnn.com website in the test harness.
* Your test code might be more robust and manageable if you test the following areas of functionality separately (assuming your bot is structured along these lines...)
  - mapping of an information request to a URL
  - the HTTP client itself (mapping of a URL to an HTML page)
  - mapping of an HTML page to a list of headlines or whatever (HTML parsing)
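A minimal sketch of that split in Python, using only the standard library. Everything here is hypothetical — `headline_url`, `parse_headlines`, the example.com URL scheme, and the `<h2 class="headline">` markup stand in for whatever your real bot and target site use. Note that the parser test runs against a saved fixture string, which is also how you'd replay a page that once broke the bot:

```python
import unittest
from html.parser import HTMLParser

# Hypothetical: maps an information request (a section name) to a URL.
def headline_url(section):
    return "https://example.com/%s/index.html" % section

# Hypothetical: maps an HTML page to a list of headlines by collecting
# the text inside <h2 class="headline"> tags.
class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines.append(data.strip())

def parse_headlines(html):
    parser = HeadlineParser()
    parser.feed(html)
    return parser.headlines

class TestUrlMapping(unittest.TestCase):
    # No network, no parsing: just request -> URL.
    def test_section_url(self):
        self.assertEqual(headline_url("world"),
                         "https://example.com/world/index.html")

class TestParsing(unittest.TestCase):
    # No network, no URL logic: a saved page -> headlines.
    # A page that broke the bot in the wild would be added here as
    # another fixture.
    FIXTURE = """<html><body>
        <h2 class="headline">First story</h2>
        <h2 class="headline">Second story</h2>
        </body></html>"""

    def test_headlines(self):
        self.assertEqual(parse_headlines(self.FIXTURE),
                         ["First story", "Second story"])
```

Run with `python -m unittest`. Because neither test touches the network, they're fast and deterministic, which sidesteps the intermittent-failure problem for these two layers.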
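For the HTTP-client layer, simulating the whole site is overkill, but a middle ground is a tiny stub server inside the test harness, so the client gets exercised over real sockets without touching the live site. A sketch, again with only the standard library (the page content and `fetch` helper are made up for illustration):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned page the stub server will serve.
PAGE = b"<html><h2 class='headline'>Local story</h2></html>"

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep test output quiet

# Hypothetical stand-in for the bot's HTTP client (URL -> page bytes).
def fetch(url):
    with urllib.request.urlopen(url) as response:
        return response.read()

server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: pick any free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    url = "http://127.0.0.1:%d/" % server.server_port
    body = fetch(url)
finally:
    server.shutdown()

assert body == PAGE  # the canned page round-trips through the real client
```

Binding to port 0 and reading back `server.server_port` avoids collisions when tests run in parallel, which is one of the thread-timing traps mentioned above.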