I have a question about parsing HTML pages with Java. I want to use HtmlUnit to parse tables from a web page that is returned in response to a POST request. However, I want to make that POST request using HttpClient, because making a POST request using HtmlUnit is a real pain.
Is it possible to somehow convert the HttpEntity that comes back from the HttpClient post request into an HtmlPage, maybe by going through an input stream?
Or does this question not even make sense?
I want to make a post request like this:
Then do something like this (I know this doesn't work, but is there some way to convert the HttpEntity into an HtmlPage?)
I need to automate a bunch of post requests, but I don't see an easy way to do that using the available HtmlUnit tools.
Is this possible, or do I just need to manually parse the html that comes back in the HttpEntity?
What kind of access do you find easier to do with HttpClient than with HtmlUnit? I think it would take even fewer than 6 lines of code with HtmlUnit.
Joined: Sep 29, 2010
Well, maybe I just don't know how to use HtmlUnit. Is there a simple way to make a POST request? From the examples I found online, it seems like I have to get the first page, look through the HTML to find the name of the form I want to submit and the names of its fields, set the fields to the values I want, and then submit the form back to the website. Something like this:
That seems much more awkward than making a POST request with HttpClient. On top of that, running the above code gives me a NoClassDefFoundError pointing at WebClient, even after checking that I have all the required dependencies...
Joined: Sep 21, 2011
Not quite sure I understand (if you use HttpClient, you also need to know the name of the form and fill in the form parameters), but you could look into WebClient.getWebConnection(). That will generally return an HttpWebConnection, which is the glue between HttpClient and HtmlUnit. You may have to subclass HttpWebConnection in order to get at the HttpClient object, though.
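To illustrate the point about form parameters: a hand-built POST still has to supply the field names itself. Below is a minimal sketch of that, using the JDK's built-in java.net.http classes (Java 11+) rather than Apache HttpClient or HtmlUnit, so it is a swapped-in stand-in and not the code from this thread. The class name FormPost, the URL, and the field names are all hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a form-encoded POST without touching the network.
final class FormPost {

    // Encode name/value pairs as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a POST request carrying the encoded form body.
    static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
            .build();
    }
}
```

Sending it would look something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), which hands back the response HTML as a String. Either way, the field names have to be read off the form first.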
Joined: Sep 29, 2010
Thanks for your help. I wound up just digging in and parsing the HTML response by hand. It was a good learning experience.
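For anyone who lands here later, parsing a table by hand might look roughly like the sketch below. It is not the code actually used in this thread. It assumes the response body is already in a String and that the table is simple and well-formed (no nested tables, every row and cell closed). The class name TableScraper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the cell text out of simple, well-formed table rows.
final class TableScraper {

    // Matches one <tr>...</tr> block; \b keeps it from matching tags like <track>.
    private static final Pattern ROW = Pattern.compile(
        "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one <td> or <th> cell; \b keeps it from matching <thead>.
    private static final Pattern CELL = Pattern.compile(
        "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one list of cell strings per table row found in the HTML.
    static List<List<String>> rows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop any nested tags and trim surrounding whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]+>", "").trim());
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

Regular expressions like these do not decode entities such as &amp;amp; and will break on nested tables or unclosed tags, so a real HTML parser is the sturdier choice for messy pages.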