I have built a scraper, and I now want it to gather information that is only visible to logged-in users.
I have a username and password, and full access to the information while logged in, but I'm having trouble logging in via my application.
So far I have the following code:
I got this from a tutorial about logging into websites with Java, and replaced the URL and login info with my own.
When I run this code, all I get is a printout of the front page; it doesn't log in.
Could anyone tell me what is wrong here?
PS: I have also tried Apache Commons and HtmlUnit, without getting even a single tutorial to work properly.
Hm, too bad about your unsuccessful trials with HtmlUnit, because that is what I would have advocated. In my experience, that's hands down the easiest approach to programmatic web access in Java. But given that you did not get URL/URLConnection to work either, maybe you want to give HtmlUnit another shot? If you post your code, I can take a look at it.
I just tried that code. Here's what I get, after replacing the URL, username and pw:
I have no idea whether I am supposed to replace anything in this line: "new AuthScope("localhost", 443)"
I also have no idea what the output here means. I see that it didn't return any error, but nothing tells me it managed to really log in either.
You wrote your initial code as if the website used basic HTTP authentication. It's pretty uncommon for websites to use that form of authentication these days -- are you sure that was the right thing to do?
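For reference, here is a rough sketch of what basic HTTP authentication amounts to: the client sends an Authorization header containing "Basic " followed by the Base64 encoding of "username:password". The class name, method name and credentials below are placeholders of mine, not anything from your code. If the site shows you an HTML login form rather than a browser popup asking for credentials, it is most likely not using this scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
    // Builds the value of the Authorization header used by HTTP basic authentication.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        System.out.println("Authorization: " + basicAuthHeader("user", "secret"));
    }
}
```

A server that does not use basic authentication will generally just ignore this header, which would be consistent with getting the front page back without being logged in.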
Paul Clapham wrote: You wrote your initial code as if the website used basic HTTP authentication. It's pretty uncommon for websites to use that form of authentication these days -- are you sure that was the right thing to do?
Nope, I have no knowledge of the different types of HTTP authentication :p
I've seen various pieces of code that are supposed to do the same thing, with nothing telling me there are any crucial differences between them.
It wouldn't surprise me if there's some kind of vital bit of knowledge that I lack in order to do these things.
Kari Nordmann wrote: It wouldn't surprise me if there's some kind of vital bit of knowledge that I lack in order to do these things.
Well, yeah. There are several ways to authenticate yourself to a web site. Some of those ways are managed by the web server, and some are managed by the application. Typically these days it's the application which manages the authentication, and if that's the case, you just have to mimic the requests your browser sends to the application and handle the responses in the same way. HtmlUnit is a good way to do that. However, you might want to spend a while reviewing the possibilities before you make another guess at how it actually works.
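To make "mimic the requests your browser sends" a bit more concrete, here is a minimal sketch of an application-managed (form-based) login. It uses the JDK's own java.net.http.HttpClient (Java 11+) rather than HtmlUnit, just to keep it dependency-free. Everything site-specific is an assumption: the URLs, the form field names "username" and "password", and the idea that a session cookie is all the site needs. Your browser's developer tools (network tab) will show what your site really sends, including any hidden fields or tokens.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormLoginSketch {
    // Encodes form fields as application/x-www-form-urlencoded, like a browser form submit.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds the POST request that submits the login form.
    static HttpRequest loginRequest(String loginUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // The cookie handler stores the session cookie and sends it on later requests.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        // Hypothetical field names and values; check the real login form.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "myUser");
        fields.put("password", "myPassword");

        // Placeholder URLs, not the real site.
        client.send(loginRequest("https://example.com/login", fields),
                HttpResponse.BodyHandlers.discarding());

        // Request a members-only page with the same client, so the cookie goes along.
        HttpRequest page = HttpRequest.newBuilder(
                URI.create("https://example.com/members")).GET().build();
        HttpResponse<String> response = client.send(page, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```

The important part is that the login request and the later page request go through the same client, so the cookies set at login are reused. A tool like HtmlUnit handles cookies, hidden form fields and (to a degree) JavaScript for you, which is why it tends to be less work on real sites.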