JavaRanch » Java Forums » Java » Sockets and Internet Protocols
Downloading a file using URL, URLConnection classes

ozg yeal
Greenhorn

Joined: Dec 10, 2004
Posts: 9
Hi!

I wrote some Java code (using the URL class) that downloads a file given its URL. The code works fine except in one case. I don't remember the URL right now, but let's assume it is:
www.domainname.com/dir1/dir2/file.txt
When I passed in that URL, my Java code downloaded a redirected web page instead of the file. When I opened the same URL in my browser, the browser loaded the redirected page too.
I could only download the file by right-clicking the link to it on a web page such as "www.domainname.com/dir1/dir2/index.htm" in my IE browser.

How can I download that file programmatically?

Thanks.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Look for a way to set "follow redirect" to false on URL or HttpURLConnection or one of those classes. I don't recall if it's an explicit method or a property, but it should be in the Javadoc somewhere.

If the server sends back an actual HTML page with a redirect instruction in the page you should be able to capture the first page. Actually, if that's the case you should be getting it now. If the server sends back a redirect header, then it never sends the file for you to capture, and the file you ask for might not even exist. That seems much more likely as I think about it and I wouldn't expect to find a way around it.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
ozg yeal
Greenhorn

Joined: Dec 10, 2004
Posts: 9
I tried turning off redirect following, but I still download a file that contains HTML. And I'm sure the file is there, because I can download it from my browser by first opening a web page on the same path.
Can't I imitate what the browser is doing?
Elliotte Rusty Harold
author
Ranch Hand

Joined: Feb 25, 2004
Posts: 91
I discuss this on pp. 540-542. In brief, you need to call the static method HttpURLConnection.setFollowRedirects(false), then get a URLConnection object pointing to the URL. It should now return the original response for the URL you asked for, without following the redirect. You can also configure this on a page by page basis with setInstanceFollowRedirects.
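
As a rough sketch of the two calls mentioned above: the class name RedirectProbe and the helper openWithoutRedirects are made up for illustration, and the address is whatever URL you pass on the command line. It turns off redirect following for a single connection, then prints the status code and any Location header so you can see whether the server answers with a header-level redirect.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, for illustration only.
class RedirectProbe {

    /** Opens (without connecting yet) an HTTP connection that will not follow redirects. */
    static HttpURLConnection openWithoutRedirects(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Per-connection switch; the JVM-wide default is left untouched.
        conn.setInstanceFollowRedirects(false);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // JVM-wide alternative, affecting connections opened afterwards:
        // HttpURLConnection.setFollowRedirects(false);
        HttpURLConnection conn = openWithoutRedirects(args[0]);
        try {
            // getResponseCode() is the call that actually connects.
            System.out.println("Status:   " + conn.getResponseCode());
            // Normally present only on 3xx responses; null otherwise.
            System.out.println("Location: " + conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }
}
```

A 3xx status with a Location header means the redirect happens at the HTTP level. A 200 status with an HTML body means the server is sending a page of its own in place of the file.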


Elliotte Rusty Harold
Author of Refactoring HTML (http://cafe.elharo.com/web/refactoring-html/)
Ko Ko Naing
Ranch Hand

Joined: Jun 08, 2002
Posts: 3178
Originally posted by Elliotte Rusty Harold:
You can also configure this on a page by page basis with setInstanceFollowRedirects.


Mr. Elliotte,
Does the method "setInstanceFollowRedirects" work on a page by page basis? I didn't know that we could use that method for this. I have only heard about HttpURLConnection.setFollowRedirects(false)...

Could you please explain a bit about "setInstanceFollowRedirects" here, so that we can get an idea of how and in which scenarios to apply it? Thanks...


Co-author of SCMAD Exam Guide, Author of JMADPlus
SCJP1.2, CCNA, SCWCD1.4, SCBCD1.3, SCMAD1.0, SCJA1.0, SCJP6.0
ozg yeal
Greenhorn

Joined: Dec 10, 2004
Posts: 9
Hi!

Thanks for your reply, Mr. Elliotte. I called HttpURLConnection.setFollowRedirects(false), and this time I downloaded a different HTML page, one with a "page has moved" link to the redirected page. In another case, where the URL pointed to a JPEG file, the JVM threw an exception:
java.io.IOException: Server returned HTTP response code: 403
But I'm sure the files exist, because I copied their URLs from my browser's address bar, and I was able to download or save them there. If I closed the browser window and then re-entered the file's URL directly, I wasn't able to download it: I got a redirected page or received HTTP response code 403 (Forbidden). The server just doesn't let me download the file by entering its URL directly. (As stated in my previous posts) I could only download the files by opening a web page on the same path and server as the files and then clicking the links to those files.

So setting the followRedirects flag to false doesn't work. I think I should do what I do with my browser: first open the web page that has the links, then download the file. But I don't know how to do that in my Java code.
Is there a way? (Maybe using sockets, HTTP headers, etc.)


Thanks to everybody.
Elliotte Rusty Harold
author
Ranch Hand

Joined: Feb 25, 2004
Posts: 91
HTTP 403 means the server understood the request but is refusing to let you have the resource. It may be that the server is checking the Referer or User-Agent header and only sending the content if it likes what it sees. It may be blocking downloads not referred by one of its own pages, or it may be blocking the Java User-Agent.
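
If the server really is checking those headers, a minimal sketch along these lines might get past it. The class RefererDownload and its methods are hypothetical, the URLs are the placeholders from the first post, and the User-Agent string is just an illustrative browser-like value. Whether this works depends entirely on what the server checks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, for illustration only.
class RefererDownload {

    /** Prepares (without connecting yet) a request that carries Referer and User-Agent headers. */
    static HttpURLConnection prepare(String fileUrl, String refererUrl, String userAgent)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(fileUrl).toURL().openConnection();
        // Claim the request came from a link on the given page.
        conn.setRequestProperty("Referer", refererUrl);
        // Replace the default "Java/<version>" agent string.
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    /** Sends the request and copies the response body to the target file. */
    static void download(HttpURLConnection conn, Path target) throws IOException {
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Server returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare(
                "http://www.domainname.com/dir1/dir2/file.txt",       // placeholder from the thread
                "http://www.domainname.com/dir1/dir2/index.htm",      // page that links to the file
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"); // illustrative value only
        download(conn, Paths.get("file.txt"));
    }
}
```

If the server also relies on a session cookie set by the linking page, these two headers alone would not be enough.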
Ko Ko Naing
Ranch Hand

Joined: Jun 08, 2002
Posts: 3178
ozg yeal,
It seems that permission needs to be granted before you can access the file... But, as Mr. Elliotte has just mentioned, the server may block certain user agents, including the Java user agent... So you might need to check which user agents the server allows...
ozg yeal
Greenhorn

Joined: Dec 10, 2004
Posts: 9
It doesn't only block Java user agents. It blocks all direct access to the file. As I mentioned earlier, if I copy and paste the file's URL into my browser (IE) and hit Enter, I get the same 403 Forbidden error. I can only view or download the file by opening a specific web page on the same server and then clicking the link there in my browser.

Now I'm trying to find out what browsers do to make this work. Maybe they send certain HTTP headers, etc...
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
If you're really serious, you could download an HTTP sniffer like HTTPLook and see exactly what headers are being sent when you click the link that works. Then duplicate all those headers over a raw socket, or see if you can get HttpURLConnection to do so. I'd guess the server is checking the Referer header or some relatively simple header field.
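
For the raw-socket route, something like the following sketch could replay captured headers by hand. RawHttpGet, buildRequest and send are hypothetical names, and the host, path and Referer value are the placeholders from earlier in the thread. It speaks plain HTTP only (no HTTPS) and prints the raw response, status line and headers included.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class RawHttpGet {

    /** Builds the text of a GET request with the given extra headers, in insertion order. */
    static String buildRequest(String host, String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        // Ask the server to close the connection so reading can stop at end of stream.
        sb.append("Connection: close\r\n\r\n");
        return sb.toString();
    }

    /** Sends the request over a plain socket and copies the raw response to the sink. */
    static void send(String host, int port, String request, OutputStream sink)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
            sink.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        // Copy here whatever headers the sniffer shows for the request that works.
        headers.put("Referer", "http://www.domainname.com/dir1/dir2/index.htm");
        String request = buildRequest("www.domainname.com", "/dir1/dir2/file.txt", headers);
        send("www.domainname.com", 80, request, System.out);
    }
}
```

The response body may arrive with chunked transfer encoding, which this sketch does not decode. HttpURLConnection handles that for you, so the socket version is mainly useful for seeing exactly what goes over the wire.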

Of course you're going to quite a bit of effort to subvert the intent of the site designer, which could be considered black hat hacking by some.
[ December 15, 2004: Message edited by: Stan James ]
ozg yeal
Greenhorn

Joined: Dec 10, 2004
Posts: 9
But Internet Explorer can download the file, and it isn't hacking, is it?
I wonder, is this stuff really secret and not open to the public?
 