
Java Job Interview Question

 
Glen Iris
Ranch Hand
Posts: 172
Hi Folks,

I am stuck on the following requirement in my technical assignment.

I have to write a program that downloads images from a webpage. The images should only be downloaded if they are new or modified images compared to the ones already downloaded.

The new part is fine: I just won't download the image if I already have a file with a matching name on disk. But how can I tell whether the image bits are actually different from an image I already have on disk with the same name, without downloading it?

I am really struggling with this so any help would be greatly appreciated.

Thanks.
 
fred rosenberger
lowercase baba
Bartender
Posts: 12084
A matching name on your disk isn't enough. You'd probably be better off looking at the date/time stamp on the file. Check the last modified time, not the last accessed time.

If the file on your local drive has a date/time of 3/9/2012 at 6:00 a.m., and the remote file (with the exact same name) has a modified time of 3/9/2012 at 7:14 a.m., then you know the remote file has been modified since you pulled it, and you need to pull it again.
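A minimal Java sketch of that comparison. The class and method names (`TimestampCheck`, `remoteIsNewer`, `localModified`) are made up for illustration, and the example times are read here as 9 March 2012 in UTC, which is an assumption. How the remote timestamp is obtained is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

class TimestampCheck {

    /** True when the remote copy was modified after the local copy. */
    static boolean remoteIsNewer(Instant localModified, Instant remoteModified) {
        return remoteModified.isAfter(localModified);
    }

    /** Reads the last-modified time (not the last-accessed time) of a file on disk. */
    static Instant localModified(Path file) throws IOException {
        return Files.getLastModifiedTime(file).toInstant();
    }

    public static void main(String[] args) {
        // The two times from the example above, assumed to be UTC.
        Instant local = Instant.parse("2012-03-09T06:00:00Z");
        Instant remote = Instant.parse("2012-03-09T07:14:00Z");
        System.out.println(remoteIsNewer(local, remote)); // prints "true"
    }
}
```

Note that the local timestamp is normally the time the file was written to disk, not the server's time, so this only works if the two clocks roughly agree or if the local file is stamped with the server's time after each download.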
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13055
First, do an HTTP HEAD request.

If the server is written correctly, you should get back response headers that include the Last-Modified date and Content-Length but no response body, so it is very fast.

See this Wikipedia article or search for "http head request" for examples.

Here is a convenient list of response headers.
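
One way the HEAD request could look in Java, using the standard `java.net.HttpURLConnection`. This is only a sketch: the class `HeadCheck`, the holder `RemoteInfo`, the 5-second timeouts and the decision rule in `needsDownload` are all assumptions made for illustration, not something the assignment prescribes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class HeadCheck {

    /** The two response headers of interest (hypothetical holder type). */
    static final class RemoteInfo {
        final long lastModifiedMillis; // 0 when the server sent no Last-Modified
        final long contentLength;      // -1 when the server sent no Content-Length

        RemoteInfo(long lastModifiedMillis, long contentLength) {
            this.lastModifiedMillis = lastModifiedMillis;
            this.contentLength = contentLength;
        }
    }

    /** Sends a HEAD request and returns the headers without fetching the body. */
    static RemoteInfo head(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // assumed timeouts
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return new RemoteInfo(conn.getLastModified(), conn.getContentLengthLong());
        } finally {
            conn.disconnect();
        }
    }

    /** Assumed rule: download when the server gave no usable headers,
     *  when the remote copy is newer, or when the sizes differ. */
    static boolean needsDownload(RemoteInfo remote, long localModifiedMillis, long localLength) {
        if (remote.lastModifiedMillis == 0 && remote.contentLength < 0) {
            return true; // nothing to compare against, so play safe
        }
        boolean newer = remote.lastModifiedMillis > localModifiedMillis;
        boolean sizeDiffers = remote.contentLength >= 0 && remote.contentLength != localLength;
        return newer || sizeDiffers;
    }
}
```

A caller would pass `Files.getLastModifiedTime(path).toMillis()` and `Files.size(path)` for the local values. Not every server sends both headers, which is why the sketch treats missing values separately.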

Bill
 
Winston Gutkowski
Bartender
Posts: 10087
Glen Iris wrote: The new part is fine: I just won't download the image if I already have a file with a matching name on disk. But how can I tell whether the image bits are actually different from an image I already have on disk with the same name, without downloading it?

I am really struggling with this so any help would be greatly appreciated.

The others are probably far more experienced than me as far as web pages specifically are concerned, but there are two basic methods of checking whether two files are the same (or almost certainly the same).
1. Check the file sizes.
2. Perform a checksum on them (only if sizes are the same).
The second takes a lot longer, but it produces a small fixed-size result (a 32- or 64-bit integer, for example), and that value is all you would need to download, provided the server can supply it.
You may also be able to use modification date/times, but these can be unreliable, since a copied file may have a different modification date from its source.
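
The two steps above can be sketched in Java for files that are both on disk. CRC-32 (`java.util.zip.CRC32`) is used here as one example of a 32-bit checksum; the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class FileCompare {

    /** Streams the file through CRC-32 and returns the 32-bit value as a long. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Step 1: compare sizes (cheap). Step 2: compare checksums only if sizes match. */
    static boolean probablySame(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return crc32(a) == crc32(b);
    }
}
```

Equal checksums mean "almost certainly the same", not "identical": a 32-bit value can collide, so a longer digest is a safer choice when a false match would matter.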

HIH

Winston
 
Matthew Brown
Bartender
Posts: 4565
While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

Another variant on William's approach is an If-Modified-Since header in a GET request (which should return a 304 response if nothing has changed).
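
A rough Java sketch of the conditional GET, again with `java.net.HttpURLConnection`. The names `ConditionalGet` and `downloadIfModified` are hypothetical, and stamping the local file with the server's Last-Modified value after a download is a design choice assumed here so that the next request compares against the server's own clock.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

class ConditionalGet {

    /** Returns true if the image was downloaded, false if the server answered 304. */
    static boolean downloadIfModified(URL url, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            if (Files.exists(target)) {
                // Adds an If-Modified-Since header carrying the local file's timestamp.
                conn.setIfModifiedSince(Files.getLastModifiedTime(target).toMillis());
            }
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return false; // 304: nothing changed, no body was sent
            }
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            long remoteModified = conn.getLastModified(); // 0 if the header is absent
            if (remoteModified > 0) {
                Files.setLastModifiedTime(target, FileTime.fromMillis(remoteModified));
            }
            return true;
        } finally {
            conn.disconnect();
        }
    }
}
```

Compared with a separate HEAD request, this needs only one round trip per image: the body arrives in the same response when the file has changed.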
 
Winston Gutkowski
Bartender
Posts: 10087
Matthew Brown wrote: While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

No it doesn't (or at least, not necessarily). In fact, it's one of the reasons checksums were created. What it does need is the ability to do the same type of check at both ends which, with things like MD5, isn't usually too big an issue.

To me, the only question might be whether this sort of thing is built into "web" logic (e.g., this HTTP HEAD request) or not. As you can tell, it's all a bit of a black art to me.

Winston
 
Matthew Brown
Bartender
Posts: 4565
Winston Gutkowski wrote:
Matthew Brown wrote: While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

No it doesn't (or at least, not necessarily). In fact, it's one of the reasons checksums were created. What it does need is the ability to do the same type of check at both ends which, with things like MD5, isn't usually too big an issue.

True. My first interpretation of the question was that we're downloading from a site that we don't have any control over, but that's not explicit. If we do control the server, then yes, I'd use a completely different approach based on checksums.
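
For the case where the server is under your control, a checksum-based check might look like the sketch below. It assumes the server publishes an MD5 digest for each image as a hex string; how that string is delivered is not specified in this thread and is left out. `DigestCheck`, `md5Hex` and `matchesPublished` are illustrative names only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestCheck {

    /** Computes the MD5 digest of a local file as 32 lowercase hex characters. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True when the local file's digest equals the digest published by the server
     *  (publishedHex is assumed to come from the server by some agreed means). */
    static boolean matchesPublished(Path file, String publishedHex)
            throws IOException, NoSuchAlgorithmException {
        return md5Hex(file).equalsIgnoreCase(publishedHex.trim());
    }
}
```

Only the short digest crosses the network; the image itself is fetched only when `matchesPublished` returns false.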
 