JavaRanch » Java Forums » Java » Java in General
Java Job Interview Question

Glen Iris
Ranch Hand

Joined: Jul 13, 2011
Posts: 164

Hi Folks,

I am stuck on the following requirement in my technical assignment.

I have to write a program that downloads images from a webpage. The images should only be downloaded if they are new or modified images compared to the ones already downloaded.

The new part is fine: I just won't download the image if I already have a matching name on disk. But how can I tell whether the image bits are actually different from an image I already have on disk with the same name, without downloading it?

I am really struggling with this so any help would be greatly appreciated.

Thanks.


OCPJP 6, OCMJD
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11406
    

A matching name on your disk isn't enough. You'd probably be better off looking at the date/time stamp on the file. You should check the last modified time, not the last accessed time.

If the file on your local drive has a date/time of 3/9/2012 at 6:00 a.m., and the remote file (with the exact same name) has a modified time of 3/9/2012 at 7:14 a.m., then you know the remote file has been modified since you pulled it, and you need to pull it again.
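
As a rough sketch of that comparison in Java: the class and method names below are made up for illustration, and the remote timestamp is assumed to have been obtained some other way (for example from an HTTP response header).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

// Hypothetical helper, not part of any library.
class TimestampCheck {

    /**
     * Returns true when the image should be fetched: either there is no local
     * copy yet, or the remote copy was modified after the local one.
     */
    static boolean needsDownload(Path localFile, Instant remoteLastModified) throws IOException {
        if (!Files.exists(localFile)) {
            return true; // nothing on disk, so the image is new
        }
        // Compare against the last-modified time, not the last-accessed time.
        Instant localLastModified = Files.getLastModifiedTime(localFile).toInstant();
        return remoteLastModified.isAfter(localLastModified);
    }
}
```

With a local file stamped 6:00 a.m. and a remote time of 7:14 a.m. on the same day, needsDownload returns true.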

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12803
    
First do an HTTP HEAD request.

If the server is written correctly, you should get back response headers that include the Last-Modified date and Content-Length but no response body, so it is very fast.

See this Wikipedia article or search for "http head request" for examples.

Here is a convenient list of response headers.
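
A minimal sketch of such a HEAD request using the JDK's HttpURLConnection. The class name, method names, and the "download if either header is missing" policy are assumptions made for illustration, not something the server or the assignment dictates.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of any library.
class HeadCheck {

    /** Sends a HEAD request and returns {lastModifiedMillis, contentLength}. */
    static long[] fetchHeadInfo(String imageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
        conn.setRequestMethod("HEAD"); // headers only, no response body
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HEAD failed with status " + status);
            }
            // getLastModified() returns 0 and getContentLengthLong() returns -1
            // when the server did not send the corresponding header.
            return new long[] { conn.getLastModified(), conn.getContentLengthLong() };
        } finally {
            conn.disconnect();
        }
    }

    /** Assumed policy: a missing header means "unknown", so download again. */
    static boolean looksChanged(long remoteModified, long remoteLength,
                                long localModified, long localLength) {
        if (remoteModified <= 0 || remoteLength < 0) {
            return true;
        }
        return remoteModified > localModified || remoteLength != localLength;
    }
}
```

The local values would come from Files.getLastModifiedTime(...).toMillis() and Files.size(...) on the copy already on disk.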

Bill
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

Glen Iris wrote: The new part is fine: I just won't download the image if I already have a matching name on disk. But how can I tell whether the image bits are actually different from an image I already have on disk with the same name, without downloading it?

I am really struggling with this so any help would be greatly appreciated.

The others are probably far more experienced than me as far as web pages specifically are concerned, but there are two basic methods of checking whether two files are the same (or almost certainly the same).
1. Check the file sizes.
2. Perform a checksum on them (only if sizes are the same).
The second takes a lot longer, but produces a small fixed-size result (typically a 32- or 64-bit integer), and that value is all you would need to download if the server can supply it.
You may also be able to use modification dates/times, but these can be unreliable, since a copied file may have a different modification date from its source.
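
To illustrate the two steps for a pair of files on disk, here is a small sketch using the JDK's CRC32 class (a 32-bit checksum). The class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper, not part of any library.
class FileCompare {

    /** Streams the file through a CRC32 checksum and returns the 32-bit value. */
    static long crc32(Path file) throws IOException {
        try (CheckedInputStream in =
                     new CheckedInputStream(Files.newInputStream(file), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is enough: the stream updates the checksum as it goes
            }
            return in.getChecksum().getValue();
        }
    }

    /** Step 1: compare sizes. Step 2: compare checksums only if the sizes match. */
    static boolean probablySame(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false; // cheap check first
        }
        return crc32(a) == crc32(b);
    }
}
```

Equal checksums mean "almost certainly the same", not a guarantee. A 32-bit value can collide, so a longer digest is a safer choice when that matters.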

HIH

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4421
    

While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

Another variant on William's approach is an If-Modified-Since header in a GET request (which should return a 304 response if nothing has changed).
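
A sketch of that conditional GET with HttpURLConnection. The names are hypothetical, and using the local file's modification time as the If-Modified-Since value is an assumption that only works well if the local and server clocks roughly agree.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
class ConditionalGet {

    /** Maps the status to a decision: false for 304, true for 200, error otherwise. */
    static boolean shouldReplaceLocalCopy(int status) throws IOException {
        if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return false; // 304: nothing changed, keep the local copy
        }
        if (status == HttpURLConnection.HTTP_OK) {
            return true; // 200: the response body is the new image
        }
        throw new IOException("Unexpected HTTP status " + status);
    }

    /** Returns true if a fresh copy was written to localFile, false on 304. */
    static boolean downloadIfModified(String imageUrl, Path localFile) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
        if (Files.exists(localFile)) {
            // Sends the If-Modified-Since request header.
            conn.setIfModifiedSince(Files.getLastModifiedTime(localFile).toMillis());
        }
        try {
            if (!shouldReplaceLocalCopy(conn.getResponseCode())) {
                return false;
            }
            try (InputStream body = conn.getInputStream()) {
                Files.copy(body, localFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        } finally {
            conn.disconnect();
        }
    }
}
```

Compared with a separate HEAD request, this needs only one round trip per image: the body is transferred only when the server reports a change.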
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    

Matthew Brown wrote: While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

No it doesn't (or at least, not necessarily). In fact, it's one of the reasons checksums were created. What it does need is the ability to do the same type of check at both ends which, with things like MD5, isn't usually too big an issue.
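
For the local half of that check, a sketch along these lines would work, assuming the server publishes an MD5 hex string for each image somewhere (how it does so is not specified in this thread, and the names below are made up):

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any library.
class Md5Compare {

    /** Computes the MD5 digest of a file as a 32-character lowercase hex string. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md5)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is enough: the stream feeds every byte into the digest
            }
        }
        // %032x keeps leading zeros that BigInteger would otherwise drop.
        return String.format("%032x", new BigInteger(1, md5.digest()));
    }

    /** Compares the local digest with one published by the remote end (assumed). */
    static boolean matchesPublishedDigest(Path localFile, String publishedMd5Hex)
            throws IOException, NoSuchAlgorithmException {
        return md5Hex(localFile).equalsIgnoreCase(publishedMd5Hex.trim());
    }
}
```

If the digests match, the image can be skipped. Only the short hex string has crossed the network.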

To me, the only question might be whether this sort of thing is built into "web" logic (e.g., this HTTP HEAD request) or not. As you can tell, it's all a bit of a black art to me.

Winston
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4421
    

Winston Gutkowski wrote:
Matthew Brown wrote: While the checksum approach is most reliable, it does require the file to be downloaded, which breaks the given condition.

No it doesn't (or at least, not necessarily). In fact, it's one of the reasons checksums were created. What it does need is the ability to do the same type of check at both ends which, with things like MD5, isn't usually too big an issue.

True - my first interpretation of the question was that we're downloading from a site that we don't have any control over. But that's not explicit. And if we do, yes, I'd use a completely different approach based on checksums.
 
 