I need a Hash function?

Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

I have a class which downloads a resource (an image file, etc.) in the background and writes it to persistent storage. When saving the file, I use the last portion of the URL as the file name. For example, if the URL is http://some.host.name/some/path/Resource.png, the file name would be "Resource.png".

The problem is that more than one URL could have "Resource.png" as its last component, which causes overwrites and therefore loss of data. I need some way to generate a unique ID for a resource referenced by a URL.
[Note: the only information passed to this module about the resource is the URL]

Is there a Hash function for such cases that I can use?

Thanks.


Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687
    

You could make a synchronized method which would return the file name, with a timestamp appended to it.
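
A minimal sketch of the timestamp idea, assuming the timestamp is placed just before the extension (the post does not say where to append it). The class and method names are hypothetical.

```java
final class TimestampedNames {
    // Last timestamp handed out; guarded by the class lock.
    private static long lastStamp = 0L;

    // synchronized so that two threads never receive the same timestamp.
    static synchronized String uniqueName(String fileName) {
        long now = System.currentTimeMillis();
        if (now <= lastStamp) {
            now = lastStamp + 1; // two calls in the same millisecond still differ
        }
        lastStamp = now;

        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "_" + now; // no extension to preserve
        }
        return fileName.substring(0, dot) + "_" + now + fileName.substring(dot);
    }
}
```

A name produced this way depends on when the method was called, so it cannot be recomputed later from the URL alone.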


Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14074
    

Note that hash functions do not generate unique IDs - so you are not looking for a hash function.

One thing you could do is store the resources in a directory structure. For example, if the URL is http://some.host.name/some/path/Resource.png, then create a directory "some.host.name", containing a directory "some", containing a directory "path", in which you store Resource.png.
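
A rough sketch of that mapping, assuming an absolute http URL with a host and a hypothetical base directory chosen by the caller. The class and method names are illustrative only, and the query string, if any, is ignored.

```java
import java.io.File;
import java.net.URI;

final class UrlDirectories {
    // Maps a URL to baseDir/<host>/<path segments...>; nothing is created on disk here.
    static File toLocalFile(File baseDir, String url) {
        URI uri = URI.create(url);
        File result = new File(baseDir, uri.getHost());
        for (String segment : uri.getPath().split("/")) {
            // Skip empty segments and anything that could climb out of baseDir.
            if (!segment.isEmpty() && !segment.equals(".") && !segment.equals("..")) {
                result = new File(result, segment);
            }
        }
        return result;
    }
}
```

Before writing the file, the caller would still need to create the parent directories, for example with getParentFile().mkdirs().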

Or, if for some reason you can't do that, replace characters in the URL until you get a valid filename, for example some_host_name_some_path_Resource.png (although in principle you could still get name clashes if you do that).
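
One way this replacement could look, as a sketch only: the scheme is dropped, every character outside letters and digits becomes an underscore, and dots are kept in the last path component so the extension survives. The class and method names and the exact character rules are assumptions.

```java
final class UrlFileNames {
    static String toFileName(String url) {
        // Drop the scheme ("http://") if there is one.
        int schemeEnd = url.indexOf("://");
        String rest = schemeEnd >= 0 ? url.substring(schemeEnd + 3) : url;

        // Split into everything up to the last slash, and the last component.
        int lastSlash = rest.lastIndexOf('/');
        String prefix = rest.substring(0, lastSlash + 1);
        String name = rest.substring(lastSlash + 1);

        // Host and directories: only letters and digits survive.
        // Last component: dots and hyphens are kept as well.
        return prefix.replaceAll("[^A-Za-z0-9]", "_")
                + name.replaceAll("[^A-Za-z0-9.-]", "_");
    }
}
```

With these rules the example URL becomes some_host_name_some_path_Resource.png. The clash mentioned above is real, though: http://host/a_b/x.png and http://host/a/b/x.png both map to host_a_b_x.png.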

Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

Christophe Verré wrote: You could make a synchronized method which would return the file name, with a timestamp appended to it.

Thanks for your answer.

I had given some thought to using timestamps; the problem is that I have to retrieve the files again later, given a URL, and I could not regenerate the same timestamp. I think I failed to mention retrieval in my original post. I apologize.
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687
    

Then Jesper's idea seems more appropriate.
Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

Jesper Young wrote: Note that hash functions do not generate unique IDs - so you are not looking for a hash function.

I knew this would come up, which is why I put a question mark in my thread topic. Anyway, I want unique hash values for my problem.

Jesper Young wrote:
One thing you could do is store the resources in a directory structure. For example, if the URL is http://some.host.name/some/path/Resource.png, then create a directory "some.host.name", containing a directory "some", containing a directory "path", in which you store Resource.png.

This occurred to me, but I ruled it out: creating the directory structure would take time, since I am dealing with lots of URLs of varying depths, and this code will run on a mobile device.

Jesper Young wrote:
Or, if for some reason you can't do that, replace characters in the URL until you get a valid filename, for example some_host_name_some_path_Resource.png (although in principle you could still get name clashes if you do that).

This option of replacing characters in the URL appeals to me. The only drawback I can see is long file names. I don't see the potential for name clashes, because the URLs of unique resources have to be different (otherwise they point to the same resource, and then overwrites are not a problem).

Thanks for your reply.


Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687
    

You could use something like a MessageDigest to turn the URL into an MD5 hash:


Then transform the byte array into hexadecimal values.

(Plenty of examples, like here)
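
The code from this post did not survive in the thread. A minimal sketch of the approach it describes, using java.security.MessageDigest and a hand-rolled hex conversion; the class and method names are hypothetical, and UTF-8 is an assumed encoding for the URL.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class UrlDigest {
    // Returns the MD5 digest of the URL as 32 lowercase hex characters.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));

            // Transform the byte array into hexadecimal values, two characters per byte.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected in practice, since MD5 is a standard algorithm name.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The same URL always yields the same fixed-length name, so a file can be found again from its URL. Two different URLs could in principle produce the same digest, but an accidental collision is very unlikely.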
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Or just use a key/value DB.
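
As a very small stand-in for a key/value store (not a real database), a URL-to-file-name index could be kept in a java.util.Properties object and saved alongside the files. Everything here is an assumption made for illustration: the class name, the res_N naming scheme, and the idea of persisting through a Writer and Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Properties;

final class UrlIndex {
    private final Properties entries = new Properties();
    private int nextId = 0;

    // Returns the file name recorded for the URL, assigning a new one on first sight.
    synchronized String fileNameFor(String url) {
        String name = entries.getProperty(url);
        if (name == null) {
            name = "res_" + (nextId++);
            entries.setProperty(url, name);
        }
        return name;
    }

    // Writes the whole index out; Properties escapes ':' and '=' in keys itself.
    synchronized void save(Writer out) throws IOException {
        entries.store(out, "URL to file name index");
    }

    // Reads a saved index into an empty UrlIndex and continues numbering after it.
    synchronized void load(Reader in) throws IOException {
        entries.load(in);
        nextId = entries.size();
    }
}
```

Retrieval then becomes a lookup by URL, at the cost of keeping the index file in sync with the stored files.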
Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

Christophe Verré wrote: You could use something like a MessageDigest to turn the URL into an MD5 hash...

Thanks, Christophe. I haven't used it before, but I will read up on it.

David Newton wrote: Or just use a key/value DB.

Yes, that is an option too. Thanks!
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37897
    
Too difficult a question for "beginning". Moving thread.
Jim Hoglund
Ranch Hand

Joined: Jan 09, 2008
Posts: 525
Is your problem duplicate file names? When you detect a duplicate, how about
the common approach of adding: (1), (2), (3), etc., just ahead of the file extension?
This way, you know exactly what to look for during further processing. And human
readers of the file names can easily see what is going on.
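
A short sketch of that numbering scheme, assuming the set of names already in use is passed in by the caller (on a real device it might come from a directory listing). The class and method names are hypothetical.

```java
import java.util.Set;

final class NumberedNames {
    // Returns fileName if it is free, otherwise the first free "base(n).ext" variant.
    static String nextFreeName(String fileName, Set<String> existing) {
        if (!existing.contains(fileName)) {
            return fileName;
        }
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";

        // Try (1), (2), (3), ... just ahead of the extension until one is unused.
        for (int i = 1; ; i++) {
            String candidate = base + "(" + i + ")" + ext;
            if (!existing.contains(candidate)) {
                return candidate;
            }
        }
    }
}
```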

Jim ... ...


BEE MBA PMP SCJP-6
Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

Jim Hoglund wrote: Is your problem duplicate file names? When you detect a duplicate, how about
the common approach of adding: (1), (2), (3), etc., just ahead of the file extension?
This way, you know exactly what to look for during further processing. And human
readers of the file names can easily see what is going on.

Jim ... ...


Thanks for your suggestion Jim.

The problem is to write a resource to, and retrieve it from, persistent storage given only the resource URL. As I see it, your approach will not allow easy retrieval unless I also find a way to tag each file with its URL information.
Jim Hoglund
Ranch Hand

Joined: Jan 09, 2008
Posts: 525
Monu: Did you consider David's idea? You could abstract the key
to a single integer by summing the URL characters, for example.
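
A quick illustration of the summing idea, with a hypothetical class name. It also shows a limitation worth keeping in mind: URLs that contain the same characters in a different order get the same integer, so the sum alone cannot serve as a unique ID.

```java
final class CharSum {
    // Adds up the character codes of the URL.
    static int sum(String url) {
        int total = 0;
        for (int i = 0; i < url.length(); i++) {
            total += url.charAt(i);
        }
        return total;
    }
}
```

For example, http://host/ab.png and http://host/ba.png produce the same sum.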

David Newton wrote: Or just use a key/value DB.

Jim ... :) ...
Monu Tripathi
Rancher

Joined: Oct 12, 2008
Posts: 1369
    

Jim Hoglund wrote: Monu: Did you consider David's idea? You could abstract the key
to a single integer by summing the URL characters, for example....


David Newton wrote: Or just use a key/value DB.

Monu Tripathi wrote: Yes, that is an option too. Thanks!

Unfortunately, I can't set up a database right now due to time constraints, but it is a valid and useful suggestion.
I am going with Jesper's solution and, in the meantime, also reading up on MessageDigest.
Jim Hoglund
Ranch Hand

Joined: Jan 09, 2008
Posts: 525
Yes, Jesper's idea looks very doable. And the results are
visible with just a file browser. Neat and clean . . .

Jim ... ...
 