Creating a Unique ID from a String.

Bharadwaj Adepu
Ranch Hand

Joined: Dec 30, 2007
Posts: 99
In Java, how can I create a unique ID from a particular string?
I have gone through the UUID class but could not find anything suitable, and googling turned up nothing either.
Can anyone please help me with this?


SCJP 1.5
Sagar Rohankar
Ranch Hand

Joined: Feb 19, 2008
Posts: 2902
    

Try permutations and combinations of the string's characters, appended with a randomly generated string or number.


Bharadwaj Adepu
Ranch Hand

Joined: Dec 30, 2007
Posts: 99
Sagar Rohankar wrote: Try permutations and combinations of the string's characters, appended with a randomly generated string or number.


I don't think this works: the next time I generate the unique ID for the same string, the random part will differ, so the same ID will not be produced.

One more requirement: the generated unique ID must be shorter than the original string.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Sounds like you want a guaranteed-unique hash code for a string that's also guaranteed to be "shorter" than the string. Not sure how possible that will be. What is the reason for this requirement?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38818
    
Too difficult a question for us beginners. Moving.
Fred Muhlenberg
Ranch Hand

Joined: Jan 08, 2008
Posts: 39
In Java 1.5, static UUID UUID.fromString(String name);

As mentioned previously, what about the hashcode converted into a String?

Integer.toString( myString.hashcode() );
Bharadwaj Adepu
Ranch Hand

Joined: Dec 30, 2007
Posts: 99
Fred Muhlenberg wrote: In Java 1.5, static UUID UUID.fromString(String name);

fromString only parses a string that is already a valid UUID representation; it does not create a UUID from an arbitrary string.
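
To illustrate the distinction, here is a minimal sketch. It assumes Java 7 or later (for StandardCharsets) and uses UUID.nameUUIDFromBytes, a method not mentioned above, which derives a type 3 (MD5-based) UUID from arbitrary bytes. The class and method names (NameBasedUuidSketch, idFor) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class NameBasedUuidSketch {

    // Hypothetical helper: derives a type 3 (MD5-based) UUID from arbitrary text.
    public static UUID idFor(String text) {
        return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String text = "any text at all, however long";

        // The same input yields the same UUID on every call.
        System.out.println(idFor(text).equals(idFor(text)));

        // The textual form has a fixed length, whatever the input length.
        System.out.println(idFor(text).toString().length());

        // fromString, by contrast, only parses text that is already a UUID.
        try {
            UUID.fromString(text);
        } catch (IllegalArgumentException e) {
            System.out.println("fromString rejected non-UUID text");
        }
    }
}
```

The textual form is always 36 characters, so it is shorter than the input only for longer strings, and like any fixed-size digest it can in principle map two different inputs to the same value.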
W. Joe Smith
Ranch Hand

Joined: Feb 10, 2009
Posts: 710
String does have a hashCode() method: it overrides the one inherited from Object, of which String is a subclass. The same goes for the Integer class and toString().

D'oh, previous post was edited. I'm still leaving my post up though.


SCJA
When I die, I want people to look at me and say "Yeah, he might have been crazy, but that was one zarkin frood that knew where his towel was."
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Bharadwaj Adepu wrote: One more requirement: the generated unique ID must be shorter than the original string.

So how should this work for the empty string? (There are no strings with length less than zero.)

And how should it work for strings of one character? (All strings of length zero are the same, so you can't have a unique representation.)
Fred Muhlenberg
Ranch Hand

Joined: Jan 08, 2008
Posts: 39

Integer class doesn't have toString(java.lang.String) Method and String class doesn't have any hashcode() method.


From java.lang.Integer: public static String toString(int i)

A simple method name typo. java.lang.String: public int hashCode()

But this doesn't meet your later criterion that the unique ID be shorter than the String. It would seem that you need to write your own mapping function.
Bharadwaj Adepu
Ranch Hand

Joined: Dec 30, 2007
Posts: 99
Paul Clapham wrote: So how should this work for the empty string? (There are no strings with length less than zero.)

In my application, I will not create unique IDs for empty strings.
What I actually want to do is insert the generated unique ID into the database. If my string is longer than 4000 characters and the generated ID is as long as the original string, then I can't insert it.
Bharadwaj Adepu
Ranch Hand

Joined: Dec 30, 2007
Posts: 99
@Fred
Integer.toString( myString.hashCode() );

I think this serves my purpose. I have tested it, and the generated hash code is a number of up to 10 digits, sometimes negative.

Thanks a lot!

But I am not sure whether this method always produces the same hash code for the same string. I have tested it, but I am not sure.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Bharadwaj Adepu wrote: But I am not sure whether this method always produces the same hash code for the same string. I have tested it, but I am not sure.

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#hashCode()

AFAIK it's not guaranteed to be unique.
Mark Vedder
Ranch Hand

Joined: Dec 17, 2003
Posts: 624

Bharadwaj Adepu wrote:
But I am not sure whether this method always produces the same hash code for the same string. I have tested it, but I am not sure.


If you read the contract of the hashCode method, it states:
http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() wrote: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.


This means that, as far as the general contract goes, an object will not necessarily produce the same hash code between runs of your JVM. Also, you stated you need a unique ID for each String. Again, reading the hashCode contract, we learn:

http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() wrote: It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

So there is no guarantee that you will get a unique ID for two different Strings. Although the odds are low, you might get the same ID for two different Strings. The documentation for String's override of hashCode does specify the formula (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic), so the value for a given String is the same from run to run, but nothing in it rules out collisions. Common sense also says that, given you are dealing with Strings of 4000 characters (8000 bytes) and bigger, you cannot uniquely represent all possible values in only 4 bytes (the size of an int).
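
A minimal sketch of both points, assuming the formula given in the String Javadoc. The class and method names (StringHashSketch, documentedHash) are hypothetical; the pair "Aa" and "BB" is a well-known collision.

```java
class StringHashSketch {

    // Recomputes the documented formula s[0]*31^(n-1) + ... + s[n-1]
    // with int arithmetic (overflow wraps around, as in String.hashCode).
    public static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // The hand-written formula agrees with the library method.
        System.out.println(documentedHash("Aa") == "Aa".hashCode()); // true

        // Two different strings, one hash code: 65*31 + 97 == 66*31 + 66.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // The decimal form of an int is never longer than 11 characters.
        System.out.println(Integer.toString(Integer.MIN_VALUE).length()); // 11
    }
}
```

So Integer.toString(myString.hashCode()) is short and repeatable, but it is not unique.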

It sounds to me as if what you want is a compression method/process and not necessarily a "unique ID". You want to store Strings of over 4000 characters in less space, and always get the same result for the same String. Do you need to be able to reverse this "ID" back to the original String? Or do you just need to get the same ID so you can look up related data in the database? I would suggest you look into compression methods and APIs.
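
As one possible starting point, here is a sketch of reversible compression with the java.util.zip classes from the standard library. It assumes Java 8 or later (for UncheckedIOException and try-with-resources); the class and method names (CompressionSketch, compress, decompress) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressionSketch {

    // Compresses the UTF-8 bytes of the text with GZIP.
    public static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reverses compress(): inflates the bytes and decodes them as UTF-8.
    public static String decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("repetitive ");
        }
        String original = sb.toString();
        byte[] packed = compress(original);
        System.out.println(original.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(original));
    }
}
```

How much space is saved depends entirely on the input: repetitive text shrinks a lot, while short or random-looking strings can come out larger than they went in, so this does not guarantee a result under 4000 bytes.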
Mark Vedder
Ranch Hand

Joined: Dec 17, 2003
Posts: 624

p.s. Of course another option might be to change your database field's data type to a CLOB rather than a VARCHAR(4000). It would probably be much easier and more reliable in the long run.
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1280

Hi Bharadwaj,

I think what you need is a hash function. In general, any kind of hash is a many-to-one mapping (non-injective, in mathematical terms), which in your case means you're trying to map a string of a (large) number of characters to a value with a much smaller number of characters or bytes. This simply can't be done without collisions! Maybe you can losslessly reduce the size of the original string to some degree with a compression function. But I guess it won't be possible to compress 4000 characters to 4 bytes without some information loss :-)

So obviously the only way to go is to use a hash function, and even if collisions sound like a problem in theory, they are often not a problem in practice. There may be collisions that map different strings to the same hash code; the only question is how likely a collision is! A good cryptographic hash function should make collisions very unlikely.
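
A minimal sketch of this idea, assuming SHA-256 as the cryptographic hash (the choice of algorithm is an assumption, not something fixed by the discussion) and Java 7 or later. The class and method names (DigestIdSketch, sha256Hex) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestIdSketch {

    // Returns the SHA-256 digest of the text as 64 lowercase hex characters.
    public static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append('x');
        }
        String id = sha256Hex(sb.toString());
        System.out.println(id.length());                        // 64
        System.out.println(id.equals(sha256Hex(sb.toString()))); // true
    }
}
```

The ID is the same for the same string in every run and fits easily in a VARCHAR(4000) column, but it cannot be reversed to the original string, and collisions remain possible in principle even though they are extremely unlikely.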

Of course you have to decide what "good enough" means regarding your requirements!

Marco


Sagar Rohankar
Ranch Hand

Joined: Feb 19, 2008
Posts: 2902
    

Bharadwaj Adepu wrote:
Sagar Rohankar wrote: Try permutations and combinations of the string's characters, appended with a randomly generated string or number.

I don't think this works: the next time I generate the unique ID for the same string, the random part will differ, so the same ID will not be produced.

Of course it will work if you append/prepend/insert the randomly generated integer. And for the same string, I don't think the randomly generated number will be the same.
Bharadwaj Adepu wrote:
One more requirement: the generated unique ID must be shorter than the original string.

With permutations you can restrict the length of the resulting string.
 
 