
Creating a Unique ID from a String.

 
Bharadwaj Adepu
Ranch Hand
Posts: 99
In Java, how can I create a unique ID from a particular string?
I have gone through the UUID class but could not find anything suitable, and a Google search turned up nothing either.
Can anyone please help me with this?
 
Sagar Rohankar
Ranch Hand
Posts: 2907
1
Java Spring Ubuntu
Try permutations and combinations of the string's characters, with a randomly generated string or number appended.
 
Bharadwaj Adepu
Ranch Hand
Posts: 99
Sagar Rohankar wrote:Try permutations and combinations of the string's characters, with a randomly generated string or number appended.


I don't think this works: the next time I generate the unique ID for the same string, the random part will be different, so I will not get the same ID back.

One more requirement: the generated unique ID must be shorter than the original string.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Sounds like you want a hash code for a string that is guaranteed to be unique and also guaranteed to be "shorter" than the string. I'm not sure how feasible that is. What is the reason for this requirement?
 
Campbell Ritchie
Sheriff
Posts: 48652
56
Too difficult a question for us beginners. Moving.
 
Fred Muhlenberg
Ranch Hand
Posts: 39
In Java 1.5, static UUID UUID.fromString(String name);

As mentioned previously, what about the hashcode converted into a String?

Integer.toString( myString.hashcode() );
 
Bharadwaj Adepu
Ranch Hand
Posts: 99
Fred Muhlenberg wrote:In Java 1.5, static UUID UUID.fromString(String name);

fromString only parses a string that is already in valid UUID format; it does not create a UUID from an arbitrary string.
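
A minimal sketch of that distinction, in case it helps: UUID.fromString only parses text that is already in UUID form, while UUID.nameUUIDFromBytes (also in java.util.UUID since 1.5) derives a repeatable, name-based UUID from arbitrary bytes. The names NameBasedUuid, fromText and isUuidString are made up for illustration, and StandardCharsets assumes Java 7 or later (on older versions, getBytes("UTF-8") would be the substitute).

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class NameBasedUuid {

    // Hypothetical helper: derives a version-3 (MD5-based) UUID from arbitrary text.
    // The same input always yields the same UUID.
    static UUID fromText(String text) {
        return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: true only if the text is already in UUID string form.
    static boolean isUuidString(String text) {
        try {
            UUID.fromString(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "any long string, possibly thousands of characters";
        System.out.println(isUuidString(input));        // false
        UUID id = fromText(input);
        System.out.println(id.toString().length());     // 36
        System.out.println(id.equals(fromText(input))); // true
    }
}
```

The result is always 36 characters, so it is shorter than the input only for strings longer than that. Because it is derived from a digest it is still a hash: collisions are extremely unlikely, not impossible.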
 
W. Joe Smith
Ranch Hand
Posts: 710
String does have a hashCode() method (note the capital C): it overrides the one declared in Object, of which String is a subclass. The same goes for the Integer class and toString().

D'oh, previous post was edited. I'm still leaving my post up though.
 
Paul Clapham
Sheriff
Pie
Posts: 20971
31
Eclipse IDE Firefox Browser MySQL Database
Bharadwaj Adepu wrote:One more requirement: the generated unique ID must be shorter than the original string.

So how should this work for the empty string? (There are no strings with length less than zero.)

And how should it work for strings of one character? (The only shorter string is the empty string, so every one-character string would have to map to the same ID, which cannot be unique.)
 
Fred Muhlenberg
Ranch Hand
Posts: 39

Integer class doesn't have toString(java.lang.String) Method and String class doesn't have any hashcode() method.


From java.lang.Integer: public static String toString(int i)

That was a simple typo in my method name; it is spelled with a capital C. From java.lang.String: public int hashCode()

But this doesn't meet your later criterion that the unique ID be shorter than the string. It would seem that you need to write your own mapping function.
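
For reference, a small sketch of the corrected call and of why it fails the length criterion for short inputs. HashCodeId and hashId are hypothetical names used only for this example.

```java
class HashCodeId {

    // Hypothetical helper: the decimal text of String.hashCode().
    static String hashId(String s) {
        return Integer.toString(s.hashCode());
    }

    public static void main(String[] args) {
        System.out.println(hashId("hello")); // 99162322
        System.out.println(hashId("a"));     // 97, which is longer than the input
        // An int prints as at most 11 characters: a minus sign plus 10 digits.
        System.out.println(Integer.toString(Integer.MIN_VALUE).length()); // 11
    }
}
```

So the ID is shorter than the original only once the string is longer than 11 characters, and it is a hash rather than a guaranteed-unique value.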
 
Bharadwaj Adepu
Ranch Hand
Posts: 99
Paul Clapham wrote:So how should this work for the empty string? (There are no strings with length less than zero.)


In my application, I will not create unique IDs for empty strings.
What I actually want to do is insert the generated unique ID into the database. If my string is longer than 4000 characters and the generated unique ID is as long as the original string, then I can't insert it.
 
Bharadwaj Adepu
Ranch Hand
Posts: 99
@Fred
Integer.toString( myString.hashCode() );


I think this serves my purpose. I have tested it, and the generated hash code is a number of around 10 to 12 digits, sometimes negative.

Thanks a LOT

But I am not sure whether this method always creates the same hash code for the same string. I have tested this, but I am not sure...
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Bharadwaj Adepu wrote:But I am not sure whether this method always creates the same hash code for the same string. I have tested this, but I am not sure...

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#hashCode()

AFAIK it's not guaranteed to be unique.
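
A well-known concrete case of that non-uniqueness, as a short sketch: "Aa" and "BB" are different strings with the same hash code. HashCollision and collide are hypothetical names for this example.

```java
class HashCollision {

    // Hypothetical helper: true if two different strings share a hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```

Longer colliding pairs can be built from these, for example "AaAa" and "BBBB", so collisions are not limited to short strings.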
 
Mark Vedder
Ranch Hand
Posts: 624
IntelliJ IDE Java
Bharadwaj Adepu wrote:
But I am not sure whether this method always creates the same hash code for the same string. I have tested this, but I am not sure...


If you read the contract of the hashCode method, it states:
http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() wrote: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.


This means that it will not necessarily produce the same hash code for the same object between runs of your JVM. Also, you stated you need a Unique ID for each String. Again, reading the hashCode contract, we learn:

http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode() wrote: It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

So there is no guarantee that two different Strings will get different IDs. Although the odds are low, two different Strings might produce the same ID. The documentation for String's override of the hashCode method does not give any indication that these warnings do not apply to its implementation. (I have not studied the String class's implementation to know either way.) Common sense indicates that, since you are dealing with Strings of 4000 characters (8000 bytes) and bigger, you cannot uniquely represent all possible values in only 4 bytes (the size of an int).
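
On the repeatability question specifically: the Javadoc for String.hashCode() does document the formula it uses, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, so for String the value depends only on the characters and should come out the same on every run, even though the general Object contract does not promise that. The sketch below recomputes the formula by hand; DocumentedStringHash and documentedHash are hypothetical names.

```java
class DocumentedStringHash {

    // Recomputes the formula from the String.hashCode() Javadoc using
    // Horner's rule; int overflow wraps around, as in the real method.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(documentedHash(s));                 // 99162322
        System.out.println(documentedHash(s) == s.hashCode()); // true
    }
}
```

Repeatable does not mean unique, though: the result is still only 32 bits.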

It sounds to me like what you want is a compression method/process and not necessarily a "unique ID". You want to be able to store Strings over 4000 characters in less space, and you always want the same result for the same string. Do you need to be able to reverse this "ID" back to the original String? Or do you just need to get the same ID so you can look up related data in the database? I would suggest you look into compression methods and APIs.
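
As one possible illustration of the compression route, here is a sketch using the GZIP streams from java.util.zip. TextCompression, compress and decompress are hypothetical names, and the code assumes Java 8 or later (try-with-resources and UncheckedIOException).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TextCompression {

    // Compresses the UTF-8 bytes of the text with GZIP.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reverses compress(): returns the original text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("abc");
        }
        String original = sb.toString(); // 6000 characters, highly repetitive
        byte[] packed = compress(original);
        System.out.println(packed.length < original.length());   // true
        System.out.println(decompress(packed).equals(original)); // true
    }
}
```

Two caveats: the output is binary, so it would need a binary column or an encoding step, and compression gives no size guarantee. Short or random-looking text can come out larger than it went in.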
 
Mark Vedder
Ranch Hand
Posts: 624
IntelliJ IDE Java
p.s. Of course another option might be to change your database field's data type to a CLOB rather than a VARCHAR(4000). It would probably be much easier and more reliable in the long run.
 
Marco Ehrentreich
best scout
Bartender
Posts: 1294
IntelliJ IDE Java Scala
Hi Bharadwaj,

I think what you need is a hash function. In general, a hash is a many-to-one mapping (non-injective, in mathematical terms): in your case you are trying to map a string consisting of a large number of characters to a value consisting of a much smaller number of characters or bytes. This simply can't be done without collisions! Maybe you can reduce the size of the original string to some degree without loss by using a compression function. But I guess it won't be possible to compress 4000 characters to 4 bytes without some information loss :-)

So obviously the only way to go is to use a hash function. Collisions may sound like a problem in theory, but they are often not a problem in practice. Different strings can map to the same hash code; the only question is how likely that is to happen! A good cryptographic hash function makes collisions very unlikely.

Of course you have to decide what "good enough" means regarding your requirements!
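
A minimal sketch of the cryptographic-hash approach, assuming SHA-256 is "good enough" for the application (the thread does not name a specific algorithm). DigestId and sha256Hex are hypothetical names; MessageDigest is the standard java.security API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestId {

    // Hypothetical helper: SHA-256 of the UTF-8 bytes of the text,
    // returned as 64 lowercase hexadecimal characters.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String id = sha256Hex("some string longer than 4000 characters ...");
        System.out.println(id.length()); // 64
        System.out.println(sha256Hex("Aa").equals(sha256Hex("BB"))); // false
    }
}
```

The ID has a fixed length of 64 characters regardless of input size, so it fits easily in a VARCHAR column, and the same string always produces the same ID. It cannot be reversed to recover the original string.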

Marco


 
Sagar Rohankar
Ranch Hand
Posts: 2907
1
Java Spring Ubuntu
Bharadwaj Adepu wrote:
Try permutations and combinations of the string's characters, with a randomly generated string or number appended.

I don't think this works: the next time I generate the unique ID for the same string, the random part will be different, so I will not get the same ID back.

Of course it will work if you append, prepend, or insert the randomly generated integer. And for the same string, I don't think the randomly generated number will be the same.
Bharadwaj Adepu wrote:
One more requirement: the generated unique ID must be shorter than the original string.

With permutations you can restrict the length of the resulting string.
 