How to Create a Unique 16 Character String

Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Hi All,

I'd like to write some code to generate a unique 16 character string (not UUID). The string must be very hard for someone to guess...

Any suggestions?

Thanks
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13884
    
  10

You could pick 16 random characters and put them in a string, but it would be hard to guarantee that the string is unique. What exactly do you mean by "unique"? Must it be globally unique (like a UUID), which means that there is only an astronomically small chance that the same string will ever be generated twice, or must it be different from all strings in some dataset that you have (in a database, for example)?


Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Thanks for replying.

It must be Globally Unique within the organisation like a UUID but only 16 characters long.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Matt McDonald wrote:Thanks for replying.

It must be Globally Unique within the organisation like a UUID but only 16 characters long.


If it must be unique (that is, even the astronomically small chance of duplicates is unacceptable), and it can't be guessable, then your only real options are:

1) Randomly generate 16 characters, and then compare against a list of already generated IDs, and if it's a duplicate, try again.

2) Use a decent pseudorandom algorithm, like what java.util.Random uses to pick the next in the sequence based on the most recently generated, and seed it with something random. In this case, if somebody knows the algorithm and knows what's been generated so far, he can predict what comes next. I don't really consider this a viable alternative, but I figured I'd throw it out there for grins.

But I have to ask:

1) Does it really have to be unique? Do you understand what it means for there to be a 1 in 2^128 chance of collision between any two, and how many you can produce before you have even a 1 in a million chance?

2) Why can't it be UUID?
Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Thanks, you've given me something to think about, I didn't think about checking whether the ID had been created previously. The process of issuing an ID needs to be very quick so checking the existence of a previous id may have performance issues for me.

In answer to your questions:

1) It needs to be unique because it will represent a particular event in a workflow system. We can't have two events with the same ID.
2) I've been told to implement 16 characters, that decision is out of my hands.

Thanks
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Matt McDonald wrote:Thanks, you've given me something to think about, I didn't think about checking whether the ID had been created previously. The process of issuing an ID needs to be very quick so checking the existence of a previous id may have performance issues for me.


If you store previously issued IDs in a HashSet, it will be very quick. Even if you have to go to a DB every time, I can't imagine that would be a problem. You're not generating thousands of these things per second, sustained, are you?
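A minimal sketch of the HashSet check, assuming all IDs are issued from a single JVM. The class name `DedupingIssuer` and the `Supplier<String>` that provides candidate IDs are hypothetical, chosen only for illustration; the candidate source can be any random generator.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical wrapper: keeps asking for candidates until one is new. */
class DedupingIssuer {
    private final Set<String> issued = new HashSet<>();
    private final Supplier<String> candidates;

    DedupingIssuer(Supplier<String> candidates) {
        this.candidates = candidates;
    }

    /** Returns an ID this instance has not returned before. */
    synchronized String next() {
        String id;
        do {
            id = candidates.get();
            // Set.add returns false when the ID was already issued, so retry.
        } while (!issued.add(id));
        return id;
    }
}
```

The lookup is a single hash probe, so the cost per ID stays small. If IDs are issued by several processes, the set would probably have to be replaced by something shared, such as a unique constraint in the database.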


1) It needs to be unique because it will represent a particular event in a workflow system. We can't have two events with the same ID.


Using UUID, the odds of a collision are ridiculously small. I forget the exact values, but it's something like generating a thousand a second for a thousand years gives less than a 1 in a million chance of collision. See http://en.wikipedia.org/wiki/Birthday_problem for details.

Basically, the odds of a collision are much smaller than the odds of a serious failure due to a bug in your code or a hard disk crash or some other mundane catastrophe.
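The "thousand a second for a thousand years" figure can be checked with the usual birthday approximation, p ≈ 1 - exp(-n^2 / (2N)) for n values drawn from N possibilities. This is a rough sketch, assuming random (version 4) UUIDs, which carry 122 random bits; the class and method names are made up for the example.

```java
class BirthdayBound {
    /**
     * Approximate probability of at least one collision among n values
     * drawn uniformly from 2^bits possibilities.
     */
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        // -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
        return -Math.expm1(-(n * n) / (2.0 * space));
    }

    public static void main(String[] args) {
        // 1000 IDs per second, sustained for 1000 years of 365 days.
        double n = 1000.0 * 60 * 60 * 24 * 365 * 1000;
        System.out.println("random UUID (122 bits): " + collisionProbability(n, 122));
    }
}
```

Under these assumptions the result comes out on the order of 1e-10, far below one in a million.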

2) I've been told to implement 16 characters, that decision is out of my hands.


Which characters are you allowed to use? That is, how many bits of entropy will you have?




Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Any and all characters can be used :-)
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Matt McDonald wrote:Any and all characters can be used :-)


So, the entire Unicode set then. Trimming down to the older, smaller Unicode (16 bits per character), we have 16 characters at 16 bits per character, or 256 bits. A UUID has 32 hex characters at 4 bits/char, so 128 bits. Therefore, you can easily encode a UUID into your character set.
Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Ok thanks, I'll look into it.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

To be clear: You're saying this would be a valid ID:
語% औئ¾Ю⟰◪∴ᦌᛯᏌᄫആʘ
Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Ok, alpha/numeric characters only....
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Matt McDonald wrote:Ok, alpha/numeric characters only....

Okay, so... A-Z a-z 0-9, yes?

I'm trying to get precise requirements from you, since one can't really choose an approach without understanding the requirements.

If this is your requirement, then that's 62^16 possible strings, which we can round up to 64^16, which is (2^6)^16, which is 2^96, so an upper bound of 96 bits, which means we cannot fit a UUID (128 bits) without loss. So, barring further constraints, I'd just go with generating 16 random characters. It's probably not cryptographically secure, but at this point I'll assume it's good enough for your needs. Only you can answer that for sure though.
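As a sketch of "just generate 16 random characters" over A-Z a-z 0-9: the version below swaps in java.security.SecureRandom, since the original requirement was that the string be hard to guess. That choice, and the names `AlphanumericId`, `generate` and `entropyBits`, are assumptions for illustration, not something settled in the thread.

```java
import java.security.SecureRandom;

class AlphanumericId {
    // The 62 characters agreed on above: A-Z, a-z, 0-9.
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a string of the given length from uniformly chosen characters. */
    static String generate(int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            // nextInt(bound) is uniform over 0..bound-1.
            out[i] = ALPHABET[RANDOM.nextInt(ALPHABET.length)];
        }
        return new String(out);
    }

    /** Entropy in bits of a string of the given length: length * log2(62). */
    static double entropyBits(int length) {
        return length * (Math.log(ALPHABET.length) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(generate(16));
        System.out.println(entropyBits(16) + " bits");
    }
}
```

For 16 characters the entropy is a little over 95 bits, consistent with the 96-bit upper bound worked out above.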
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36599
    
  16
2 ^ 128 = 340282366920938463463374607431768211456 which has 39 digits
62 ^ 16 = 47672401706823533450263330816 which has 29 digits
256 ^ 16 is the same as 2 ^ 128.
So you would have to look for 256 different characters in your 16-character String. Of course you can use accented letters (as in French, Spanish etc), Greek and Russian letters if you wish. Beware: things like the Greek capital Α (α) and Russian capital Ah look exactly like A.
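For anyone who wants to reproduce these figures, java.math.BigInteger will do it. A small sketch (the class and constant names are arbitrary):

```java
import java.math.BigInteger;

class KeySpaceSizes {
    static final BigInteger UUID_SPACE = BigInteger.valueOf(2).pow(128);
    static final BigInteger ALNUM_SPACE = BigInteger.valueOf(62).pow(16);
    static final BigInteger BYTE_SPACE = BigInteger.valueOf(256).pow(16);

    /** Number of decimal digits in a positive value. */
    static int decimalDigits(BigInteger n) {
        return n.toString().length();
    }

    public static void main(String[] args) {
        System.out.println("2^128  = " + UUID_SPACE + " (" + decimalDigits(UUID_SPACE) + " digits)");
        System.out.println("62^16  = " + ALNUM_SPACE + " (" + decimalDigits(ALNUM_SPACE) + " digits)");
        System.out.println("256^16 equals 2^128: " + BYTE_SPACE.equals(UUID_SPACE));
        // bitLength gives the number of bits needed to represent 62^16.
        System.out.println("bits needed for 62^16: " + ALNUM_SPACE.bitLength());
    }
}
```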
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Campbell Ritchie wrote:2 ^ 128 = 340282366920938463463374607431768211456 which has 39 digits
62 ^ 16 = 47672401706823533450263330816 which has 29 digits
256 ^ 16 is the same as 2 ^ 128.
So you would have to look for 256 different characters in your 16-character String.


@Matt: To be clear, this is what you would have to do if you wanted your 16-char String to have as much entropy as a 128-bit UUID. If your requirement is simply 16 alphanumerics with whatever entropy that gives you, then these numbers are only of academic interest.

@Campbell: I don't think that's actually his requirement. He started off asking for 16 random characters, and said he couldn't use UUID. I'm the one that delved further down the UUID/128-bit path.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36599
    
  16
In which case you have already given the correct answer, yesterday, ie create 16 chars and test your String against a set.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7081
    
  16

Jeff Verdegan wrote:@Matt: To be clear, this is what you would have to do if you wanted your 16-char String to have as much entropy as a 128-bit UUID.

Actually, a UUID is directly convertible to an array of 16 bytes, which could then be used to construct a String.
Providing Matt takes some care to ensure that a consistent character set (UTF-16?) is used to encode, it should be reasonably easy to make sure that (a) the result is exactly 16 characters long, and (b) it can be decoded precisely if required.
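A rough sketch of that conversion. One assumption differs from the post: the charset used here is ISO-8859-1 rather than UTF-16, because ISO-8859-1 maps each byte to exactly one char, whereas 16 bytes read as UTF-16 would give 8 chars. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidAsSixteenChars {
    /** Packs the 128 bits of a UUID into a 16-character String. */
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        // ISO-8859-1 maps byte values 0..255 one-to-one onto chars U+0000..U+00FF.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the UUID from a String produced by encode. */
    static UUID decode(String s) {
        ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.ISO_8859_1));
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        String packed = encode(original);
        System.out.println(packed.length() + " chars, round trip ok: "
                + decode(packed).equals(original));
    }
}
```

Note that the resulting string can contain control characters and other unprintable values, so it only helps if all 256 char values are acceptable.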

Winston


Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Winston Gutkowski wrote:
Jeff Verdegan wrote:@Matt: To be clear, this is what you would have to do if you wanted your 16-char String to have as much entropy as a 128-bit UUID.

Actually, a UUID is directly convertible to an array of 16 bytes, which could then be used to construct a String.


Only if he has 256 different characters he can use. He stated he's restricted to alphanumerics though, so presumably [A-Za-z0-9], which is somewhat less than 256.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Campbell Ritchie wrote:In which case you have already given the correct answer, yesterday, ie create 16 chars and test your String against a set.


Actually, Jesper suggested it in the first reply. I merely reiterated it in the event that UUID was well and truly out.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7081
    
  16

Jeff Verdegan wrote:Only if he has 256 different characters he can use. He stated he's restricted to alphanumerics though, so presumably [A-Za-z0-9], which is somewhat less than 256.

Ah. Missed that. Must learn to read.

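If you don't want to write a lot of code, it seems to me that two lots of the following (pseudocode: ^ means exponentiation here, whereas in Java it is XOR, and the result of % would need to be made non-negative)
Long.toString(36^7 + (Random.nextLong() % ((36^8)-(36^7))), 36);
would do the trick, unless he must use both upper and lower case, since radix 36 yields only digits and lower-case letters.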
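One way the pseudocode might look as compilable Java. This is a sketch with a few assumptions: the powers of 36 are computed with a small helper instead of ^, Math.floorMod keeps the remainder non-negative, and SecureRandom is swapped in for Random because the ID should be hard to guess. The names `Base36Id`, `pow36`, `eightChars` and `sixteenChars` are invented for the example.

```java
import java.security.SecureRandom;

class Base36Id {
    private static final long LOW = pow36(7);   // smallest value with 8 base-36 digits
    private static final long HIGH = pow36(8);  // first value with 9 base-36 digits

    /** Computes 36^exp with plain multiplication. */
    static long pow36(int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= 36;
        }
        return result;
    }

    /** One "lot": a random value in the range LOW to HIGH - 1, written in radix 36. */
    static String eightChars(SecureRandom random) {
        // floorMod never returns a negative value, unlike % on a negative long.
        long offset = Math.floorMod(random.nextLong(), HIGH - LOW);
        return Long.toString(LOW + offset, 36);
    }

    /** Two lots joined together give 16 characters from 0-9 and a-z. */
    static String sixteenChars(SecureRandom random) {
        return eightChars(random) + eightChars(random);
    }

    public static void main(String[] args) {
        System.out.println(sixteenChars(new SecureRandom()));
    }
}
```

Because each half starts at 36^7, its first character is never 0, so the result carries a little under 83 bits of randomness rather than the full 16 * log2(36).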

Winston
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36599
    
  16
Jeff Verdegan wrote: . . . . Actually, Jesper suggested it in the first reply. . . .
Apologies to both of you; like WG I ought to learn to read!
Matt McDonald
Greenhorn

Joined: Apr 15, 2010
Posts: 8
Thanks everyone for your comments, you've given me lots of ideas to investigate, better go and build something now....
 