JavaRanch » Java Forums » Java » Java in General
Advice on a UID

Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
I have a site with a cluster of web servers. Previous developers have these web servers generating a UID that is used for the primary keys of nearly every table in the database. The code for it is atrocious; collisions are common and will only get worse as time goes on. I need to replace it.

Here's the catch: I only have sixteen bytes to work with, encoded as ISO-8859-1. Note that this catch kills UUID or even java.rmi.server.UID dead in its tracks. There's no space for it. Changing the size is not an option.

Here's the good news: it doesn't need to be a UUID or global at all, and it doesn't need to be particularly efficient. It can be presumed to be called at most a few hundred times a second per server. All servers have, and always will have, an IP that is unique amongst those servers. Finally, the lifetime is short (less than a decade), and server time will never move backwards.

Here's what I came up with at first blush, keeping the above in mind.


Any holes (you see in the example) or suggestions are welcomed.
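
A hedged sketch of one layout that would fit the constraints listed above, not the poster's own code: 48 bits of millisecond timestamp, the 4 bytes of the server's IPv4 address, and a 16-bit counter make 12 bytes, which URL-safe Base64 turns into exactly 16 printable ASCII characters. The class name `CompactIdGenerator` and the field split are assumptions made for illustration. The sketch also assumes one generator per server IP, fewer than 65536 IDs per millisecond, and a clock that never moves backwards.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical generator: 6 bytes time + 4 bytes IPv4 + 2 bytes counter = 12 bytes.
final class CompactIdGenerator {
    private final byte[] ipv4;
    private int counter;

    CompactIdGenerator(byte[] ipv4) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte IPv4 address");
        }
        this.ipv4 = ipv4.clone();
    }

    // Synchronized so that concurrent callers never see the same counter value.
    synchronized String next() {
        return encode(System.currentTimeMillis(), ipv4, counter++ & 0xFFFF);
    }

    static String encode(long millis, byte[] ipv4, int counter) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putShort((short) (millis >>> 32)); // high 16 bits of the 48-bit timestamp
        buf.putInt((int) millis);              // low 32 bits of the timestamp
        buf.put(ipv4);                         // server identity
        buf.putShort((short) counter);         // distinguishes IDs within one millisecond
        // 12 bytes encode to exactly 16 Base64 characters, so no padding is produced.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        CompactIdGenerator gen = new CompactIdGenerator(new byte[] {10, 0, 0, 7});
        System.out.println(gen.next());
        System.out.println(gen.next());
    }
}
```

The Base64 alphabet is case-sensitive, so this only works if the char(16) column compares case-sensitively. With a case-insensitive collation a base-32 alphabet would be safer, at the cost of carrying only 80 bits in 16 characters.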
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

How could there be a lot of collisions with 16 bytes of 0-255? That's a really big number of possibilities, like 3.4x10^38 or something ridiculous like that. And how does ISO-8859-1 preclude the built-in UUID?
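
The figure can be checked directly: 16 bytes with 256 values each give 256^16 = 2^128 distinct values. A minimal sketch, with `KeySpace` as a made-up class name:

```java
import java.math.BigInteger;

class KeySpace {
    // Number of distinct values that fit in the given number of bytes: 256^bytes.
    static BigInteger distinctValues(int bytes) {
        return BigInteger.valueOf(256).pow(bytes);
    }

    public static void main(String[] args) {
        // Prints 340282366920938463463374607431768211456, about 3.4 x 10^38.
        System.out.println(distinctValues(16));
    }
}
```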
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
David Newton wrote: How could there be a lot of collisions with 16 bytes of 0-255? That's a really big number of possibilities, like 3.4x10^38 or something ridiculous like that.


You're assuming it's random. It's not even close, the code is really bad.

David Newton wrote: And how does ISO-8859-1 preclude the built-in UUID?


It was the size, not the encoding, I was referring to. Looks like I misread the UUID documentation. I can use UUID but only if I encode the raw bytes myself.
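
For reference, a minimal sketch of pulling the raw 16 bytes out of a java.util.UUID and rebuilding it. `UUID` exposes its two 64-bit halves through `getMostSignificantBits()` and `getLeastSignificantBits()`; the class name `UuidBytes` and the big-endian layout are illustrative choices.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytes {
    // Pack the two 64-bit halves into 16 bytes, most significant half first.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes: read the halves back in the same order.
    static UUID fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] raw = toBytes(original);
        System.out.println(raw.length);                          // 16
        System.out.println(original.equals(fromBytes(raw)));     // true
    }
}
```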
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

ISO-8859-1 is 0-255; unless you meant something different by encoding.
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
David Newton wrote: ISO-8859-1 is 0-255; unless you meant something different by encoding.


I only have sixteen bytes to work with


I misread the UUID docs as saying 128 bytes; it's 128 bits, which is 16 bytes.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Oh, I see. What does that have to do with encoding the raw bytes?
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
David Newton wrote: Oh, I see. What does that have to do with encoding the raw bytes?


It's char(16).

UUID.toString() is hexadecimal. 32 characters not including the separators.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Those *are* the encoded raw bytes. Why can't you just write the UUID, which is 0-255 (which is what ISO-8859-1 defines)?
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
David Newton wrote: Those *are* the encoded raw bytes. Why can't you just write the UUID, which is 0-255 (which is what ISO-8859-1 defines)?


What happens when you write "c46f6ae1-7b2e-4baa-8df8-05fbdd91c075" into a 16 character field? You can't. Every single one of those characters is going to get encoded into ISO-8859-1, one byte each, and it's more than double the length of the field.

Furthermore, what happens when I grab the raw 16 bytes and encode each byte into ISO-8859-1 as 1 character per byte, as opposed to hexadecimal which requires two characters per byte, and the byte happens to be 7F? I'm betting the DB is going to puke at me and even if it doesn't I'm not sure what will happen when it gets to some place like SF and a salesperson is trying to copy & paste an ID with a control character in it.
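
Both concerns can be reproduced with the UUID quoted above. The sketch below packs the 128 bits into 16 bytes, decodes them one character per byte with `StandardCharsets.ISO_8859_1`, and counts the characters that `Character.isISOControl` reports as controls (0x00-0x1F and 0x7F-0x9F). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidAsLatin1 {
    // Pack the UUID into 16 bytes, then decode one ISO-8859-1 character per byte.
    static String toLatin1(UUID uuid) {
        byte[] raw = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Count characters in the C0/C1 control ranges.
    static long controlChars(String s) {
        return s.chars().filter(Character::isISOControl).count();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("c46f6ae1-7b2e-4baa-8df8-05fbdd91c075");
        System.out.println(uuid.toString().length()); // 36: 32 hex digits plus 4 separators
        String packed = toLatin1(uuid);
        System.out.println(packed.length());          // 16
        System.out.println(controlChars(packed));     // 3 (bytes 0x8D, 0x05 and 0x91)
    }
}
```

So the packed form fits the field, but for this particular UUID three of the sixteen characters are control characters.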
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

You said the field was encoded in 8859-1. The bytes are all valid 8859-1 characters. *You* were the one that was talking about encoding the raw bytes and using toString, not me. You never said anything about needing a printable representation--not sure how I would be expected to guess that. In any case, good luck.
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
You said the field was encoded in 8859-1


Yes. The field is a char(16) and the encoding is 8859-1.

The bytes are all valid 8859-1 characters.


8859-1 doesn't cover all 256 possible values, does it? Aren't quite a few ranges "unused"?

*You* were the one that was talking about encoding the raw bytes and using toString, not me.


Yes, I'm thinking on paper. toString() is unsuitable because its hex output is too long. I don't see anything in the UUID class that will give me a String, or otherwise encode the bytes into something 16 characters in length, so I would have to do it myself. I don't see how that's possible with 8859-1 because of the above concerns.

You never said anything about needing a printable representation--not sure how I would be expected to guess that.


It didn't occur to me until I was typing that. So realistically, there's no way to get a UUID into a 16-character string, or am I missing something?

In any case, good luck.


Thanks.
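
On the question of whether a UUID can fit into 16 printable characters, a counting argument suggests not for arbitrary 128-bit values. The sketch below counts the byte values that are not C0/C1 control characters (generously treating space, no-break space and soft hyphen as printable) and checks whether 16 characters from an alphabet of that size can cover 2^128 values. `PrintableCapacity` and its methods are illustrative names.

```java
import java.math.BigInteger;

class PrintableCapacity {
    // Characters in 0x00-0xFF that are not C0/C1 control characters.
    static int nonControlCount() {
        int n = 0;
        for (int c = 0; c <= 0xFF; c++) {
            if (!Character.isISOControl(c)) {
                n++;
            }
        }
        return n;
    }

    // True if 16 characters over an alphabet of this size can label every 128-bit value.
    static boolean fits128Bits(int alphabetSize) {
        BigInteger capacity = BigInteger.valueOf(alphabetSize).pow(16);
        return capacity.compareTo(BigInteger.ONE.shiftLeft(128)) >= 0;
    }

    public static void main(String[] args) {
        int printable = nonControlCount();
        System.out.println(printable);              // 191
        System.out.println(fits128Bits(printable)); // false
        System.out.println(fits128Bits(256));       // true, but only with all byte values allowed
    }
}
```

With 191 usable characters, 16 positions carry roughly 121 bits, so a full 128-bit value cannot be stored losslessly in printable form in this field.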
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

http://en.wikipedia.org/wiki/ISO/IEC_8859-1#ISO-8859-1

It doesn't matter if a char is unused anyway--your concern appears to be printability, which is a different issue than encoding.
Ken Blair
Ranch Hand

Joined: Jul 15, 2003
Posts: 1078
My mistake.
 