Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Advice on a UID

 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I have a site with a cluster of web servers. Previous developers have these web servers generating a UID that is used for the primary keys of nearly every table in the database. The code for it is atrocious, collisions are common and will only get worse as time goes on. I need to replace it.

Here's the catch: I only have sixteen bytes to work with, encoded as ISO-8859-1. Note that this catch kills UUID or even java.rmi.server.UID dead in it's tracks. There's no space for it. Changing the size is not an option.

Here's the good news: it doesn't need to be a UUID or global at all, it doesn't need to be particularly efficient, it can be presumed to be called at most a few hundred times a second per server, all servers have, and always will have, an IP that is unique amongst those servers, and finally the lifetime is short (less than a decade), server time will never move backwards.

Here's what I came up with at first blush, keeping the above in mind.


Any holes (you see in the example) or suggestions are welcomed.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
How could there be a lot of collisions with 16 bytes of 0-255? That's a really big quantity of possibilities, like 3x10E38 or something ridiculous like that. And how does ISO-8859-1 preclude the built-in UUID?
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Newton wrote:How could there be a lot of collisions with 16 bytes of 0-255? That's a really big quantity of possibilities, like 3x10E38 or something ridiculous like that.


You're assuming it's random. It's not even close, the code is really bad.

David Newton wrote:And how does ISO-8859-1 preclude the built-in UUID?


It was the size not the encoding I was referring to. Looks like I misread the UUID documentation. I can use UUID but only if I encode the raw bytes myself.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ISO-8859-1 is 0-255; unless you meant something different by encoding.
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Newton wrote:ISO-8859-1 is 0-255; unless you meant something different by encoding.


I only have sixteen bytes to work with


I misread the UUID docs as being 128 bytes.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Oh, I see. What does that have to do with encoding the raw bytes?
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Newton wrote:Oh, I see. What does that have to do with encoding the raw bytes?


It's char(16).

UUID.toString() is hexadecimal. 32 characters not including the separators.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Those *are* the encoded raw bytes. Why can't you just write the UUID, which is 0-255 (which is what ISO-8859-1 defines)?
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Newton wrote:Those *are* the encoded raw bytes. Why can't you just write the UUID, which is 0-255 (which is what ISO-8859-1 defines)?


What happens when you write "c46f6ae1-7b2e-4baa-8df8-05fbdd91c075" into a 16 character field? You can't. Every single one of those character is going to get encoded into ISO-8859-1, one byte each, it's more than double the length of the field.

Furthermore, what happens when I grab the raw 16 bytes and encode each byte into ISO-8859-1 as 1 character per byte, as opposed to hexadecimal which requires two characters per byte, and the byte happens to be 7F? I'm betting the DB is going to puke at me and even if it doesn't I'm not sure what will happen when it gets to some place like SF and a salesperson is trying to copy & paste an ID with a control character in it.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You said the field was encoded in 8859-1. The bytes are all valid 8859-1 characters. *You* were the one that was talking about encoding the raw bytes and using toString, not me. You never said anything about needing a printable representation--not sure how I would be expected to guess that. In any case, good luck.
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You said the field was encoded in 8859-1


Yes. The field is a char(16) and the encoding is 8859-1.

The bytes are all valid 8859-1 characters.


8859-1 doesn't cover all 256 possible values does it? Aren't quite a few ranges "unused"?

*You* were the one that was talking about encoding the raw bytes and using toString, not me.


Yes, I'm thinking on paper. toString() is unsuitable because the length is too long because it's hex. I don't see anything in the UUID class that will give me a String or otherwise encode the bytes into something 16 characters in length. Hence I would have to do it myself. I don't see how that's possible with 8859-1 because of the above concerns.

You never said anything about needing a printable representation--not sure how I would be expected to guess that.


It didn't occur to me until I was typing that. So realistically, there's no way to get a UUID into a 16 character string or am I missing something?

In any case, good luck.


Thanks.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
http://en.wikipedia.org/wiki/ISO/IEC_8859-1#ISO-8859-1

It doesn't matter if a char is unused anyway--your concern appears to be printability, which is a different issue than encoding.
 
Ken Blair
Ranch Hand
Posts: 1078
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My mistake.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic