New service reference is not unique enough #411
To be honest, I was afraid this was going to happen. Two ways of fixing this:
Nice pickup @mngan. If we were to go with an 8-digit hexadecimal value, we would reduce our chance of collision to 1.15% across 10,000 sessions. If we were to expand this up to 12 digits, we would only have a 0.00001776179% chance of a collision, allowing for much more confidence in the uniqueness.
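The quoted figures can be sanity-checked with the standard birthday-bound approximation p ≈ 1 − e^(−n²/2N). A minimal sketch (the function name is mine, not from the project):

```python
import math

def birthday_collision_probability(bits: int, sessions: int) -> float:
    """Approximate probability that `sessions` values drawn uniformly
    from a `bits`-bit space contain at least one collision."""
    space = 2 ** bits
    return 1.0 - math.exp(-sessions * sessions / (2.0 * space))

# 8 hex digits = 32 bits, 12 hex digits = 48 bits, across 10,000 sessions
print(f"{birthday_collision_probability(32, 10_000):.4%}")   # roughly 1.16%
print(f"{birthday_collision_probability(48, 10_000):.10%}")  # roughly 0.0000178%
```

The small difference from the quoted 1.15% comes from using the exponential approximation rather than the exact product formula; the order of magnitude is what matters here.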
I did not mean to use the values as strings but rather as seeds for a random value generator. We could then get a random integer (hopefully unique) that could be displayed as its hexadecimal representation. Not sure it's the best way (welcome to the world of pseudo-random uniqueness), just a midnight thought :)
Based on the probability table on the Wikipedia page, a 12-digit hex number (48 bits) will have about a 1 in a million chance of collision at 24,000 entries, or a 1 in 1,000 chance at 750,000 entries. Those seem like pretty reasonable odds for having to re-roll the UUID in the case of a collision.

In terms of the length of the string, you can reduce it by using a different encoding. Instead of hexadecimal, what if you used Base32 or Base64? That could reduce it down to around 8 or 9 digits. Base32 is readable because you can use 22 upper-case letters plus digits. With Base64 you have to use mixed case and symbols, which isn't great.

The other way is to calculate a monotonically increasing UUID. Start at a (random?) point and then keep adding a random value to the last value. It is unlikely you will have enough entries to wrap a 32-bit number, especially if you limit the starting value range and you don't make your random offset too large. You may still want collision detection just in case you do wrap, but that seems highly unlikely.

Using random numbers is easy to code, but has surprising edge cases. The monotonically increasing number you have to code yourself, but it should be fairly robust if you give yourself enough space.
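As a rough sketch of the encoding idea above, Python's standard library can render the same 48 random bits as 12 hex characters or 10 Base32 characters (variable names are illustrative only):

```python
import base64
import secrets

raw = secrets.token_bytes(6)  # 6 bytes = 48 random bits

hex_ref = raw.hex().upper()   # 12 hexadecimal characters
# Base32 packs 5 bits per character; strip the trailing '=' padding.
b32_ref = base64.b32encode(raw).decode().rstrip("=")  # 10 characters

print(hex_ref, b32_ref)
```

Base64 would get the same 48 bits down to 8 characters, at the cost of mixed case and symbols as noted above.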
I too would agree these are reasonable odds, and this is the easy fix. Base32/Base64 is a valid option; the only roadblock would be finding enough unique information to encode. Local/remote ASN + peer IP may be enough, but there is a possibility one may have duplicates (e.g. customer sessions using RFC1918). I do like the monotonically increasing approach, and this is something @gmazoyer and myself spoke about; I cannot remember the reason why we didn't agree on this now, possibly to do with the handling of deleted sessions. In theory, we could just use …
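For reference, the monotonically increasing scheme described above could be sketched like this (the class and its names are hypothetical, not from the project):

```python
import secrets

class MonotonicReferenceGenerator:
    """Sketch: start at a random point, add a small random offset per
    reference, so values stay unpredictable but ever-increasing."""

    MAX_VALUE = 2 ** 32  # wrap point for a 32-bit reference

    def __init__(self) -> None:
        # Limit the starting range so there is plenty of room before wrapping.
        self.last = secrets.randbelow(2 ** 24)

    def next_reference(self) -> str:
        # Step by at least 1, at most 2**16, modulo the 32-bit space.
        self.last = (self.last + 1 + secrets.randbelow(1 << 16)) % self.MAX_VALUE
        return f"{self.last:08X}"
```

As the comment above notes, collision detection is still worth keeping for the (very unlikely) case where the counter wraps.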
Deleted objects are just deleted, so you will get holes in your data. The soft-delete PR I had would keep the deleted objects around, but there was a clean-up process that could be run to flush out the deleted objects. Also, when using …
We actually already have the unique primary key stored in …

```python
def save(self, *args, **kwargs):
    """
    Overrides default `save()` to set the service reference if left blank.
    """
    # Create ID
    super().save(*args, **kwargs)
    if not self.service_reference:
        self.service_reference = self.generate_service_reference()
    return super().save(*args, **kwargs)
```

The reference could then just pad the ID out to 8 digits or something similar.
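A minimal sketch of the padding idea, assuming a hypothetical helper (this is not the project's actual `generate_service_reference`):

```python
def generate_service_reference(pk: int, width: int = 8) -> str:
    """Derive the service reference by zero-padding the primary key.

    Hypothetical sketch; only as stable as the database ID itself.
    """
    return f"{pk:0{width}d}"

print(generate_service_reference(411))  # prints 00000411
```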
I would be wary about using database IDs directly. It could be fragile, unless you were using a table with an autoincrement explicitly to generate the service IDs.
Given the ID is the primary key, I cannot see an instance where it wouldn't be auto-incremented or would be non-unique, unless you can think of a specific case we may hit?
I think mainly I am concerned about cases where someone exports the data (without IDs) and re-imports it, but gets assigned new IDs, so all the service IDs will change. You want this to be stable.
Service Reference has been set to a free text field for the …
Environment
Steps to Reproduce
run `manage.py migrate` from main
Expected Behavior
Migration should run without errors
Observed Behavior
As you can see, there are multiple collisions just within the IX peering sessions. I think the UUID should be longer, or perhaps a different section of the UUID should be used? With 6 hex digits, this is only 24 bits, or about 16M possible UUIDs, which seems a little small given the number of sessions you could have. This is a case of the Birthday Problem: if I understand it correctly, with 10,000 sessions the chance of a collision is around 94% with the number of bits you have.