Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate word(s) #7

Open
delitescere opened this issue Aug 30, 2020 · 8 comments
Open

Duplicate word(s) #7

delitescere opened this issue Aug 30, 2020 · 8 comments
Labels
Milestone

Comments

@delitescere
Copy link

"dusty" appears twice, maybe others? ¯_(ツ)_/¯

@DannyBen
Copy link

DannyBen commented Aug 30, 2020

Absolutely right...

There is more than one non-unique adjectives:

  • thirsty (2 times)
  • tense (2 times)
  • sore (2 times)
  • silent (2 times)
  • petite (2 times)
  • lively (2 times)
  • dusty (2 times)
  • cute (2 times)
  • bright (2 times)

The nouns list seems to be ok.

I confirmed that removing these duplicates will cause a different random result (tested on concode).

So, not sure what is the best course of action:

  1. Keep as is.
  2. Change and break the persistence of previous codes.

@jjmontesl
Copy link
Owner

jjmontesl commented Aug 31, 2020

It definitely will.

I think it's reasonable to introduce a backwards-compatibility breaking change. My proposal:

  • Instead of removing the adjectives, replace them by new adjectives and document this. This wouldn't break current hashes backwards, except for the 50% of the affected words.
  • Optionally, we can use a comparative/superlative version of the adjectives so it's easy to realize that they were formerly the duplicated ones (/ej thirsty/thristier, tense/tenser, sore/... I am unsure about some of them). I vote yes to this.
  • Optionally, keep the current list and offer a compatibility mode (I don't think this is necessarily good). I vote no to this.

What do you think?

@jjmontesl jjmontesl assigned jjmontesl and unassigned jjmontesl Aug 31, 2020
@jjmontesl jjmontesl added the bug label Aug 31, 2020
@DannyBen
Copy link

What do you think?

Well. I am always in favor of simple solutions that do not leave you with a "bad taste" and definitely ones that do not create additional or alternative technical debt.

So, my opinion is to do one of these:

  1. Option 1: Avoid the breaking change and leave as is.
  2. Option 2: Accept the breaking change and simply remove the duplicates.
  3. Option 3: Embrace the breaking change and remove the duplicates, and use this opportunity to add or update the dictionaries even further.

@akram-hameed
Copy link

Any further movement on this? I've ported Codenamize to Kotlin and while dealing with a bug caused by the white space in the 'ad hoc' adjective, I found a duplicate for dusty. I am not sure I can see how using the comparative/superlative would work in this case, since it means using additional characters which breaks part of the algorithm, no?

@jjmontesl
Copy link
Owner

jjmontesl commented Jan 6, 2021

I feel so bad for this bug.

After considering this again I'll vote for @DannyBen 's option 1 above.

This effectively reduces the hashing space and increases the chance of collisions a bit, but I think it's the least breaking change.

If this approach caused a limitation to any client, I propose to start working on option 3 and please open a new Issue in order to update the list of adjectives for a future codenamize hash version (removing duplicates and adding a few adjectives).

In any case, for compatibility with current hashes, we need to keep the current list as-is at least for the current version.

@jjmontesl
Copy link
Owner

jjmontesl commented Jan 6, 2021

(I'll keep this issue open for anyone hitting this until we settle on a new future codenamize hashing format).

@jjmontesl
Copy link
Owner

Please let me know of your Kotlin port when ready to link it from the home page. I'll also add notes to README and code to inform about the reduced hashing spaces and non-linear collision chances. Again, being a hashing library, though, I think that maintaining backwards compatibility is more important than reducing collision chances.

@jjmontesl jjmontesl added this to the 2.0 milestone Jan 6, 2021
@xkortex
Copy link

xkortex commented Apr 7, 2021

My 0.02: I use it primarily for human-readable identifiers for what is backed by full hashes elsewhere. The risk of collisions on short prefixes is decently high to begin with - I don't expect it to be collision resistant even on human scales.

I definitely think they should be changed at some point.

I don't really like the idea of keeping them in the same semantic-space, I'd pick wholly different ones. In fact you might want to check with a tool like editdistance for near-collisions.

Regardless of path taken, changing the word list is definitely a semver-major-breaking-change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants