Ensure old tokens continue to work by removing `token_key` #362

jmsmkn · 2024-08-02T08:31:26Z

This PR adds a test that ensures old tokens continue to work by removing token_key. See discussion in #358 and #356.

When each token had its own salt, hashing had to be done per token with hash_token(token, auth_token.salt). token_key was introduced in 913336d so that the comparison could be done on a smaller number of tokens.

However, the per-token salt was removed in 51a5204. That means we can now move hash_token out of the filter loop of authenticate, because we no longer have to pass auth_token.salt; we only pass the token sent by the user. That gives us the digest straight away, so we can do a direct lookup on digest (the primary key, so unique) to find the relevant token.

This also improves performance, as multiple hashes and comparisons no longer need to be made, and solves the problem of tokens being invalidated between versions 4.2.0 -> 5.0.0. Additionally, MAXIMUM_TOKEN_PREFIX_LENGTH is no longer bound by TOKEN_KEY_LENGTH, so it could be increased if you wish.
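
A minimal sketch of the lookup change described above, using a plain dict in place of the AuthToken table so that it runs without Django. TOKENS_BY_DIGEST, issue_token and authenticate are hypothetical stand-ins rather than knox's API, and the hash here assumes the unsalted, unhexlify-based SHA-512 style:

```python
import binascii
import hashlib
import os

# Hypothetical stand-in for the AuthToken table, keyed by digest (the primary key).
TOKENS_BY_DIGEST = {}


def hash_token(token: str) -> str:
    # Unsalted hash: the result depends only on the token the client sends.
    return hashlib.sha512(binascii.unhexlify(token)).hexdigest()


def issue_token(user: str) -> str:
    # Generate a random 64-character hex token and store only its digest.
    token = binascii.hexlify(os.urandom(32)).decode()
    TOKENS_BY_DIGEST[hash_token(token)] = user
    return token


def authenticate(token: str):
    # One hash and one keyed lookup; no loop over candidate rows.
    try:
        digest = hash_token(token)
    except (binascii.Error, ValueError):
        return None  # malformed token, cannot match anything
    return TOKENS_BY_DIGEST.get(digest)
```

With a per-token salt this would not be possible, since the digest could only be computed after the row (and its salt) had been fetched.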

knox/models.py

knox/auth.py

jmsmkn · 2024-08-03T13:23:36Z

@giovannicimolin @johnraz This ended up being a bigger change than expected but this gives the project both a performance boost and solves the issue of invalidated tokens. It's my first contribution here so happy to address any feedback.

giovannicimolin · 2024-08-09T16:33:38Z

@jmsmkn Adding this to my review queue for next week. Thanks for the contribution and for the bug fix! 🚀

giovannicimolin · 2024-08-13T08:57:26Z

@jmsmkn Hey! I reviewed your PR and it's mostly looking good, but I wasn't able to properly test it because there's a few things messed up in the library right now. I'll get back to this and merge it as soon as I can figure out what's wrong.

Fireclunge · 2024-09-18T22:16:25Z

This would be great to add in as token invalidation is what's stopping me from upgrading to v5. Is this fix still viable?

mr-niche · 2024-10-09T14:54:36Z

Echoing @Fireclunge 's comment, the invalidation of existing tokens is a non-starter for us when upgrading, unfortunately.

tldr; it seems like make_hex_compatible from #272 is the root of the backwards incompatibility problem. A small tweak to this function would make it possible to upgrade from 4.2 -> 5+ without invalidating previously generated tokens! See below for the details and the proposed solution.

I agree with @jmsmkn 's assessment that token_key can be removed and the authentication logic can be simplified with a direct digest lookup. I went ahead and tested this upgrade path locally and still found that my pre-existing tokens (generated using 4.2.0) were considered invalid. It seems the root of the problem lies with the introduction of token prefixes in #272 - in particular, the make_hex_compatible() function makes a fundamental change to how tokens are hashed, leading to backwards incompatibility.

Before #272 , this is what hash_token looked like:

def hash_token(token: str) -> str:
    """
    Calculates the hash of a token.
    Token must contain an even number of hex digits or
    a binascii.Error exception will be raised.
    """
    digest = hash_func()
    digest.update(binascii.unhexlify(token))
    return digest.hexdigest()

The important bit is that binascii.unhexlify(token) produces a "deconstructed" bytes representation of the token, in which each pair of hex digits becomes a single byte.

> token = 'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
> binascii.unhexlify(token)
b'\xd4[\x05\xeb\xc4\xbc\xe4\xf3e\xc0\x1c\x1e\x01\x00\xf0\x0c\xb079\x18M7J\xb8\xdf\xb1W\xe2}\xb8\xad\x8c'

In #272 , the addition of a token prefix resulted in an additional step of ensuring the token string is hex-compatible. This is reasonable, as tokens like example_3af32... would fail when being converted to bytes:

> binascii.unhexlify(f"example_{token}")
*** binascii.Error: Non-hexadecimal digit found

However, the actual implementation of make_hex_compatible() is a bit odd in that it just converts the token str to a bytes str:

def make_hex_compatible(token: str) -> bytes:
    """
    We need to make sure that the token, that is send is hex-compatible.
    When a token prefix is used, we cannot guarantee that.
    """
    return binascii.unhexlify(binascii.hexlify(bytes(token, 'utf-8')))

> token = 'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
> make_hex_compatible(token)
b'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'

And with a token prefix present, it does work, but it's still just the bytes() version of the token str:

> token = 'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
> make_hex_compatible(f"example_{token}")
b'example_d45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'

All that said, make_hex_compatible is equivalent to:

> bytes(token, 'utf-8')
b'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
> bytes(f"example_{token}", 'utf-8')
b'example_d45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
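
The equivalence can be checked directly. The snippet below copies the v5.0 make_hex_compatible quoted above and compares it with a plain UTF-8 encode; the variable names legacy_bytes and new_bytes are only for illustration:

```python
import binascii


def make_hex_compatible(token: str) -> bytes:
    # Copy of the v5.0 implementation quoted above.
    return binascii.unhexlify(binascii.hexlify(bytes(token, "utf-8")))


token = "d45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c"

for candidate in (token, f"example_{token}"):
    # hexlify followed by unhexlify is a round trip, so only the UTF-8 encoding remains.
    assert make_hex_compatible(candidate) == candidate.encode("utf-8")

# The v4.2 input collapses two hex digits into one byte, so it is half as long.
legacy_bytes = binascii.unhexlify(token)
new_bytes = make_hex_compatible(token)
```

The two byte strings differ in both length and content, which is why they cannot hash to the same digest.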

Again, this still works, because digest.update() will gladly hash the bytes string without issue. However, if we want to ensure version 4.2 tokens still work with token prefixes, we would have to maintain the original "deconstructed" bytes format that was being hashed. Otherwise, you'll get two different hashes for the same token!

> digest_1 = hash_func()  # hashlib.sha512
> digest_2 = hash_func()
> token = 'd45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c'
> digest_1.update(binascii.unhexlify(token))  # the original hash format in v4.2
> digest_2.update(make_hex_compatible(token))  # the new hash format in v5+
> digest_1.hexdigest()
'03fd15d542ded8d635cd7939020f44b813f796ada8f7d6b84dac87fe8e04ce891fcadfb800b31ccc0c0bbe14401c57cf4a250886203cc1187cabdef45705cb5f'
> digest_2.hexdigest()
'a78ad04bb016c54be0ddce4ae54d9476fb8b8d7cee0061631e561f6f915934b91627c0b15d438afb0ee3c231b2d92365fa3cdbc405d49a3738e6851235fcb4ff'

NOTE: This of course assumes you are using the default cryptography.hazmat.primitives.hashes.SHA512 in v4.2, and then use the new default hashlib.sha512 in v5+. Because salt is no longer used, these two algorithms should produce the same sha512 hash. Backwards compatibility is possible here!

(Sorry for the long reply, here's my proposed solution, which I think could slot nicely into this PR)

Update make_hex_compatible as follows:

def make_hex_compatible(token: str) -> bytes:
    """
    Ensure a token, which may contain a TOKEN_PREFIX, is hex-compatible before hashing.
    """
    try:
        # this supports tokens generated in v4.2 and any tokens which do not contain a TOKEN_PREFIX
        return binascii.unhexlify(token)
    except (binascii.Error, ValueError):
        # if a token has a prefix, encode it so that it's hex-compatible and can be hashed
        return binascii.hexlify(token.encode('utf-8'))

Adding this check should be lightweight performance-wise and it maintains backwards compatibility for existing tokens (while supporting newer tokens that might use the token_prefix option).
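
As a rough check of the claim above, the following sketch reproduces the proposed function unchanged and compares its digest with the v4.2-style hash. It assumes unsalted hashlib.sha512 on both sides; hash_token_v4 is a hypothetical name for the pre-#272 behaviour:

```python
import binascii
import hashlib


def make_hex_compatible(token: str) -> bytes:
    # Proposed version from this comment, reproduced unchanged.
    try:
        return binascii.unhexlify(token)
    except (binascii.Error, ValueError):
        return binascii.hexlify(token.encode("utf-8"))


def hash_token(token: str) -> str:
    # Assumes the v5 default of hashlib.sha512 with no salt.
    return hashlib.sha512(make_hex_compatible(token)).hexdigest()


def hash_token_v4(token: str) -> str:
    # The v4.2-style hash quoted earlier in this comment.
    return hashlib.sha512(binascii.unhexlify(token)).hexdigest()


token = "d45b05ebc4bce4f365c01c1e0100f00cb03739184d374ab8dfb157e27db8ad8c"
prefixed = f"example_{token}"

# Unprefixed hex tokens hash exactly as they did in v4.2.
same_as_v4 = hash_token(token) == hash_token_v4(token)
# Prefixed tokens fall through to the except branch and still hash cleanly.
prefixed_ok = len(hash_token(prefixed)) == 128
```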

@giovannicimolin let me know what you think, happy to help support this effort in this PR (if @jmsmkn is interested?) or break it out separately. Also happy to help write tests and whatnot.

jmsmkn · 2024-10-09T15:09:49Z

Great find, @mr-niche! I'm unsure of the best way to proceed with this. Even though I was given the old "PRs welcome" treatment when I first raised the backward compatibility issue, it's been two months since I submitted a PR, and I haven't received any feedback. I spent a weekend working on improving the library, but without input from the maintainers, it's hard to move forward. If you'd like to make a PR to my branch to consolidate our efforts, I'd be happy to merge it. However, we ultimately need some guidance from the maintainers to get this resolved.

mr-niche · 2024-10-09T15:23:10Z

Thanks @jmsmkn ! I'll give @giovannicimolin some time to weigh in on how to proceed. If we get a thumbs up, I'll go ahead and submit a PR to your branch and we can get this ball rolling again. I'll also include some documentation around "Upgrading from v4 -> v5" so that folks who have raised this concern before (i.e. in #356 ) can see if the upgrade path is viable for their use case.

giovannicimolin · 2024-10-23T13:37:36Z

@jmsmkn @mr-niche Thanks for the contributions here so far, this is a really valuable contribution to the project.

In the last few weeks I haven't had enough time to follow up on this project; I don't have a lot of bandwidth for this. I'll try to catch up before the end of the week.

giovannicimolin · 2024-10-26T09:45:10Z

@mr-niche Thanks for the great in-depth investigation of the issue!
I think it'll be great if we can get this shipped soon.

@jmsmkn Can you incorporate @mr-niche's changes into your PR and resolve the PR conflicts?

Let's make this the 5.1 version - lots of folks will be happy with the library being backwards compatible.

mr-niche · 2024-10-27T12:42:13Z

Thanks @giovannicimolin ! @jmsmkn , I'll take a stab at adding this to your PR, I have a little bit of time today.

jmsmkn · 2024-10-28T08:07:01Z

Thanks @giovannicimolin ! I have fixed the conflicts. One question: we could still keep around token_key as people may be using it as an identifier, even if it is no longer used internally. Let me know what you think.

giovannicimolin · 2024-10-28T16:50:21Z

@jmsmkn That makes sense, can you keep it around for now?

I'll take some time to test it out tomorrow. :)

mr-niche · 2024-10-30T12:57:25Z

@giovannicimolin my changes are in this PR (to be merged into @jmsmkn 's PR): jmsmkn#1

We need some guidance on one last consideration: if we switch back to the original unhexlify strategy, we will be breaking compatibility with any tokens generated in the 5.0.* versions, unfortunately.

I'm not sure what the right thing to do here is, we could either:

1. Make it clear that an update to 5.1.0 breaks 5.0.* tokens but restores compatibility to 4.2.0 (another breaking change 😬 )
2. Find a way to support both versions if users would like to, eventually migrating them over to the 5.0.* hashing style (see my comment on jmsmkn/django-rest-knox#1, "Update make_hex_compatible to support existing tokens from v4.2.0")

Option 2 feels risky. But option 1 is yet another breaking change (albeit, a potentially smaller one - if folks have already upgraded to 5.0.*, they might not be as concerned about breaking changes, but I don't necessarily want to make that assumption).
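
To make option 2 concrete, here is one possible shape for it, purely as an illustration and not what jmsmkn#1 implements. All names (digest_v4_style, digest_v50_style, find_token) are hypothetical, and a set of stored digests stands in for the token table:

```python
import binascii
import hashlib


def digest_v4_style(token: str):
    # v4.2 hashed the unhexlified bytes; returns None if the token is not valid hex.
    try:
        return hashlib.sha512(binascii.unhexlify(token)).hexdigest()
    except (binascii.Error, ValueError):
        return None


def digest_v50_style(token: str) -> str:
    # 5.0.* hashes the UTF-8 bytes of the token string.
    return hashlib.sha512(token.encode("utf-8")).hexdigest()


def find_token(token: str, stored_digests: set):
    """Return the matching stored digest, trying both styles; None if no match."""
    for digest in (digest_v4_style(token), digest_v50_style(token)):
        if digest is not None and digest in stored_digests:
            return digest
    return None
```

The cost is a second hash and lookup whenever the first style misses, plus the need to know which style matched if tokens are to be migrated later.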

Let me know what you think/if you have any other ideas here!

jmsmkn changed the title ~~Add test for old tokens~~ Ensure old tokens continue to work Aug 2, 2024

jmsmkn force-pushed the test_old_tokens branch from fe77acb to 2409c3a on August 2, 2024 10:40

jmsmkn mentioned this pull request Aug 2, 2024

docs(changelog): bump to 5.0.0, add token warning #358

Merged

Ensure old tokens continue to work

b75c2bd

jmsmkn force-pushed the test_old_tokens branch from 2409c3a to b75c2bd on August 2, 2024 10:50

Improve performance of token migration

28f9d48

Remove token_key

c2b7f46

jmsmkn changed the title ~~Ensure old tokens continue to work~~ Ensure old tokens continue to work by removing token_key Aug 3, 2024

Remove filter loop as digest is a primary key

7034480

giovannicimolin mentioned this pull request Aug 13, 2024

AttributeError: 'SHA512' object has no attribute 'update' #364

Closed

Merge branch 'develop' into test_old_tokens

2fb0868

mr-niche mentioned this pull request Oct 28, 2024

Update make_hex_compatible to support existing tokens from v4.2.0 jmsmkn/django-rest-knox#1

Open
