Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add OrgAuthToken model #50409

Merged
merged 2 commits into from Jun 14, 2023
Merged

feat: add OrgAuthToken model #50409

merged 2 commits into from Jun 14, 2023

Conversation

mydea
Copy link
Member

@mydea mydea commented Jun 6, 2023

This adds a new OrgAuthToken model:

  • Tied to an organization
  • Has a token which is expected to be a JWT token
  • Add two basic utilities for token generation/parsing
  • It is possible to assign project(s) to a token

ref #50144

based on RFC getsentry/rfcs#91

@mydea mydea self-assigned this Jun 6, 2023
@mydea mydea requested a review from a team as a code owner June 6, 2023 12:20
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 6, 2023
"sentry.Project", related_name="projects", through=OrgAuthTokenProject
)
# max JWT token length is 2048, +7 for our custom prefix (sntrys_)
token = models.CharField(max_length=2055, unique=True, null=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure what I should make the max length here. on the RFC there is a comment that we should store this in hashed form, if we do this I guess we need a more generic length here...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitsuhiko why would we need to store it hashed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thanks! I supposed it was for security reasons then it makes sense.

Here we should make the length, the length of the hash. I don't expect we will need to store the token in any other format besides the hashed one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on storing the encoded JWT? or only the important claims within the token?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to be explicit that we store token_hashed in the DB.

@codecov
Copy link

codecov bot commented Jun 6, 2023

Codecov Report

Merging #50409 (7518568) into master (12d0ad9) will increase coverage by 1.01%.
The diff coverage is 87.87%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #50409      +/-   ##
==========================================
+ Coverage   80.14%   81.15%   +1.01%     
==========================================
  Files        4843     4845       +2     
  Lines      203733   203799      +66     
  Branches    11130    11130              
==========================================
+ Hits       163275   165401    +2126     
+ Misses      40212    38152    -2060     
  Partials      246      246              
Impacted Files Coverage Δ
src/sentry/models/orgauthtoken.py 86.66% <86.66%> (ø)
src/sentry/utils/security/orgauthtoken_jwt.py 90.00% <90.00%> (ø)
src/sentry/models/__init__.py 100.00% <100.00%> (ø)

... and 113 files with indirect coverage changes

@mydea
Copy link
Member Author

mydea commented Jun 7, 2023

Note: I put these new models into the control silo, mirroring the api token model, I think that is the correct place?



@control_silo_only_model
class OrgAuthTokenProject(Model):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put this model in orgauthtoken.py since it belongs to the same cluster of models related to org auth tokens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after talking this through with @mitsuhiko , we decided to skip the projects for now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the nature of the relationship, it's not a big deal at all for the db.

"sentry.Project", related_name="projects", through=OrgAuthTokenProject
)
# max JWT token length is 2048, +7 for our custom prefix (sntrys_)
token = models.CharField(max_length=2055, unique=True, null=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitsuhiko why would we need to store it hashed?

# max JWT token length is 2048, +7 for our custom prefix (sntrys_)
token = models.CharField(max_length=2055, unique=True, null=False)
label = models.CharField(max_length=255, null=False)
scope_list = ArrayField(of=models.TextField)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we statically type the list of scopes? Having such a generic field is a bit suspicious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for reference, this is how the scope list is currently stored in other models. But we may improve on this here :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible i would like to bound it list of elements, maybe like:

from django.db import models
from django.core.exceptions import ValidationError

class MyModel(models.Model):
    OPTIONS = [
        ('option1', 'Option 1'),
        ('option2', 'Option 2'),
        ('option3', 'Option 3'),
    ]

    choices_array = models.ArrayField(
        models.TextField(),
        validators=[],
    )

    def validate_choices_array(value):
        for choice in value:
            if choice not in dict(MyModel.OPTIONS).keys():
                raise ValidationError(f"{choice} is not a valid choice.")

    choices_array.validators.append(validate_choices_array)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I implemented this!

label = models.CharField(max_length=255, null=False)
scope_list = ArrayField(of=models.TextField)

user_added = FlexibleForeignKey("sentry.User", null=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user_added refers to the User that created this token? If so, I would call this user or added_by.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the idea - I'll rename it to added_by (IMHO user is a potentially confusing as it may imply this token belongs to that user, which it doesn't).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that's a good point!


user_added = FlexibleForeignKey("sentry.User", null=True)
date_added = models.DateTimeField(default=timezone.now, null=False)
date_last_used = models.DateTimeField(null=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having a null field, we could have it default to timezone.now and update it when the user uses it, but this changes the semantics. I don't know which ones are more representative of the situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's useful to know that the token was never used, I'd say?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could technically infer it if the two dates are equal but ofc it's not the best way. My main concern was that having nullability in the db is not very nice but I understand your point.

src/sentry/utils/security/orgauthtoken_jwt.py Outdated Show resolved Hide resolved
src/sentry/utils/security/orgauthtoken_jwt.py Outdated Show resolved Hide resolved
"sentry.Project", related_name="projects", through=OrgAuthTokenProject
)
# max JWT token length is 2048, +7 for our custom prefix (sntrys_)
token = models.CharField(max_length=2055, unique=True, null=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on storing the encoded JWT? or only the important claims within the token?

Comment on lines 42 to 47
project_last_used = HybridCloudForeignKey(
"sentry.Project", null=True, blank=True, on_delete="cascade"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want cascading here? If the last used project is deleted should the org token be deleted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point @markstory, I missed this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mydea we can just put the project id here as a BoundedBigIntegerField and forget about possible accidental cascading deletions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point, true! So it should def. not cascade, but should we use HubridCloudForeignKey or BoundedBigIntegerField then? Happy to use BoundedBigIntegerField, just wanting to make sure to get the hybrid could stuff right...!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW also created_by should not cascade but set to null as well, adjusted that too!)


def generate_token(org_slug: str):
jwt_payload = {
"iss": "sentry.io",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be use even when the instance is self-hosted/single-tenant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, valid question... took this from here: https://github.com/getsentry/rfcs/pull/91/files#diff-3109d1f30b4b81085e85d841deafefb0d1b43cc5345c64514475512a4ce9fdeeR94

IMHO it should be something consistent that says "this is from sentry", and not differ when it is self-hosted/single-tenant. Not sure if sentry.io is the best value for this then, but I'd say it's fine, maybe (just "Sentry" may also be ambiguous, ...)

"iat": datetime.utcnow(),
"nonce": uuid4().hex,
"sentry_site": settings.SENTRY_OPTIONS["system.url-prefix"],
"sentry_org": org_slug,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll also need the organizations 'root' URL as in the future, we'll have multiple regions (with different API urls). While customer traffic can continue using https://sentry.io/api/0/** there will be latency overhead (and cost to us) so we should encourage using regional domains instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this different from "sentry_site": settings.SENTRY_OPTIONS["system.url-prefix"],? that is what this is designed to do (but honestly I am not 100% sure what fields are what there, @mitsuhiko pointed me to this settings field)

So to be clear, the only point of the sentry_site field is exactly what you mentioned, if we should store something else in there, happy to be pointed to it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, organizations will have a couple URLs:

  • sentryUrl which is our root domain and is the same as system.url-prefix. ex. sentry.io
  • organizationUrl which is domain for the organization's UI. ex acme.sentry.io
  • regionUrl Which is the domain that the organization's API endpoints are available on. ex us.sentry.io

Any API requests sent to sentryUrl will be routed to the correct region, but will suffer a latency penalty (because we're proxying the request to the region) and the request will go to the US (this can matter for EU customers).

For this scenario, I think you'll want both the sentryUrl and regionUrl available in the JWT. There are a small number of API endpoints that are only available on sentryUrl (users, integrations, sentry apps).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do I get regionUrl from? (I left a comment in the RFC as well to make sure this is aligned - I'd go with sentry_url and sentry_region_url then)

Comment on lines 42 to 47
project_last_used = HybridCloudForeignKey(
"sentry.Project", null=True, blank=True, on_delete="cascade"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
project_last_used = HybridCloudForeignKey(
"sentry.Project", null=True, blank=True, on_delete="cascade"
)
project_last_used = BoundedBigIntegerFieldd(db_index=True)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this preferred over the HybridCouldForeignKey? 🤔 Feels like that is designed for exactly this use case, but honestly not entirely sure... 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other option would be to use HybridCloudForeignKey("sentry.Project", null=True, blank=True, on_delete="SET_NULL")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the expertise to judge what is better, since I lack in-depth knowledge of hybrid cloud. @markstory could you give an input here, thanks!

Copy link
Contributor

@AniketDas-Tekky AniketDas-Tekky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying my comment from the issue:
#50144 (comment)

Basically, is there a tech spec for this? And have we properly investigated what introducing a new token type means?

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

This PR has a migration; here is the generated SQL for src/sentry/migrations/0483_add_orgauthtoken.py ()

--
-- Create model OrgAuthToken
--
CREATE TABLE "sentry_orgauthtoken" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "token_hashed" varchar(3000) NOT NULL UNIQUE, "token_last_characters" varchar(4) NULL, "name" varchar(255) NOT NULL, "scope_list" text[] NULL, "date_added" timestamp with time zone NOT NULL, "date_last_used" timestamp with time zone NULL, "project_last_used_id" bigint NULL, "date_deactivated" timestamp with time zone NULL, "created_by_id" integer NULL);
ALTER TABLE "sentry_orgauthtoken" ADD CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id" FOREIGN KEY ("created_by_id") REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_orgauthtoken" VALIDATE CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id";
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_organization_id_17552a60" ON "sentry_orgauthtoken" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_token_hashed_4c36306d_like" ON "sentry_orgauthtoken" ("token_hashed" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_project_last_used_id_0b9aa639" ON "sentry_orgauthtoken" ("project_last_used_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_created_by_id_5e2288f9" ON "sentry_orgauthtoken" ("created_by_id");

@mydea
Copy link
Member Author

mydea commented Jun 9, 2023

Copying my comment from the issue: #50144 (comment)

Basically, is there a tech spec for this? And have we properly investigated what introducing a new token type means?

Hey, there is an RFC getsentry/rfcs#91 by @mitsuhiko. If you have any concerns about this on a fundamental level, that's the place to discuss this, I guess. My understanding is that the basic decision that we want to add new org-level tokens has been taken, but @mitsuhiko can probably give you more information if you have more fundamental questions!

Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR LGTM but I can't judge the usage of the HybridCloudForeignKey. The only thing I can say is that semantically it makes sense.

src/sentry/utils/security/orgauthtoken_jwt.py Outdated Show resolved Hide resolved
src/sentry/utils/security/orgauthtoken_jwt.py Outdated Show resolved Hide resolved
Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR LGTM but I can't judge the usage of the HybridCloudForeignKey. The only thing I can say is that semantically it makes sense.

pyproject.toml Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/0487_add_orgauthtoken.py ()

--
-- Create model OrgAuthToken
--
CREATE TABLE "sentry_orgauthtoken" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "token_hashed" text NOT NULL UNIQUE, "token_last_characters" varchar(4) NULL, "name" varchar(255) NOT NULL, "scope_list" text[] NULL, "date_added" timestamp with time zone NOT NULL, "date_last_used" timestamp with time zone NULL, "project_last_used_id" bigint NULL, "date_deactivated" timestamp with time zone NULL, "created_by_id" integer NULL);
ALTER TABLE "sentry_orgauthtoken" ADD CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id" FOREIGN KEY ("created_by_id") REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_orgauthtoken" VALIDATE CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id";
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_organization_id_17552a60" ON "sentry_orgauthtoken" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_token_hashed_4c36306d_like" ON "sentry_orgauthtoken" ("token_hashed" text_pattern_ops);
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_project_last_used_id_0b9aa639" ON "sentry_orgauthtoken" ("project_last_used_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_created_by_id_5e2288f9" ON "sentry_orgauthtoken" ("created_by_id");

@github-actions
Copy link
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/0488_add_orgauthtoken.py ()

--
-- Create model OrgAuthToken
--
CREATE TABLE "sentry_orgauthtoken" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "token_hashed" text NOT NULL UNIQUE, "token_last_characters" varchar(4) NULL, "name" varchar(255) NOT NULL, "scope_list" text[] NULL, "date_added" timestamp with time zone NOT NULL, "date_last_used" timestamp with time zone NULL, "project_last_used_id" bigint NULL, "date_deactivated" timestamp with time zone NULL, "created_by_id" integer NULL);
ALTER TABLE "sentry_orgauthtoken" ADD CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id" FOREIGN KEY ("created_by_id") REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_orgauthtoken" VALIDATE CONSTRAINT "sentry_orgauthtoken_created_by_id_5e2288f9_fk_auth_user_id";
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_organization_id_17552a60" ON "sentry_orgauthtoken" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_token_hashed_4c36306d_like" ON "sentry_orgauthtoken" ("token_hashed" text_pattern_ops);
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_project_last_used_id_0b9aa639" ON "sentry_orgauthtoken" ("project_last_used_id");
CREATE INDEX CONCURRENTLY "sentry_orgauthtoken_created_by_id_5e2288f9" ON "sentry_orgauthtoken" ("created_by_id");

@mydea mydea dismissed AniketDas-Tekky’s stale review June 14, 2023 11:50

I think this has been addressed!

@mydea mydea merged commit fdc8f5a into master Jun 14, 2023
54 of 55 checks passed
@mydea mydea deleted the fn/org-auth-token branch June 14, 2023 12:17
@github-actions github-actions bot locked and limited conversation to collaborators Jun 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Impact: Migration Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants