Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SA: Implement schema and methods for (account, hostname) pausing #7490

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

beautifulentropy
Copy link
Member

@beautifulentropy beautifulentropy commented May 16, 2024

Add the storage implementation for our new (account, hostname) pair pausing feature.

  • Add schema and model for for the new paused table
  • Add getters and setters for interacting with the paused schema
  • Add exported SA methods for interacting with the getters and setters

This PR adds the functionality, logic will be implemented in the WFE and RA, respectively.

Part of #7406
Part of #7475

@beautifulentropy beautifulentropy marked this pull request as ready for review May 16, 2024 15:42
@beautifulentropy beautifulentropy requested a review from a team as a code owner May 16, 2024 15:42
Copy link
Contributor

@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@beautifulentropy beautifulentropy marked this pull request as draft May 16, 2024 16:53
sa/model.go Outdated Show resolved Hide resolved
@beautifulentropy beautifulentropy marked this pull request as ready for review May 16, 2024 17:45
@letsencrypt letsencrypt deleted a comment from github-actions bot May 16, 2024
Copy link
Contributor

@aarongable aarongable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high-level suggestions to simplify the new gRPC API methods:

  • Pause and Repause can be the same method, since I don't think the RA needs to know that a pair has been previously paused. The SA can determine which case it is in inside a transaction, and increment the relevant metric.
  • CheckPair[s]Paused can be a single method, no need for separate singular/plural implementations (when would we be calling the singular one?). And rather than taking a single identifier type and many values, it should take a repeated message Identifier { string type = 1; string value = 2 }. Let the SA handle the need for multiple separate db queries if necessary; no need to expose that deficiency of our database schema to the RA.
  • I'm not clear on why we need both UnpausePair and UnpauseAccount, since the current design doc only calls for ever unpausing whole accounts at a time.

Doing all three of these would reduce the new API surface to just three methods: CheckPairsPaused, PausePair, UnpauseAccount.

There's one more method which might be necessary: GetPausedIdentifiersForAccount, to be used by the self-service unpausing page, to populate it with a (truncated) list of identifiers that will be unpaused.

sa/model.go Outdated Show resolved Hide resolved
@beautifulentropy
Copy link
Member Author

beautifulentropy commented May 21, 2024

  • Pause and Repause can be the same method, since I don't think the RA needs to know that a pair has been previously paused. The SA can determine which case it is in inside a transaction, and increment the relevant metric.

On its face I think this is a good idea. However, it was implemented this way because I had always assumed that these observations would be confined to the RA. I'm a little uncomfortable with the idea of emitting metrics dependant on business logic inside of the storage layer.

  • CheckPair[s]Paused can be a single method, no need for separate singular/plural implementations (when would we be calling the singular one?). And rather than taking a single identifier type and many values, it should take a repeated message Identifier { string type = 1; string value = 2 }. Let the SA handle the need for multiple separate db queries if necessary; no need to expose that deficiency of our database schema to the RA.

Fair. I'll fix this.

  • I'm not clear on why we need both UnpausePair and UnpauseAccount, since the current design doc only calls for ever unpausing whole accounts at a time.

You're right, UnpausePair was an oversight on my part.

There's one more method which might be necessary: GetPausedIdentifiersForAccount, to be used by the self-service unpausing page, to populate it with a (truncated) list of identifiers that will be unpaused.

Thanks, I'll add this.

@aarongable
Copy link
Contributor

On its face I think this is a good idea. However, it was implemented this way because I had always assumed that these observations would be confined to the RA. I'm a little uncomfortable with the idea of emitting metrics dependant on business logic inside of the storage layer.

I totally agree with this line of thinking. But for me it's a balancing act. I think emitting the "repaused" metric from the SA is slightly unfortunate, because it does feel like an RA sort of thing to be measuring. But I think that exposing twice as many methods from the SA and having strict calling conventions for those methods that will fail if the RA is ever wrong (or races against another RA!) is significantly more unfortunate. Requiring every potential SA client to have identical "call CheckPairsPaused, then depending on its answer, either call Pause or Repause, and gracefully handle the error in case the situation has changed between those two calls" logic feels much worse than allowing SA clients to just say "hey, pause this one" and letting the SA handle the intricacies of making that idempotent inside a database transaction.

@beautifulentropy
Copy link
Member Author

On its face I think this is a good idea. However, it was implemented this way because I had always assumed that these observations would be confined to the RA. I'm a little uncomfortable with the idea of emitting metrics dependant on business logic inside of the storage layer.

I totally agree with this line of thinking. But for me it's a balancing act. I think emitting the "repaused" metric from the SA is slightly unfortunate, because it does feel like an RA sort of thing to be measuring. But I think that exposing twice as many methods from the SA and having strict calling conventions for those methods that will fail if the RA is ever wrong (or races against another RA!) is significantly more unfortunate. Requiring every potential SA client to have identical "call CheckPairsPaused, then depending on its answer, either call Pause or Repause, and gracefully handle the error in case the situation has changed between those two calls" logic feels much worse than allowing SA clients to just say "hey, pause this one" and letting the SA handle the intricacies of making that idempotent inside a database transaction.

I agree and as long as I've got support from you I'm happy to implement this as described above.

@pgporada pgporada self-requested a review May 22, 2024 14:19
// provided account. If no matches are found, an empty slice is returned. If a
// non-DNS identifier is provided, an error is returned.
func checkIdentifiersPaused(ctx context.Context, dbs db.Selector, regID int64, ids pauseIds) ([]string, error) {
if len(ids) == 0 {
Copy link
Member

@pgporada pgporada May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ids is a []pauseId so this could just be a nil check because a slice zero value is nil. @aarongable brought up a good point in a different PR that this catches the case where the []pauseids slice exists but has an obviously-invalid (zero) length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta question, what is the most safe/correct way of checking? https://go.dev/play/p/m4QpBnr6DP5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idiomatic thing to do for slices is to check their length, because it is a single highly-readable check that works on both nil and non-nil-but-empty slices.

func checkIdentifiersPaused(ctx context.Context, dbs db.Selector, regID int64, ids pauseIds) ([]string, error) {
if len(ids) == 0 {
// No identifier values to check.
return []string{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could return nil, nil here since the function returns []string anyways and you save allocating an slice containing nothing.


// pausedModel represents a row in the paused table. The pausedAt and unpausedAt
// fields are pointers because they are NULL-able columns. At no point should
// both pausedAt and unpausedAt be non-NULL. Valid states are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At no point should both pausedAt and unpausedAt be non-NULL.

How does this square with the example directly below it?

// - identifier unpaused: pausedAt is non-NULL, unpausedAt is non-NULL


// pausedModel represents a row in the paused table. The pausedAt and unpausedAt
// fields are pointers because they are NULL-able columns. At no point should
// both pausedAt and unpausedAt be non-NULL. Valid states are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// both pausedAt and unpausedAt be non-NULL. Valid states are:
// both pausedAt and unpausedAt be NULL. Valid states are:

// checkIdentifiersPaused takes a slice of identifiers of type "dns" and returns
// a slice of the first 15 identifier values which are currently paused for the
// provided account. If no matches are found, an empty slice is returned. If a
// non-DNS identifier is provided, an error is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, for future readers, we should note that this decision to reject non-DNS identifiers is not about the existence of non-DNS identifiers themselves, but rather about the fact that we don't (yet) want this function to eat the complexity of supporting more than one identifier type at a time.

Comment on lines +44 to +45
rpc CheckIdentifiersPaused (CheckIdentifiersPausedRequest) returns (Hostnames) {}
rpc GetPausedIdentifiersForAccount (RegistrationID) returns (Hostnames) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have these return Identifiers? We're being flexible enough to support taking different kinds of identifiers as input, we should do the same for results.

@@ -96,6 +100,8 @@ service StorageAuthority {
rpc UpdateRevokedCertificate(RevokeCertificateRequest) returns (google.protobuf.Empty) {}
rpc LeaseCRLShard(LeaseCRLShardRequest) returns (LeaseCRLShardResponse) {}
rpc UpdateCRLShard(UpdateCRLShardRequest) returns (google.protobuf.Empty) {}
rpc PauseIdentifier(PauseIdentifierRequest) returns (PauseIdentifierResponse) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truly a question: would it make sense for this to take a plural set of identifiers to pause? If the RA detects that they've hit the limit for many names at the same time (which wouldn't surprise me, in the median case of requesting foo.co and www.foo.co) then we could pause all of them with a single request.

// provided account. If no matches are found, an empty slice is returned. If a
// non-DNS identifier is provided, an error is returned.
func checkIdentifiersPaused(ctx context.Context, dbs db.Selector, regID int64, ids pauseIds) ([]string, error) {
if len(ids) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idiomatic thing to do for slices is to check their length, because it is a single highly-readable check that works on both nil and non-nil-but-empty slices.

Comment on lines +1403 to +1406
func (ssa *SQLStorageAuthority) GetSerialsByAccount(req *sapb.RegistrationID, stream sapb.StorageAuthority_GetSerialsByAccountServer) error {
return ssa.SQLStorageAuthorityRO.GetSerialsByAccount(req, stream)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapper has been erroneously re-added, and the similar wrappers below are no longer necessary, thanks to #7501

// a slice of the first 15 identifier values which are currently paused for the
// provided account. If no matches are found, an empty slice is returned. If a
// non-DNS identifier is provided, an error is returned.
func checkIdentifiersPaused(ctx context.Context, dbs db.Selector, regID int64, ids pauseIds) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function (and the similar complex helper functions below) in this file? These are each called in only one place, and they have no direct tests (all of the tests operate at the SA level). Why not inline their logic directly into the SA methods? This would also prevent having to deal with the annoyance of ensuring the executor they're called with is actually a transaction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants