-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Usually once" promotions #214
base: main
Are you sure you want to change the base?
Commits on Dec 21, 2023
-
Add promotion status to API types
- promotion status goes in the environment status, and is optional since it might not be happening - revision gives the logical point in time that the record applies to, so you can judge whether it's relevant or not - lastAttemptedTime is informational - succeeded is true if it's done, false if it needs to be retried - PR details (optional, fill out if it's a PR)
Configuration menu - View commit details
-
Copy full SHA for d2c3c03 - Browse repository at this point
Copy the full SHA d2c3c03View commit details -
Verify pipeline has promotion status
This is the least test that pipeline fills in promotion status (and it doesn't pass, yet).
Configuration menu - View commit details
-
Copy full SHA for 2e031fd - Browse repository at this point
Copy the full SHA 2e031fdView commit details -
Generalise the "verify promotion" assertion
This way it can be used to check an invariant.
Configuration menu - View commit details
-
Copy full SHA for d248d75 - Browse repository at this point
Copy the full SHA d248d75View commit details -
This commit records what happens when a promotion is attempted, in the pipeline status. This is enough to get the test to pass. Two slightly wider changes are included here: - be more careful about checking whether a particular environment status exists, when looping through them; - don't start with an empty map for environment status. This used to be reasonable, since we were building the whole picture up again by looking at the targets; now, it would lose the promotion status, which can't be observed each time. Signed-off-by: Michael Bridgen <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f852a07 - Browse repository at this point
Copy the full SHA f852a07View commit details -
Check that a promotion isn't invoked repeatedly
If you've set up promotions via PR, or CI, or (hypothetical) chatbot, there will be a human-scale lag between a promotion being triggered and it taking effect in the target environment. If the controller processes the pipeline during that time, it shouldn't trigger the promotion again. This commit expands the mock promotion machinery into a little state machine, so the gap between promotion trigger and completion can be explicit in test cases. It asserts that no promotion of {env, version} will be attempted twice, which (_provided_ we don't roll any environment back a version) is what we want to prove here. There's a race between the cache getting an object with the promotion status, and the reconciler processing the object again. It's possible for the reconciler, having just triggered a promotion, to run again and see a version of the pipeline object that doesn't have the promotion recorded -- and that will mean attempting the promotion again. In testing, this happens almost always, because patches will cause the object to be requeued before it's finished reconciling, and it'll run again as soon as its exited. I would expect it to also be common in normal operation; and the outcome is that a promotion is re-run, which is not great! To mitigate the race, I've added a predicate to the watch which will drop updates that didn't change the spec. This means that the patches earlier in Reconcile won't cause the object to be requeued, and it is much, much less likely that it'll run again straight away. It's still possible -- maybe the spec changed while it was running. Other, more involved ways to avoid it: - use conflict detection on write to bail before running a promotion, if the observed pipeline has changed - create Promotion objects and guard promotions by trying to create (another form of detecting conflict on write)
Configuration menu - View commit details
-
Copy full SHA for 9244c67 - Browse repository at this point
Copy the full SHA 9244c67View commit details -
Requeue when the pipeline isn't ready
If a cache has not yet synced, or a target is not to be found, and the controller has incomplete information, then the safest thing to do is not make any decisions on promotions, and instead requeue the pipeline for another try later. This fixes a problem that was exposed by adding a predicate to the watch on Pipeline: if a cluster cache isn't ready when the controller looks, and the target doesn't exist, there's nothing to trigger another go-round. This presented itself as a problem with the remote and Kustomization tests, which explicitly check for a "not found" target status. (I fixed another problem with those tests, in passing -- if they are going to check for an updated target status, they had better fetch the object again.) Signed-off-by: Michael Bridgen <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for ed4cfb2 - Browse repository at this point
Copy the full SHA ed4cfb2View commit details