-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add update reconciliation for migration job #45
Conversation
02e50f9
to
80a528b
Compare
// deletes the Job, so it can be recreated on the following reconciliation. | ||
// k8s Jobs are immutable and need to be recreated if they are to be re-run. | ||
if foundMigrationJob.GetDeletionTimestamp() == nil { | ||
log.Info("Deleting existing migration Job", "name", foundMigrationJob.Name) | ||
if err = r.Delete(ctx, foundMigrationJob); err != nil { | ||
condition = conditions.NotDeleted(objName, err) | ||
return &ctrl.Result{}, err | ||
} | ||
condition = conditions.Deleted(objName) | ||
} else { | ||
log.Info("Waiting for migration Job to be deleted, so it can be recreated", "name", desiredMigrationJob.Name) | ||
} | ||
return &ctrl.Result{Requeue: true}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no "waiting" in the operator pattern, so we have to make this reconciliation handler idempotent. so:
- if we detect that an update (actually, a replacement) is needed
- we'll take the existing Job resource, see if it's been marked for deletion -> if not, delete it
- we can't be sure of how many reconciliations it will take, so we'll assume it takes more than 1
- we'll continue to requeue the reconciliation job until the
foundMigrationJob
is cleaned up, so that the very next run will create a new Job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2eccf47
to
b53bbaf
Compare
_, _, desiredMigrationJob = controllerReconciler.prefectServerDeployment(prefectserver) | ||
migrateJob = &batchv1.Job{} | ||
Eventually(func() error { | ||
return k8sClient.Get(ctx, types.NamespacedName{ | ||
Namespace: namespaceName, | ||
Name: desiredMigrationJob.Name, | ||
}, migrateJob) | ||
}).Should(Succeed()) | ||
migrateJob.Status.Succeeded = 1 | ||
Expect(k8sClient.Status().Update(ctx, migrateJob)).To(Succeed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kind of annoying, but looking up a Job
by name means i have to re-run the manifest generator every time to create the hashed name, which is based on the manifest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I remember thinking about using .metadata.generation
from the CR since that should be unique and change each time the CR changes. But not sure if there are any drawbacks to that approach.
job.Spec.Template.Spec.Containers[0].Image = "image1:v2" | ||
hash2, err := Hash(job.Spec, 8) | ||
Expect(err).NotTo(HaveOccurred()) | ||
|
||
Expect(hash1).NotTo(Equal(hash2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noice
return &batchv1.Job{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Namespace: server.Namespace, | ||
Name: fmt.Sprintf("%s-migration-%s", server.Name, hashSuffix), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may eventually want to trim this name (and the name of all resources, I guess) to 63 characters in case the server.Name
provided is unusually long for some reason. Or we can let it fail and we'll know why - just calling it out as a familiar speedbump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call out. might make sense to validate a char limit for the .Name on the CR
#56
case !isMigrationJobFinished(foundMigrationJob): | ||
log.Info("Waiting on active migration Job to complete", "name", foundMigrationJob.Name) | ||
condition = conditions.AlreadyExists(objName, fmt.Sprintf("migration Job %s is still active", foundMigrationJob.Name)) | ||
return &ctrl.Result{Requeue: true}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From memory, this will immediately requeue the reconcile loop. If we find that it ends up looping a ton while waiting for these to finish (I'm not sure how long they usually take), we can pass another field like this:
return &ctrl.Result{Requeue,: true, RequeueAfter: 60 * time.Seconds}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah - good callout. it's probably a good idea to add a time buffer in so it doesn't spam the loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dig it! This will get even cleaner when we give it the CreateOrUpdate
treatment in a future PR
resolves #31
migration
Job
s will be handled differently in that anytime a newJob
spec is generated, we'll create a new, distinctJob
resource. This append-only operation allows us to not worry about Deletes, asJob
resources (usually) need to be replaced if there are updates to its spec