
Use kubernetes labels to exclude instances from the upgrade() cycle #333

Open
preflightsiren opened this issue Oct 5, 2021 · 6 comments


@preflightsiren
Contributor

Is this a BUG REPORT or FEATURE REQUEST?: Feature

We have a workflow that allows workloads to run on a node for much longer than the node's upgrade window. An external process eventually reaps these workloads and nodes. Currently, when upgrading these instancegroups, instance-manager will detect that there are still nodes that need to be upgraded and rerun the upgrade process.

I would like to be able to label nodes with `instancemgr.keikoproj.io/exclude-upgrades: true` (or similar) so they are skipped when evaluating nodes.
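Something like the fragment below is what I have in mind (the label key is just the proposal here, not something instance-manager currently recognizes, and the node name is made up):

```yaml
# Hypothetical: the label proposed in this issue; instance-manager does not
# currently act on it. A node carrying it would be skipped by the upgrade checks.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal  # hypothetical node name
  labels:
    instancemgr.keikoproj.io/exclude-upgrades: "true"
```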

@preflightsiren
Contributor Author

I even think this could work well using `node.kubernetes.io/unschedulable`, allowing someone to cordon a node and exclude it from the nodesReady check.
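For reference, cordoning sets `spec.unschedulable` and Kubernetes adds the matching taint, so a cordoned node looks roughly like this (node name made up):

```yaml
# Result of `kubectl cordon`: spec.unschedulable is set and the node controller
# adds the node.kubernetes.io/unschedulable taint, which a nodesReady check
# could key off to skip the node.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal  # hypothetical node name
spec:
  unschedulable: true
  taints:
  - key: node.kubernetes.io/unschedulable
    effect: NoSchedule
```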

@eytan-avisror
Collaborator

eytan-avisror commented Oct 7, 2021

@preflightsiren are you referring to usage with the native upgrade strategy? i.e. not using the upgrade-manager/CRD strategy?
Can you share more on the use-case: are you upgrading instancegroups while jobs are running and want those nodes to be skipped so the jobs aren't interrupted? When will those nodes eventually rotate? If you have some manual process that destroys those nodes, using `node.kubernetes.io/unschedulable` might be problematic since it would affect everyone; a custom annotation might be more appropriate.

@preflightsiren
Contributor Author

Thanks @eytan-avisror, we're actually using the custom resource for upgrades (workflows). The flow looks like this:

  1. Patch the InstanceGroup; usually the image id (rough sketch after this list).
  2. The launch template is updated (by instance-manager)
  3. Workflow is created
  4. Workflow taints/cordons nodes
  5. Workflow enters sleep/wait until all nodes are reaped
  6. An external process restarts the workloads running on the old nodes
  7. Now that the old nodes are empty, cluster autoscaler scales them down
  8. Fin.
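For step 1, the patch is roughly the following (names and AMI id are made up; the field path assumes the EKS provisioner's configuration block):

```yaml
# Rough sketch of step 1: bumping the image id on the InstanceGroup.
# Names and AMI id are placeholders; field path assumes the EKS provisioner.
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group          # hypothetical name
  namespace: instance-manager
spec:
  eks:
    configuration:
      image: ami-0123456789abcdef0 # hypothetical new AMI id
```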

If the workflow is allowed to exit at step 5, instance-manager will check that all nodes are running the latest version and try to resolve it. This issue is about finding a mechanism that lets instance-manager mark an upgrade as complete.

Custom labels were my first thought; this issue tries to reuse the existing patterns.

@backjo
Collaborator

backjo commented Oct 10, 2021 via email

@preflightsiren
Contributor Author

@backjo not sure I'm following your point. The pods are backed by deployments and statefulsets, but they can only be restarted at particular times (customer maintenance windows).

I'm not sure how that fits with the check that all nodes are running the latest launch template; could you expand on your thoughts?

@backjo
Collaborator

backjo commented Nov 4, 2021

Ah sorry @preflightsiren - didn't see this reply.

Basically, where I was going is that Pods that can't handle disruption can define PodDisruptionBudgets, which are respected by the upgrade controller. So if the disruption budget prevents a given pod from being evicted, the node will not end up getting terminated until the disruption budget allows the eviction or the pod naturally exits.
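For example, something like the following (names are illustrative) would keep the upgrade controller from draining a node while evicting the pod would drop the workload below `minAvailable`:

```yaml
# Illustrative only: a PDB protecting a hypothetical customer workload.
# While evicting its pod would violate minAvailable, the node hosting it
# is not terminated by the upgrade controller.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: customer-workload-pdb     # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: customer-workload      # hypothetical label
```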
