
Use kubernetes labels to exclude instances from the upgrade() cycle #333

Open
preflightsiren opened this issue Oct 5, 2021 · 6 comments


@preflightsiren
Contributor

Is this a BUG REPORT or FEATURE REQUEST?: Feature

We have a workflow that allows workloads to run on a node for much longer than the node's upgrade window. An external process eventually reaps these workloads and nodes. Currently, when upgrading these instancegroups, instance-manager will detect that there are still nodes that need to be upgraded and rerun the upgrade process.

I would like to be able to label nodes with `instancemgr.keikoproj.io/exclude-upgrades: true` (or similar) so they are skipped when evaluating nodes.
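Something like the fragment below is what I have in mind (the label key is just the proposal here, not something instance-manager currently recognizes, and the node name is made up):

```yaml
# Hypothetical: the label proposed in this issue; instance-manager does not
# currently act on it. A node carrying it would be skipped by the upgrade checks.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal  # hypothetical node name
  labels:
    instancemgr.keikoproj.io/exclude-upgrades: "true"
```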

@preflightsiren
Contributor Author

I even think this could work well using `node.kubernetes.io/unschedulable`, allowing someone to cordon a node and exclude it from the nodesReady check.
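For reference, cordoning sets `spec.unschedulable` and Kubernetes adds the matching taint, so a cordoned node looks roughly like this (node name made up):

```yaml
# Result of `kubectl cordon`: spec.unschedulable is set and the node controller
# adds the node.kubernetes.io/unschedulable taint, which a nodesReady check
# could key off to skip the node.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal  # hypothetical node name
spec:
  unschedulable: true
  taints:
  - key: node.kubernetes.io/unschedulable
    effect: NoSchedule
```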

@eytan-avisror
Collaborator

eytan-avisror commented Oct 7, 2021

@preflightsiren are you referring to usage with the native upgrade strategy? i.e. not using the upgrade-manager/CRD strategy?
Can you share more on the use-case: are you upgrading instancegroups while jobs are running and want those nodes to be skipped so the jobs aren't interrupted? When will those nodes eventually rotate? If you have some manual process that destroys those nodes, using `node.kubernetes.io/unschedulable` might be problematic since it would affect everyone; a custom annotation might be more appropriate.

@preflightsiren
Contributor Author

Thanks @eytan-avisror, we're actually using the custom resource for upgrades (workflows). The flow looks like this:

  1. Patch the InstanceGroup; usually the image id (rough sketch after this list).
  2. The launch template is updated (by instance-manager)
  3. Workflow is created
  4. Workflow taints/cordons nodes
  5. Workflow enters sleep/wait until all nodes are reaped
  6. An external process restarts the workloads running on the old nodes
  7. Now that the old nodes are empty, cluster autoscaler scales them down
  8. Fin.
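For step 1, the patch is roughly the following (names and AMI id are made up; the field path assumes the EKS provisioner's configuration block):

```yaml
# Rough sketch of step 1: bumping the image id on the InstanceGroup.
# Names and AMI id are placeholders; field path assumes the EKS provisioner.
apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group          # hypothetical name
  namespace: instance-manager
spec:
  eks:
    configuration:
      image: ami-0123456789abcdef0 # hypothetical new AMI id
```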

If the workflow is allowed to exit at step 5, instance-manager will check that all nodes are running the latest version and try to resolve it. This issue is about finding a mechanism that lets instance-manager mark an upgrade as complete.

Custom labels were my first thought; this issue tries to reuse the existing patterns.

@backjo
Collaborator

backjo commented Oct 10, 2021 via email

@preflightsiren
Contributor Author

@backjo not sure I'm following your point. The pods are backed by deployments and statefulsets, but they can only be restarted at particular times (customer maintenance windows).

I'm not sure how that fits with the check that all nodes are running the latest launch template; could you expand on your thoughts?

@backjo
Collaborator

backjo commented Nov 4, 2021

Ah sorry @preflightsiren - didn't see this reply.

Basically, where I was going is that Pods that can't handle disruption can define PodDisruptionBudgets, which are respected by the upgrade controller. So if the disruption budget prevents a given pod from being evicted, the node will not end up getting terminated until the disruption budget allows the eviction or the pod naturally exits.
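For example, something like the following (names are illustrative) would keep the upgrade controller from draining a node while evicting the pod would drop the workload below `minAvailable`:

```yaml
# Illustrative only: a PDB protecting a hypothetical customer workload.
# While evicting its pod would violate minAvailable, the node hosting it
# is not terminated by the upgrade controller.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: customer-workload-pdb     # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: customer-workload      # hypothetical label
```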
