
Allow partial consolidation of nodes with blocking PDBs #1176

Closed
hintofbasil opened this issue Apr 8, 2024 · 7 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@hintofbasil

hintofbasil commented Apr 8, 2024

Description

What problem are you trying to solve?

We run clusters with relatively large nodes (~100 pods), significant autoscaling of deployments, and some slow pod start up/termination times.
After we update a NodePool, we monitor the status of the nodes and wait for all drifted nodes to be replaced as part of our continuous delivery (CD) pipeline. This makes the speed at which Karpenter removes drifted nodes from our clusters significant, as it can block our CD pipelines for long periods of time.

Currently, Karpenter will only consolidate a drifted node if all of its pods are evictable (no PDBs are blocking) (source).
Karpenter also runs one disruption command at a time, which can mean many minutes pass between drift reconciliations.
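
To make that current behaviour concrete, here is a minimal sketch of the all-or-nothing check described above, using illustrative names and standard Kubernetes client types rather than Karpenter's actual code:

package sketch

import (
	corev1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// allPodsEvictable reports whether every pod on a node could be evicted right
// now without violating a PodDisruptionBudget. Under the current behaviour, a
// drifted node is only a consolidation candidate when this returns true.
// Names are illustrative, not Karpenter's.
func allPodsEvictable(pods []corev1.Pod, pdbs []policyv1.PodDisruptionBudget) bool {
	for _, pod := range pods {
		for _, pdb := range pdbs {
			if pdb.Namespace != pod.Namespace {
				continue
			}
			selector, err := metav1.LabelSelectorAsSelector(pdb.Spec.Selector)
			if err != nil || !selector.Matches(labels.Set(pod.Labels)) {
				continue
			}
			// A matching PDB with no disruptions allowed blocks this pod,
			// which blocks consolidation of the entire node.
			if pdb.Status.DisruptionsAllowed <= 0 {
				return false
			}
		}
	}
	return true
}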

This often results in a situation where Karpenter checks so infrequently, and the odds of a single PDB blocking consolidation are so high, that it takes many hours for large nodes to become valid consolidation candidates. It would be preferable for us to drain all the pods that are currently evictable and then wait for the PDBs to allow eviction of the remaining pods.

This will require an API change to configure. The sensible place would be the disruption section of the NodePool spec, keeping it in line with the Disruption Controls design.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    evictionPolicy: FullNodeOnly || BestEffort  # FullNodeOnly requires all pods to be evictable
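
On the API side, the addition could be a small enum-valued field on the disruption spec. The type and value names below are purely a hypothetical sketch of the YAML above, not Karpenter's actual types:

package sketch

// EvictionPolicy is a hypothetical enum matching the proposed YAML field.
type EvictionPolicy string

const (
	// EvictionPolicyFullNodeOnly keeps today's behaviour: a node is only
	// disrupted once every pod on it is evictable.
	EvictionPolicyFullNodeOnly EvictionPolicy = "FullNodeOnly"
	// EvictionPolicyBestEffort drains the pods that are currently evictable
	// and waits for PDBs to permit eviction of the rest.
	EvictionPolicyBestEffort EvictionPolicy = "BestEffort"
)

// Disruption sketches where the field could sit, alongside the existing
// disruption controls in the NodePool spec.
type Disruption struct {
	// +kubebuilder:validation:Enum:={FullNodeOnly,BestEffort}
	// +optional
	EvictionPolicy EvictionPolicy `json:"evictionPolicy,omitempty"`
}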

How important is this feature to you?

We currently have a hacky workaround where we kubectl drain drifted nodes, since kubectl drain evicts pods as they become evictable. We would prefer a proper implementation in the controller.

I am interested in working on this if there is agreement that the feature is useful and we can agree on the API.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@hintofbasil hintofbasil added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 8, 2024
@jmdeal
Member

jmdeal commented Apr 9, 2024

I think this ties into some of the discussion that's been taking place in #1047. I definitely see the use case here and think this falls under a broader story we've been discussing: ensuring cluster operators have mechanisms to disrupt nodes regardless of pod level configuration (PDBs, the karpenter.sh/do-not-disrupt annotation, etc).

Making sure I understand your use case, you want drifted nodes to be immediately eligible to begin draining. The drain should occur without violating PDBs. It is acceptable for nodes to remain on the cluster partially drained for an indefinite period of time to ensure this. Does this sound about right?

I want to draw particular attention to that last point: that it is acceptable for nodes to remain partially drained indefinitely. This is something we've been hesitant to enable, hence the current behavior where we wait until we know the node can be disrupted successfully. One option I posed over in #1047 is enabling this immediate drain behavior if and only if the corresponding NodeClaim has a terminationGracePeriod set (#834). This would ensure we don't get into a state where a node drains indefinitely, since the terminationGracePeriod sets an upper bound on node lifetime once the drain begins. Does something like this address your use case, or is it important for PDBs to never be violated, even if this could result in partially drained nodes remaining on the cluster indefinitely?
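
For illustration, the upper bound described here could be as simple as the check below; the function and parameter names are assumptions for the sketch, not Karpenter's implementation:

package sketch

import "time"

// pastDrainDeadline reports whether a draining node has exceeded its
// terminationGracePeriod. Until the deadline passes, PDBs are respected and
// blocked pods simply delay the drain; once it passes, the node is terminated
// regardless, so a partially drained node cannot linger forever.
func pastDrainDeadline(drainStarted time.Time, terminationGracePeriod time.Duration, now time.Time) bool {
	return now.Sub(drainStarted) >= terminationGracePeriod
}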

@hintofbasil
Author

Thanks for the response, Jason.

Making sure I understand your use case, you want drifted nodes to be immediately eligible to begin draining. The drain should occur without violating PDBs. It is acceptable for nodes to remain on the cluster partially drained for an indefinite period of time to ensure this. Does this sound about right?

Yeah, this is correct.

Does something like this address your use case or is it important for PDBs to never be violated, even if this could result in partially drained nodes remaining on the cluster indefinitely?

It is important for us that PDBs are never violated. We are happy with nodes left in a partially drained state indefinitely. We have alerting which will highlight this to us so that we can investigate why the PDB is blocking the full drain.

We could theoretically set the terminationGracePeriod to a very large duration to give us plenty of time to investigate manually, but we would prefer an option where forced termination is never a possibility.

@jonathan-innis jonathan-innis removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 22, 2024
@jonathan-innis
Member

It is important for us that PDBs are never violated. We are happy with nodes left in a partially drained state indefinitely. We have alerting which will highlight this to us so that we can investigate why the PDB is blocking the full drain

I'm actually curious if #623 would help you as well here, and whether you even need a disruptionToleration. In reality, the disruption toleration duration is there to force removal of nodes by a given time, but it sounds like you do still want some form of graceful termination, so you are really looking for us to stop scheduling pods (or at least weight away from scheduling onto the nodes that are drifted). FWIW, if we go with the PreferNoSchedule taint here (which is what we are leaning toward), I'm not sure how much this will help if your entire NodePool gets drifted at once, since all nodes will be treated the same.

Karpenter also runs one disruption command at a time which can lead to the duration between drift reconciliations being many minutes

Which version are you running? With the changes introduced in v0.34.0, this should no longer be the case. We should be able to run at a much higher parallelism and you should see us react much quicker to drift events. Have you tried the disruption budget changes? Do you think that these changes might help mitigate your existing problem?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Sep 19, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
