Allow partial consolidation of nodes with blocking PDBs #1176
Comments
I think this ties into some of the discussion that's been taking place in #1047. I definitely see the use case here, and I think it falls under a broader story we've been discussing: ensuring cluster operators have mechanisms to disrupt nodes regardless of pod-level configuration (PDBs, the …).

Making sure I understand your use case: you want drifted nodes to be immediately eligible to begin draining, the drain should occur without violating PDBs, and it is acceptable for nodes to remain on the cluster partially drained for an indefinite period of time to ensure this. Does that sound about right?

I want to draw particular attention to that last point: it being acceptable for nodes to remain partially drained indefinitely. This is something we've been hesitant to enable, hence the current behavior where we wait until we know the node can be disrupted successfully. One option I posed over in #1047 is enabling this immediate-drain behavior if and only if the corresponding …
Thanks for the response, Jason.
Yeah, this is correct.
It is important for us that PDBs are never violated, and we are happy with nodes left in a partially drained state indefinitely. We have alerting that will highlight this to us so that we can investigate why a PDB is blocking the full drain. We could theoretically set the …
I'm actually curious whether #623 would help you here as well, and whether you even need a disruptionToleration. In reality, the disruption toleration duration is there to force removal of nodes by a given time, but it sounds like you still want some form of graceful termination, so what you are really looking for is for us to stop scheduling pods to drifted nodes (or at least weight away from scheduling onto them). FWIW, if we go with the …
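To make the scheduling-barrier idea concrete, here is a minimal sketch of how new pods could be kept off a drifted node while its existing pods drain gracefully; the taint key and node name are invented for this illustration and are not part of Karpenter's API.

```yaml
# Hypothetical illustration: a NoSchedule taint keeps new pods off a drifted
# node while its existing pods continue to drain gracefully. The taint key
# "example.com/drifted" is invented for this sketch.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-42-1.example.internal
spec:
  taints:
    - key: example.com/drifted
      value: "true"
      effect: NoSchedule
```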
Which version are you running? With the changes introduced in v0.34.0, this should no longer be the case: we should be able to run at much higher parallelism, and you should see us react much more quickly to drift events. Have you tried the disruption budget changes? Do you think they might help mitigate your existing problem?
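For reference, a rough sketch of the disruption budgets mentioned here, assuming the v1beta1 NodePool API introduced around v0.34.0; unrelated NodePool fields are omitted.

```yaml
# Abbreviated NodePool showing only the disruption settings discussed above
# (v1beta1 API assumed; template and nodeClassRef fields omitted for brevity).
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
      # Allow Karpenter to disrupt up to 20% of this NodePool's nodes at once,
      # rather than serializing drift replacement one node at a time.
      - nodes: "20%"
```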
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned", in response to the /close not-planned command above.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Description
What problem are you trying to solve?
We run clusters with relatively large nodes (~100 pods per node), significant autoscaling of deployments, and some slow pod start-up and termination times.
After we update a NodePool, we monitor the status of the nodes and wait for all drifted nodes to be replaced as part of our continuous delivery (CD) pipeline. The speed at which Karpenter removes drifted nodes from our clusters therefore matters, because it can block our CD pipelines for long periods of time.
Currently, Karpenter will only consolidate a drifted node if all of its pods are evictable, i.e. no PDBs are blocking (source).
Karpenter also runs one disruption command at a time, which can lead to the gap between drift reconciliations being many minutes.
This often results in a situation where Karpenter checks so infrequently, and the odds of a single PDB blocking consolidation are so high, that it takes many hours for large nodes to become valid consolidation candidates. It would be preferable for us if we could drain all the pods that are currently evictable and then wait for PDBs to allow the eviction of the remaining pods.
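To make the failure mode concrete, here is a minimal, hypothetical PDB of the kind that blocks consolidation; the names are invented for illustration.

```yaml
# Hypothetical example: if evicting one more pod of this Deployment would
# exceed maxUnavailable, the node hosting that pod cannot be fully drained,
# and with current behavior Karpenter will not begin consolidating it at all.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: slow-service-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: slow-service
```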
Configuring this will require changes to the API. The sensible place would be to add it to the `disruption` section of the NodePool spec, keeping it in line with the Disruption Controls design.
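Purely as a sketch of what such a knob could look like (the field name allowPartialDrain is invented here, not an agreed-upon or existing API):

```yaml
# Hypothetical API sketch only: "allowPartialDrain" is an invented field name
# and is not part of Karpenter's NodePool API.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    # Invented field: when true, a drifted node may begin draining its
    # evictable pods immediately and remain partially drained until PDBs
    # allow the remaining pods to be evicted.
    allowPartialDrain: true
```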
How important is this feature to you?
We currently have a hacky workaround where we `kubectl drain` drifted nodes, since `kubectl drain` evicts pods as they become evictable. We would prefer a proper implementation in the controller.
I am interested in working on this if there is agreement that the feature is useful and we can agree on the API.