fix(cilium): cilium-operator must be able to patch K8S nodes resources #15470
Conversation
As per the cilium recommended installation, the cilium-operator is in charge of removing the `node.cilium.io/agent-not-ready` taint that is put on nodes upon startup. As such, the operator needs the PATCH permission on the node resource.
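For context, the permission this PR adds would look roughly like the following RBAC rule. This is a minimal sketch, not the actual kOps-managed manifest: the ClusterRole name and rule layout are assumptions, but the resource and verb names are standard Kubernetes RBAC.

```yaml
# Sketch of the RBAC rule in question: without "patch" on nodes,
# cilium-operator cannot remove the node.cilium.io/agent-not-ready taint.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-operator        # name assumed for illustration
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "patch"]  # "patch" is the addition
```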
Hi @maximumG. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/ok-to-test
/retest-required
/test pull-kops-e2e-aws-upgrade-126-ko126-to-klatest-kolatest-many-addons
@maximumG To fix the failing tests you need to run
/retest
But that taint is not used in our setup. Does it then need this permission?
I must agree that by default the cilium taint is not configured. However, I might argue that users are able to customize the taint per kOps instance group, leaving the possibility for users to add this specific taint. Actually, this is exactly what we tried out 😃: we put this taint to ensure no workload got scheduled before the cilium-agent was ready, and we were surprised that it was not removed after all.
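As an illustration of customizing the taint per kOps instance group, a sketch of an `InstanceGroup` manifest follows. The group name, cluster label, machine type, and sizes are placeholders; the `spec.taints` field uses the standard kOps `key=value:Effect` string form.

```yaml
# Hypothetical kOps InstanceGroup applying the Cilium startup taint,
# so workloads wait until the cilium-agent is ready on new nodes.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-example                      # placeholder name
  labels:
    kops.k8s.io/cluster: example.k8s.local # placeholder cluster
spec:
  role: Node
  machineType: t3.medium                   # placeholder instance type
  minSize: 1
  maxSize: 3
  taints:
    - node.cilium.io/agent-not-ready=true:NoExecute
```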
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@olemarkus Do you need additional info to validate this PR on your side?
I also have the same issue. And I can understand what @olemarkus says.
What do you think?
@johngmyers Any thoughts?
Why is this needed? The referenced documentation describes a situation where there is a second CNI that Cilium has to wrest control from, but that shouldn't apply to kOps. If an extra taint were useful in some common configuration, I would prefer that kOps automatically apply it, in addition to giving cilium-operator the permission to remove it.
I believe this is needed long term anyway, because in v1.14 this change has been introduced:
When Cilium requires it we will need to add the permission.
On our production K8S clusters, we had some weird scheduling on newly joining nodes. Basically, business pods were sometimes scheduled before the Cilium daemonset, exhausting the node resources (CPU/memory). As a result, the cilium-agent was not able to get scheduled on the node, triggering another node scale-up. With this taint added and removed by the cilium-operator, we would avoid such a situation.
If your business pods get scheduled before the cilium DS, you must have specific tolerations for that. If you e.g. tolerate any taint, adding more taints won't work.
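To illustrate the point about tolerations, here is a hypothetical pod spec (name and image are placeholders). A pod with a match-everything toleration is scheduled despite the Cilium taint, so adding the taint cannot protect against such workloads.

```yaml
# A pod that tolerates *any* taint will still be scheduled onto a node
# carrying node.cilium.io/agent-not-ready -- more taints won't help here.
apiVersion: v1
kind: Pod
metadata:
  name: tolerates-everything   # placeholder name
spec:
  containers:
    - name: app
      image: nginx             # placeholder image
  tolerations:
    - operator: Exists         # matches every taint, including the Cilium one
```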
Taints are not under `node/status`.
@johngmyers node/status is used to annotate the node. But I don't know if this is needed 100%.
Looking at the latest cilium helm documentation for v1.12 here, I would say that this permission is indeed useless.
I think … And removing … So, maybe …
Based on @h3poteto's comment, I would still state that this permission is actually useless (and sorry for pushing this PR). Basically, if we want to avoid scheduling pods on nodes where cilium is not yet ready, we just have to put the `node.cilium.io/agent-not-ready` taint. As already stated by some people here, the … Does it make sense?
@maximumG: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle rotten
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Reopen this PR with /reopen
- Mark this PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Fixes #15464.
As per the cilium recommended installation, the cilium-operator pod is in charge of removing the `node.cilium.io/agent-not-ready` taint that can be put on nodes upon startup. As such, the operator needs the `PATCH` permission on the node resource.