KEP-127: graduate to Beta for 1.30 #4439
Conversation
giuseppe commented Jan 25, 2024
- Issue link: Support User Namespaces in pods #127
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]> Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
So it is rendered properly on github. Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Force-pushed from 91377aa to d7e437c (compare)
FTR: PRs with no assignee often fall through the cracks. :(
> - Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
>   the field, but nice to have)
@fidencio @littlejawa have you or anyone from kata community taken a look?
> - Detection: How can it be detected via metrics? Stated another way:
>   how can an operator troubleshoot without logging into a master or worker node?
>
> In this case the Kubelet will fail to start with a clear error message.
Assuming this is through the subuids binary, don't we fall back to the default range?
I think we should fall back to the default ranges only if there is no `kubelet` user configured, or there is no `getsubids` binary present.
If the `kubelet` user is defined and we fail to read its configuration, an error should be reported.
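The fallback policy described here can be sketched as a small decision function. This is an illustrative sketch, not the actual kubelet code; the `nodeConfig` struct and `pickIDRanges` name are invented for this example:

```go
package main

import (
	"errors"
	"fmt"
)

// hypothetical inputs describing the node's configuration
type nodeConfig struct {
	kubeletUserExists bool // is a "kubelet" user defined?
	getsubidsPresent  bool // is the getsubids binary installed?
	rangesReadable    bool // could the user's subordinate ID ranges be read?
}

// pickIDRanges returns "custom" or "default", or an error when the kubelet
// user is configured but its subordinate ID ranges cannot be read.
func pickIDRanges(c nodeConfig) (string, error) {
	if !c.kubeletUserExists || !c.getsubidsPresent {
		return "default", nil // fall back silently, per the discussion above
	}
	if !c.rangesReadable {
		// misconfiguration must surface as a hard error, not a silent fallback
		return "", errors.New("kubelet user defined but reading its subordinate ID ranges failed")
	}
	return "custom", nil
}

func main() {
	r, _ := pickIDRanges(nodeConfig{}) // no kubelet user configured
	fmt.Println(r)                     // default
	_, err := pickIDRanges(nodeConfig{kubeletUserExists: true, getsubidsPresent: true})
	fmt.Println(err != nil) // true: unreadable config is reported, not ignored
}
```

The key property is that only the genuinely "not configured" cases fall back; a configured-but-broken setup fails loudly.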
> If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
> 1.25). If it is scheduled to a node that the kubelet has the feature flag activated and the node
> meets the requirements to use user namespaces, then the pod will be created with the namespace. If
> it is scheduled to a node that has the feature disabled, it will be scheduled without the user
scheduled or created?
created
> - Should we allow any way for users to for "more" IDs mapped? If yes, how many more and how?
> - Should we allow the user to ask for specific mappings?
> - Get review from VM container runtimes maintainers
> - Gather and address feedback from the community
Should we resolve the remaining unresolved parts in this KEP (support for Windows and runtimes that don't support Linux namespaces)?
Windows support was resolved several months ago. Windows maintainers commented that they don't want this feature on Windows, and it looked fine anyway from their point of view.
We will try to gather feedback from VM runtimes, yes
I'll try to update with the links later
"don't want this feature in windows"
Love it.
That SGTM - just please update the KEP to reflect it :)
> @@ -678,7 +686,7 @@ well as the [existing list] of feature gates.
> -->
>
> - [x] Feature gate (also fill in values in `kep.yaml`)
> -  Feature gate name: UserNamespacesStatelessPodsSupport
> +  Feature gate name: UserNamespacesPodsSupport
Can't comment on unchanged lines, so commenting here:
- Please fill in the Upgrade/Downgrade strategy.
- For version skew, can you please also refer to the case where kube-apiserver understands the new field (spec.hostUsers) and kubelet doesn't [and vice versa].
- "Old runtime and new kubelet": the fact that we create the pod in the host namespace is somewhat counter-intuitive. As an operator I would expect that if I turned on the FG it works and is not silently dropped. Can we at least emit some error / event / metric / sth in that case?
- "Are there any tests for feature enablement/disablement?" Were those added? If not, can you please prioritize and add sth like that explicitly as beta criteria?
The gate seems to actually be named "UserNamespacesSupport"
@wojtek-t for 1 and 2: https://github.com/kubernetes/enhancements/pull/4439/files#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaL613-L633 What are you looking for from this section that isn't covered already?
- Upgrade/Downgrade is not filled in at all.
- For version skew I explicitly wrote: please describe what happens if the feature is enabled in kube-apiserver and not in kubelet, and vice versa.
- Oh, that is one of the few sections that doesn't have any comments; it is such a small text that I didn't see it! We will add it now, sorry!
> @@ -753,6 +761,18 @@ This section must be completed when targeting beta to a release.
>
> ###### How can a rollout or rollback fail? Can it impact already running workloads?
>
> The rollout is just a feature flag on the kubelet and the kube-apiserver.
>
> If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
I don't understand this sentence - it doesn't matter how many apiservers were upgraded, rather if the one I'm talking to now was, right?
Yes, sorry. This should be something like:
"If one API server is upgraded while others aren't, and you are talking to a not-upgraded API server..."
> namespace.
>
> On a rollback, pods created while the feature was active (created with user namespaces) will have to
> be restarted to be re-created without user namespaces. Just a re-creation of the pod will do the
This is slightly misleading to me - what does "restarted" mean here?
If we recommend recreation, let's make that explicit and remove the reference to "restarting"
Good point!
> On Kubernetes side, the kubelet should start correctly.
>
> On the node runtime side, a pod created with pod.spec.HostUsers=false should be on RUNNING state if
> all node requirements are met.
I'm not a fan of this answer - if I'm managing 1000s of clusters, it's not something I can reasonably monitor. Can we rely on easier to monitor conditions - ideally some metrics?
We can add metrics to signal on the errors listed here in the KEP (in the section "What are other known failure modes?").
This can potentially cover a lot at the kubelet layer. But things like one of the filesystems used by the pod not supporting idmap mounts on the running kernel will not be possible to detect from the kubelet. The runtime will return a clear error in that case, which is shown to the user, but I don't think we can have a metric for that.
Does doing that sound like a good plan to you?
I'm not convinced about this, though. All errors are returned to the user, so it's very visible if it fails. And having a proliferation of metrics can also harm, especially long-term. Do you think this is definitely a "yes, these metrics are useful" or do you have mixed feelings about it too?
Also, do you think this is needed for beta graduation, or can it be added while the feature is in beta and still disabled by default?
Sorry, let me clarify my concern, as I think I wasn't clear enough.
For anything that returns a clear error to the user, we're good; I don't think we need anything else.
What I'm worried about is silently not fulfilling user intent.
If (as a user) I'm requesting to run in a user namespace (by setting the new podspec field) and my pod goes to RUNNING state, I expect it to be running in a user namespace. If it's running in the host namespace, that's a big problem, and at the very least I would like to see it exposed in a way that is easy for the operator to digest (a metric).
Okay, so forget those metrics, then :-)
If we do what we suggested here, namely that a pod with userns will only be created if the CRI runtime knows about userns (an error is returned otherwise), will that do it?
> @@ -803,6 +839,11 @@ logs or events for this purpose.
>
> ###### How can someone using this feature know that it is working for their instance?
>
> Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
> that meets all the requirements.
How, as an operator, can I check whether the node meets all the requirements?
I can add the link to the k8s doc if that helps: https://kubernetes.io/docs/concepts/workloads/pods/user-namespaces/#before-you-begin
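For reference, a minimal pod manifest that requests a user namespace looks like this (the pod name and image are illustrative only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # opt out of the host user namespace
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```

If a pod like this reaches the Running state on a node that meets the documented requirements, the feature is working on that node.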
> Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
> that meets all the requirements.
>
> There are step-by-step examples in the Kubernetes documentation too.
Link?
I can add a link to this: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/
I didn't do it because links here can easily become stale over time.
Added
> levels that could help debug the issue?
> Not required until feature graduated to beta.
>
> The error is returned on pod-creation, no need to search for logs.
Rather on an attempt to start the pod?
IIUC, Kubelet will get an error from the runtime and will propagate it to the pod, right?
Yes, exactly
Can you clarify in the text?
Oh, sorry. Added it now! Here and for the rest of the errors: this is visible in the pod events.
> - Testing: Are there any tests for failure mode? If not, describe why.
>
> It is part of the system configuration.
This is a great answer playbook entry - the best I've seen so far - thanks!
> @@ -803,6 +839,11 @@ logs or events for this purpose.
>
> ###### How can someone using this feature know that it is working for their instance?
>
> Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
Suggested change:
- Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
+ Check if any pod has the `.spec.hostUsers` field set to false and is on RUNNING state on a node
> - Condition name:
> - Other field:
> - [x] Other (treat as last resort)
> - Details: check pods with pod.spec.HostUsers field set to false, and see if they are in RUNNING
Suggested change:
- - Details: check pods with pod.spec.HostUsers field set to false, and see if they are in RUNNING
+ - Details: check pods with `.spec.hostUsers` field set to false, and see if they are in RUNNING
> - Detection: How can it be detected via metrics? Stated another way:
>   how can an operator troubleshoot without logging into a master or worker node?
>
> In this case the Kubelet will fail to start with a clear error message.
(nit) Suggested change:
- In this case the Kubelet will fail to start with a clear error message.
+ In this case the kubelet will fail to start with a clear error message.
> pod had to be unique for every pod in the cluster, easily reaching a limit when the cluster is "big
> enough" and the UID space runs out. However, with idmap mounts the IDs assigned to a pod just needs
> to be unique within the node (and with 64k ranges we have 64k pods possible in the node, so not
> really an issue). IOW, by using idmap mounts, we changed the IDs limit to be node-scoped instead of
(nit) Suggested change:
- really an issue). IOW, by using idmap mounts, we changed the IDs limit to be node-scoped instead of
+ really an issue). In other words: by using idmap mounts, we changed the IDs limit to be node-scoped instead of
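The 64k figure quoted above follows from simple arithmetic: the kernel UID space is 32 bits, and each pod is mapped a 16-bit (64k) range, so a node can hand out 2^32 / 2^16 = 65536 disjoint ranges. A one-liner to sanity-check it:

```go
package main

import "fmt"

// nodePodCapacity returns how many disjoint 64k UID ranges fit in the
// 32-bit kernel UID space, i.e. the node-scoped pod limit discussed above.
func nodePodCapacity() int {
	const uidSpace = 1 << 32 // total UIDs available (uint32 space)
	const perPod = 1 << 16   // IDs mapped per pod (0-65535)
	return uidSpace / perPod
}

func main() {
	fmt.Println(nodePodCapacity()) // 65536
}
```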
> There are no known use cases for longer mappings that we know of. The 16bit range (0-65535) is what
> is assumed by all POSIX tools that we are aware of. If the need arises, longer mapping can be
> considered in a future KEP.
no known use cases?
Two examples come to mind:
- running a container tool inside a Pod, where that container tool wants to use a UID range
- running an application inside a Pod where the application uses UIDs above 65535 by default; I've worked with some of these, usually avoiding the 0-65535 range to avoid the risk of conflict with a local system user.
- The first example doesn't necessarily need bigger mappings.
- We didn't hear of any such application that wasn't possible to configure. The point stands: that is out of scope for this KEP; it can be worked on in a future KEP.
>
> There are no known use cases for longer mappings that we know of. The 16bit range (0-65535) is what
> is assumed by all POSIX tools that we are aware of. If the need arises, longer mapping can be
> considered in a future KEP.
>
> ### Allow runtimes to pick the mapping?
(nit) Suggested change:
- ### Allow runtimes to pick the mapping?
+ ### Allow runtimes to pick the mapping
Alternatives aren't questions; they're things we have ruled out and can provide a reason for that exclusion.
> @@ -1149,15 +1358,23 @@ KEPs can explore this path if they so want to.
>
> ### 64k mappings?
We should frame this as an alternative that we have ruled out.
@wojtek-t thanks for the detailed review! I'm on my train to FOSDEM now, so I'm not sure I'll be able to answer all now.
Signed-off-by: Rodrigo Campos <[email protected]> Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
@wojtek-t PTAL, updated the KEP
> will be accepted (if the apiserver is >= 1.25). If it is scheduled to a node that the kubelet has
> the feature flag activated and the node meets the requirements to use user namespaces, then the
> pod will be created with the namespace. If it is scheduled to a node that has the feature disabled,
> it will be created without the user namespace.
I hope it's no longer true after our discussion - the pod should fail to start in the second case, right?
@wojtek-t No, this is still true. If the feature is disabled in the kubelet, the kubelet completely ignores that field. The pod will only fail if the kubelet has the feature activated and the container runtime doesn't support it.
I don't think the kubelet can act on a field that should only act if the feature is enabled. We use this a lot on the error cases mentioned in detail in some other section, where you can just disable the feature gate to stop any bleeding that there might be related to this.
> @@ -753,6 +815,17 @@ This section must be completed when targeting beta to a release.
>
> ###### How can a rollout or rollback fail? Can it impact already running workloads?
>
> The rollout is just a feature flag on the kubelet and the kube-apiserver.
>
> If one APIserver is upgraded while other's aren't and you are talking to a not upgraded the pod
I would remove this first sentence - I don't think it matters really and it doesn't bring value.
Okay. Sorry, I thought it would help as context, as there are a lot of KEPs that added a field already and we all rely on existing plumbing to handle this correctly. But removed :)
> it will be created without the user namespace.
>
> On a rollback, pods created while the feature was active (created with user namespaces) will have to
> be re-created without user namespaces.
... If those weren't recreated, they will continue to run in a user namespace.
Added, thanks!
>
> On the node runtime side, a pod created with pod.spec.hostUsers=false should be on RUNNING state if
> all node requirements are met. If the CRI runtime or the handler do not support the feature, the kubelet
> returns an error.
This puts the pod into Failed state, right?
So the number of Failed pods is effectively what we can look at, right?
[Not a native metric, but something measurable]
I've added the following to clarify.

When a pod hits this error returned by the kubelet, the status in `kubectl` is shown as `ContainerCreating` and the pod events show:

    Warning  FailedCreatePodSandBox  12s (x23 over 5m6s)  kubelet  Failed to create pod sandbox: user namespaces is not supported by the runtime

The following kubelet metrics are useful to check:
- `kubelet_running_pods`: shows the actual number of pods running
- `kubelet_desired_pods`: the number of pods the kubelet is trying to run

If these metrics are very different, it means there are desired pods that can't be set to running.
If that is the case, checking the pod events to see if they are failing for user namespaces reasons
(like the errors shown in this KEP) is advised, in which case it is recommended to roll back or
disable the feature gate.
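The check suggested above (compare `kubelet_desired_pods` against `kubelet_running_pods`) amounts to a trivial comparison. The function name and threshold below are assumptions for illustration; in practice the values would come from scraping the kubelet's /metrics endpoint:

```go
package main

import "fmt"

// podGapSuspicious reports whether the kubelet wants to run noticeably more
// pods than are actually running. Per the discussion above, that is the cue
// to inspect pod events for user-namespace-related FailedCreatePodSandBox
// errors and consider rolling back or disabling the feature gate.
func podGapSuspicious(desired, running, threshold int) bool {
	return desired-running >= threshold
}

func main() {
	// example values as scraped from kubelet_desired_pods / kubelet_running_pods
	fmt.Println(podGapSuspicious(30, 12, 5)) // true: many pods stuck
	fmt.Println(podGapSuspicious(30, 29, 5)) // false: healthy gap
}
```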
> @@ -803,6 +899,10 @@ logs or events for this purpose.
>
> ###### How can someone using this feature know that it is working for their instance?
>
> If the runtime doesn't support user namespaces an error is returned by the kubelet.
So the pod will effectively transition to Failed state, right?
Yes, there is a FailedCreatePodSandBox event. `kubectl` shows the pod as `ContainerCreating` and the pod events show:

    Warning  FailedCreatePodSandBox  12s (x23 over 5m6s)  kubelet  Failed to create pod sandbox: user namespaces is not supported by the runtime
> levels that could help debug the issue?
> Not required until feature graduated to beta.
>
> The error is returned on pod-creation, no need to search for logs.
Can you clarify in the text?
> @@ -579,14 +628,11 @@ use container runtime versions that have the needed changes.
>
> ##### Beta
I think critest should be updated during Beta:
Added, thanks!
@wojtek-t Fixed the latest review comments. PTAL! I've added it as a separate commit, so you can easily see the diff.
Co-authored-by: Giuseppe Scrivano <[email protected]> Signed-off-by: Giuseppe Scrivano <[email protected]> Signed-off-by: Rodrigo Campos <[email protected]>
> pod.spec.hostUsers and has the feature gate enabled.
>
> If the kube-apiserver doesn't support the feature at all (< 1.25), a pod with userns will be
> rejected.
This is not true - kube-apiserver is not rejecting the pod, but rather is silently dropping this field.
Also - this isn't only about the release <1.25, but rather also when the feature gate is disabled.
No, it is rejecting the pod. This is what happens when a field is unknown, for example:

    $ kubectl apply -f mypod2.yaml
    Error from server (BadRequest): error when creating "mypod2.yaml": Pod in version "v1" cannot be handled as a Pod: strict decoding error: unknown field "spec.nonExistingField"

I added `pod.spec.nonExistingField: true` to the pod.
Am I missing something? Or are we talking about different things?
Regarding the feature gate disabled, that is mentioned a few lines above: 3766329#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaR684-R686
This is strange - the whole point of having the dropDisabledFields pattern is to handle and drop the field if the FG is disabled.
@liggitt - can you please shed some light on it?
For context, this is the apiserver as started with the local-up-cluster.sh
- kubectl opts into strict mode, where unknown fields are flagged / rejected. `kubectl apply ... --validate=false` shows what normal API clients (like controllers) would experience, where unknown fields are just dropped.
- Emulating the behavior where unknown fields are dropped on create when a feature is disabled is the standard approach taken.
- Rarely, for a security-related feature, we will reject attempts to use the new security-related field if the feature is disabled rather than dropping the field. The only one I can remember off the top of my head was the windows `hostProcess` field.
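The drop-on-create convention described here can be sketched roughly as follows. This is a simplified illustration of the pattern, not the actual kube-apiserver code; the types and the gate variable are stand-ins:

```go
package main

import "fmt"

// Simplified stand-ins for the real API types and feature-gate machinery.
type PodSpec struct {
	HostUsers *bool // nil means "unset"
}

var userNamespacesEnabled = false // stand-in for the feature gate check

// dropDisabledFields clears fields guarded by disabled feature gates on
// create, unless the old object already used them (so updates don't wipe
// data written while the gate was on).
func dropDisabledFields(spec, oldSpec *PodSpec) {
	if userNamespacesEnabled {
		return // gate on: keep the field
	}
	if oldSpec != nil && oldSpec.HostUsers != nil {
		return // field already in use on the existing object: keep it
	}
	spec.HostUsers = nil // gate off on create: silently drop the field
}

func main() {
	f := false
	s := &PodSpec{HostUsers: &f}
	dropDisabledFields(s, nil) // create path with the gate disabled
	fmt.Println(s.HostUsers == nil) // true: field silently dropped
}
```

This matches the behavior non-strict clients observe: the request succeeds, but the disabled field never lands in the stored object.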
@liggitt does that mean that this is then handled as expected by the apiserver? We aren't doing anything special for this field (pod.spec.hostUsers) in the apiserver.
do we even need this sentence? 1.25 has been out of support for a while, and we'd also not support a 1.30->1.25 skew between kubelet and KAS
> We aren't doing anything special in the apiserver for this field (pod.spec.hostUsers).
Once you add the field, you need to start doing things with it in the apiserver when the feature is disabled (either drop the field data or reject the request)
Updated to specify the kube-apiserver will drop the field if it's unknown
> If the kube-apiserver doesn't support the feature at all (< 1.25), a pod with userns will be
> rejected.
>
> If the kubelet doesn't support the feature (< 1.25), it will ignore the pod.spec.hostUsers field.
If the kubelet is on an old version that doesn't know the field, this is fine.
I'm more interested in what happens if the kubelet is new enough and understands this field (say even 1.30), but has the feature gate disabled.
Will it ignore the field completely? [this isn't expected]
Or rather, will it reject the pod because it can't fulfill it? That is what I would expect.
We should reject the pod in that case.
The current proposal is that it will ignore it completely. So the feature being disabled and an old kubelet that doesn't even know about this field behave the same way.
We can change it to reject the pod in that case too (it is trivial). I proposed it this way so that by disabling the feature gate you can really stop any bleeding related to user namespaces. Rejecting seems more risky in case there is some bug, IMHO.
We can change it to that if you prefer, of course. As I said, it is trivial to do that too.
I believe that we should reject (as I wrote above).
Perfect, changed the text to do that instead. Thanks! :)
> If it is scheduled to a node where the kubelet has the feature flag activated
> and the node meets the requirements to use user namespaces, then the pod will be
> created with the namespace. If it is scheduled to a node that has the feature
> disabled, it will be created without the user namespace.
Same comment as for version skew.
Idem.
Fixed to that!
Signed-off-by: Rodrigo Campos <[email protected]> Signed-off-by: Peter Hunt <[email protected]>
Force-pushed from f688ef1 to b64d85b (compare)
OK - with the recent changes, I'm fine with it from PRR pov. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe, mrunalp, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.