Kubernetes 1.26.1 - Linux Capabilities - starting container process caused: apply caps: operation not permitted #330

Open
MysticalMount opened this issue Apr 9, 2023 · 5 comments
Labels
bug Something isn't working

Describe the bug

I've deployed the workers to a privileged namespace:

Namespace: cc

apiVersion: v1
kind: Namespace
metadata:
  name: cc
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: v1.26
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/audit-version: v1.26
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/warn-version: v1.26

On Kubernetes 1.26.1

When trying to run a hello-world pipeline, I get this error from Guardian inside the worker pod:

{"timestamp":"2023-04-09T16:51:45.884909106Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"54e0c267-01e1-4e01-690f-df2cff3b5bf8","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/68c0ccae-e204-453a-6365-4e8b36d6e541/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/63e92077-b1e9-428d-5172-fca9332f4ac1/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.4548"}

I'm fairly new to Concourse, so apologies if I'm missing something!

I can see that securityContext: privileged: true is set on the worker StatefulSet in the source YAML, and it also appears to be set in the resulting StatefulSet:

        securityContext:
          capabilities:
            add:
            - all
          privileged: true

(I've been adding the capabilities to try to resolve the issue.)

As far as I can tell the container is privileged. I am also using TalosCtl, but so far I can't find anything to suggest it is Talos-related.
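
One way to check what a "privileged" container can actually grant is to decode the `CapBnd` (capability bounding set) field of `/proc/self/status` from inside the worker pod. The following is a minimal sketch, not part of Concourse: the helper names `parseCapMask` and `hasCap` are made up for illustration, and the bit numbers (16 for CAP_SYS_MODULE, 22 for CAP_SYS_BOOT) come from `linux/capability.h`. These are the two capabilities discussed later in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in linux/capability.h.
const (
	capSysModule = 16
	capSysAdmin  = 21
	capSysBoot   = 22
)

// parseCapMask extracts the hex mask of a field such as "CapBnd" from the
// contents of /proc/<pid>/status. (Hypothetical helper for illustration.)
func parseCapMask(status, field string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if ok && name == field {
			return strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		}
	}
	return 0, fmt.Errorf("field %s not found", field)
}

// hasCap reports whether capability number n is set in mask.
func hasCap(mask uint64, n uint) bool {
	return mask&(1<<n) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read status:", err)
		return
	}
	mask, err := parseCapMask(string(data), "CapBnd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("CapBnd=%#x CAP_SYS_MODULE=%v CAP_SYS_BOOT=%v\n",
		mask, hasCap(mask, capSysModule), hasCap(mask, capSysBoot))
}
```

If a capability that runc is asked to apply is missing from the bounding set, an "operation not permitted" error when applying capabilities would be a plausible outcome, even when the pod spec says privileged: true.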

Any steps, help, or advice on where to go next, or on what I've missed, are welcome.

Reproduction steps

  1. Deploy Kubernetes v1.26.1
  2. Deploy Helm Chart with mostly default settings with Web and Worker
  3. Connect to web, deploy example pipeline using fly
    ...

Expected behavior

The container image should pull and start successfully.

Additional context

In my setup I'm using custom registries, so I expect some setup work there, but I suspect we are hitting this issue before the registries come into play.

@MysticalMount MysticalMount added the bug Something isn't working label Apr 9, 2023

flokli commented Jul 13, 2023

I dug a bit through the error messages and ended up dropping CAP_SYS_MODULE from Concourse's worker/runtime/spec/capabilities.go, but then I got a slightly different error message from runc:

runc run: exit status 1: runc run failed: unable to start container process: unable to apply caps: operation not permitted

This was essentially that patch:

commit af3cebb55c01a298b69243517e72b268665b9e2b
Author: Florian Klink <[email protected]>
Date:   Thu Jul 13 14:27:28 2023 +0300

    worker: drop CAP_SYS_MODULE from the list of capabilities
    
    `worker/runtime/spec/spec.go@defaultGardenOCISpec` calls out to
    `OciCapabilities(privileged bool)`, returning a list of capabilities to
    put in the OCI spec, which is then passed to runc.
    
    Note this is independent of what the container payload might actually
    need, it always asks for these capabilities.
    
    This causes problems when running concourse-worker in a Talos cluster,
    which does not allow asking for CAP_SYS_MODULE and CAP_SYS_BOOT
    (Concourse doesn't  ask for the latter):
    
    ```
    concourse-worker-2 concourse-worker {"timestamp":"2023-07-13T11:17:47.529591744Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"af712415-e9aa-4ba7-639f-b291f6e2caaf","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/e5bce4ac-4d45-45c5-6338-38aaaaf27e72/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/1b925d7a-e33b-41dc-6f4f-9cdc701583f0/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.140807"}}
    ```
    
    See https://www.talos.dev/v1.4/learn-more/process-capabilities/ for
    details.
    
    Removing that CAP from the list should get runc to successfully execute
    in Talos clusters. It might cause problems for people trying to modprobe
    kernel modules inside Concourse, but I hope noone does that ;-)
    
    Signed-off-by: Florian Klink <[email protected]>

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index b38c32f4a..6443c1fb0 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -72,7 +72,6 @@ var (
 		"CAP_SYS_ADMIN",
 		"CAP_SYS_BOOT",
 		"CAP_SYS_CHROOT",
-		"CAP_SYS_MODULE",
 		"CAP_SYS_NICE",
 		"CAP_SYS_PACCT",
 		"CAP_SYS_PTRACE",

A version of this was pushed to flokli/concourse:20230713-01.
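
As an illustration of the reasoning in the commit message (the list is requested regardless of what the host allows), an alternative to hard-coding the removal would be to intersect the requested list with the bounding set at runtime. The sketch below is purely hypothetical: it is not what the patch above does, `filterCaps` and `capNumbers` are invented names, the bounding mask in `main` is a made-up value with bits 16 and 22 cleared, and only capability names visible in the diff are mapped (bit numbers from `linux/capability.h`).

```go
package main

import "fmt"

// capNumbers maps capability names to their bit numbers from
// linux/capability.h. Only the names visible in the diff above are listed.
var capNumbers = map[string]uint{
	"CAP_SYS_MODULE": 16,
	"CAP_SYS_CHROOT": 18,
	"CAP_SYS_PTRACE": 19,
	"CAP_SYS_PACCT":  20,
	"CAP_SYS_ADMIN":  21,
	"CAP_SYS_BOOT":   22,
	"CAP_SYS_NICE":   23,
}

// filterCaps returns the requested capabilities whose bit is set in the
// bounding mask, preserving order. Names missing from capNumbers are
// dropped, which is a simplification made for this sketch.
func filterCaps(requested []string, bounding uint64) []string {
	kept := make([]string, 0, len(requested))
	for _, name := range requested {
		n, known := capNumbers[name]
		if known && bounding&(1<<n) != 0 {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	requested := []string{
		"CAP_SYS_ADMIN",
		"CAP_SYS_BOOT",
		"CAP_SYS_CHROOT",
		"CAP_SYS_MODULE",
		"CAP_SYS_NICE",
	}
	// Made-up mask: all of the low 41 bits set except 16 and 22.
	bounding := uint64(0x000001ffffbeffff)
	fmt.Println(filterCaps(requested, bounding))
}
```

The trade-off is that a silently shrunken capability list can hide the reason a privileged task later fails, whereas the static patch makes the change explicit.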

I went ahead and patched OciCapabilities to always return UnprivilegedContainerCapabilities, just to see how far it'd get:

commit 043babd9347f4e671e5e03f22b1a3d9065fac5bb
Author: Florian Klink <[email protected]>
Date:   Thu Jul 13 15:37:16 2023 +0300

    HACK

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index 6443c1fb0..2600f0001 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -3,11 +3,7 @@ package spec
 import "github.com/opencontainers/runtime-spec/specs-go"
 
 func OciCapabilities(privileged bool) specs.LinuxCapabilities {
-	if !privileged {
-		return UnprivilegedContainerCapabilities
-	}
-
-	return PrivilegedContainerCapabilities
+	return UnprivilegedContainerCapabilities
 }
 
 var (

A version of this was pushed to flokli/concourse:20230713-02.

With that, runc fails with runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

It looks like the Concourse model of running runc inside privileged pods gets more and more incompatible with more recent/secure versions of Kubernetes.

I'm not sure how much further time I'm willing to spend on trying to get this working - concourse/concourse#5682 sounds like a more sustainable long-term solution.

flokli commented Jul 13, 2023

Hmm, Concourse adds both CAP_SYS_BOOT and CAP_SYS_MODULE; I was misled by the Talos documentation, which named the capability incorrectly (fixed in siderolabs/talos#7473). I'll re-roll the first patch and see what dropping both capabilities does:

commit 92d624adbb1c7d4e855602703f6a81387a8868d8 (HEAD)
Author: Florian Klink <[email protected]>
Date:   Thu Jul 13 14:27:28 2023 +0300

    worker: drop CAP_SYS_{BOOT,MODULE} from the list of capabilities
    
    `worker/runtime/spec/spec.go@defaultGardenOCISpec` calls out to
    `OciCapabilities(privileged bool)`, returning a list of capabilities to
    put in the OCI spec, which is then passed to runc.
    
    Note this is independent of what the container payload might actually
    need, it always asks for these capabilities.
    
    This causes problems when running concourse-worker in a Talos cluster,
    which does not allow asking for CAP_SYS_MODULE and CAP_SYS_BOOT
    (Concourse doesn't  ask for the latter):
    
    ```
    concourse-worker-2 concourse-worker {"timestamp":"2023-07-13T11:17:47.529591744Z","level":"error","source":"guardian","message":"guardian.api.garden-server.create.failed","data":{"error":"runc run: exit status 1: container_linux.go:380: starting container process caused: apply caps: operation not permitted","request":{"Handle":"af712415-e9aa-4ba7-639f-b291f6e2caaf","GraceTime":0,"RootFSPath":"raw:///concourse-work-dir/volumes/live/e5bce4ac-4d45-45c5-6338-38aaaaf27e72/volume","BindMounts":[{"src_path":"/concourse-work-dir/volumes/live/1b925d7a-e33b-41dc-6f4f-9cdc701583f0/volume","dst_path":"/scratch","mode":1}],"Network":"","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{},"pid_limits":{}}},"session":"3.1.140807"}}
    ```
    
    See https://www.talos.dev/v1.4/learn-more/process-capabilities/ for
    details.
    
    Removing these capabilities from the list should get runc to
    successfully execute in Talos clusters. It might cause problems for
    people trying to modprobe kernel modules inside Concourse, but I hope
    noone does that ;-)
    
    Signed-off-by: Florian Klink <[email protected]>

diff --git a/worker/runtime/spec/capabilities.go b/worker/runtime/spec/capabilities.go
index b38c32f4a..9819650a4 100644
--- a/worker/runtime/spec/capabilities.go
+++ b/worker/runtime/spec/capabilities.go
@@ -70,9 +70,7 @@ var (
                "CAP_SETUID",
                "CAP_SYSLOG",
                "CAP_SYS_ADMIN",
-               "CAP_SYS_BOOT",
                "CAP_SYS_CHROOT",
-               "CAP_SYS_MODULE",
                "CAP_SYS_NICE",
                "CAP_SYS_PACCT",
                "CAP_SYS_PTRACE",

flokli commented Jul 13, 2023

OK, with the new patch applied (pushed to flokli/concourse:20230713-03), which removes both capabilities from the list, and with all capabilities added in the pod spec, I get the same error: runc run failed: unable to start container process: can't get final child's PID from pipe: EOF.

That smells like an incompatibility, either with the cgroup structure in Talos, or with an assumption that Docker is the outer container runtime.

flokli commented Jul 13, 2023

moby/moby#40835 (comment) suggests this might be an issue with what mountpoints are seen inside the container, or with user namespace support, even though I'm a bit unsure where runc itself is emitting that error message…

flokli commented Jul 17, 2023

I sent a PR containing the first patch to concourse/concourse#8791.
