libct: Signal: honor RootlessCgroups #4395

AkihiroSuda · 2024-09-04T20:08:51Z

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running rootless without access to the cgroup.

When RootlessCgroups is set to true, runc just ignores the error from signalAllProcesses and may leak some running processes. (See the comments in this PR.)
In the future, runc should walk the process tree to avoid such a leak.

Note that RootlessCgroups is a misnomer; it is set to false despite the name when cgroup v2 delegation is configured.
This is expected to be renamed in a separate commit.

Fix #4394

AkihiroSuda · 2024-09-04T20:11:27Z

libcontainer/container_linux.go

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
 // leftover processes. Handle this special case here.
 if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
 if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+ if c.config.RootlessCgroups { // may not have an access to cgroup

"RootlessCgroups" is a misnomer, as it is set to false on modern rootless environments (cgroup v2 + systemd).

It should be renamed to something else in a separate PR.

cyphar · 2024-09-04T20:32:31Z

libcontainer/container_linux.go

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
 // leftover processes. Handle this special case here.
 if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
 if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+ if c.config.RootlessCgroups { // may not have an access to cgroup
+ return c.signal(s)


signalAllProcesses has a cgroupv1-friendly fallback, what is missing from there? Maybe the real issue is that we do m.Exists() in signalAllProcesses (I guess that's where the error is coming from?)?

@kolyshkin wdyt?

signalAllProcesses has a cgroupv1-friendly fallback

Not when access to the cgroup is lacking.
It just returns ErrNotRunning immediately:

runc/libcontainer/init_linux.go

Lines 696 to 701 in 961b803

// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
	if !m.Exists() {
		return ErrNotRunning
	}

And it seems correct that m.Exists() returns false from the perspective of m, as the cgroup actually does not exist in this case.

Right, but the function name doesn't imply that it's cgroup-specific (though it kind of is). Idk...

signalAllProcesses is only called when a shared pid ns is used (as otherwise it's enough to kill pid 1), and to know all the PIDs we need a dedicated cgroup. This means that a rootless container with a shared pidns won't work, as there is no way to find all PIDs (well, theoretically, we can find all the children of the container init in this case, but I am not sure it's a good idea).

Not sure how it works in runc 1.1 -- most probably it just doesn't (i.e. we leak processes).

lifubang · 2024-09-05T02:14:43Z

libcontainer/container_linux.go

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
 // leftover processes. Handle this special case here.
 if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
 if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+ if c.config.RootlessCgroups { // may not have an access to cgroup
+ return c.signal(s)


Another question: for a container with a shared pid namespace, does killing only the init process cause all other processes in the container to be killed? If not, these non-init processes will be leaked after we delete the container. Maybe these non-init processes are managed by downstream tools (containerd or rootlesskit)?

Only the container init process is killed, and other processes are leaked until the pidns init process is killed.

The leaked processes are not managed by anything, so it is still highly recommended to use cgroup v2 systemd delegation.

lifubang · 2024-09-05T02:46:50Z

tests/integration/kill.bats

+
+ runc run -d --console-socket "$CONSOLE_SOCKET" attached_ctr
+ [ "$status" -eq 0 ]
+ testcontainer attached_ctr running


Could you please exec another process in this container? For example:

runc exec -d attached_ctr sleep infinity

But I think it's hard to simulate the situation where runc can't access the cgroup path.

And we should detect whether the second process has been killed or not after we kill the container.

The second process is just leaked when the cgroup is not delegated, as in v1.1.

The scope of the PR is limited to just fixing the regression #4394.

It is still quite hard to handle the leaked processes in a robust way, and it should be discussed in a separate issue.
(Probably we should walk the procfs tree to track descendants of the container init process.)

The container init might be already gone (see #4102).
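
For illustration only, the procfs walk floated in this thread might look something like the sketch below. It is not part of this PR or of runc; `parsePPID`, `readParents`, and `descendants` are hypothetical helpers. It reads the parent PID from each `/proc/<pid>/stat` and collects every process below a given root.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// parsePPID extracts the parent PID from the contents of /proc/<pid>/stat.
// The comm field is parenthesised and may itself contain spaces or ')',
// so parsing starts after the last ')'.
func parsePPID(stat string) (int, error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(stat[i+1:])
	if len(fields) < 2 {
		return 0, fmt.Errorf("malformed stat line")
	}
	return strconv.Atoi(fields[1]) // fields[0] is the state, fields[1] the ppid
}

// readParents builds a pid -> ppid map from procfs. Processes that exit
// during the scan are silently skipped.
func readParents(procRoot string) (map[int]int, error) {
	entries, err := os.ReadDir(procRoot)
	if err != nil {
		return nil, err
	}
	parents := make(map[int]int)
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a PID directory
		}
		data, err := os.ReadFile(procRoot + "/" + e.Name() + "/stat")
		if err != nil {
			continue
		}
		if ppid, err := parsePPID(string(data)); err == nil {
			parents[pid] = ppid
		}
	}
	return parents, nil
}

// descendants returns, in sorted order, every PID that has root as an ancestor.
func descendants(parents map[int]int, root int) []int {
	children := make(map[int][]int)
	for pid, ppid := range parents {
		children[ppid] = append(children[ppid], pid)
	}
	var out []int
	queue := []int{root}
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		for _, c := range children[p] {
			out = append(out, c)
			queue = append(queue, c)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	parents, err := readParents("/proc")
	if err != nil {
		fmt.Println("cannot read procfs:", err)
		return
	}
	fmt.Println(descendants(parents, os.Getpid()))
}
```

Such a walk is inherently racy, and it has nothing to start from once the container init has exited, since orphaned processes are reparented. That is the limitation raised in the last reply above.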

lifubang · 2024-09-05T03:14:29Z

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

Maybe runc kill -a container_id KILL can't kill this type of container in v1.1.* either?

AkihiroSuda · 2024-09-05T06:15:06Z

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

Maybe runc kill -a container_id KILL can't kill this type container either in version v1.1.*?

Seems so.

lifubang · 2024-09-06T11:00:16Z

libcontainer/state_linux.go

@@ -44,6 +44,7 @@ func destroy(c *Container) error {
 // and destroy is supposed to remove all the container resources, we need
 // to kill those processes here.
 if !c.config.Namespaces.IsPrivate(configs.NEWPID) {
+ // Likely to fail when c.config.RootlessCgroups is true
 _ = signalAllProcesses(c.cgroupManager, unix.SIGKILL)
 }
 if err := c.cgroupManager.Destroy(); err != nil {


Another question: if runc has no access to the cgroup path, this cgroup destroy call will return an error as well.

Destroy seems to ignore ENOENT

kolyshkin

HostPID Pod Container Cgroup path was residual after container restarts #4040
regression: can't kill and delete the container with shared(host) pid ns when the init process has dead #4047
Fix runc kill and runc delete for containers with no init and no private PID namespace #4102

I think this is adequate given the circumstances, but the commit message and the code need to be amended to explain that in this case we leak processes (maybe add a warning, too).

I will open a separate PR to warn about such configuration.

`signalAllProcesses()` depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

When `RootlessCgroups` is set to `true`, runc just ignores the error from `signalAllProcesses` and may leak some processes running. (See the comments in PR 4395)
In the future, runc should walk the process tree to avoid such a leak.

Note that `RootlessCgroups` is a misnomer; it is set to `false` despite the name when cgroup v2 delegation is configured.
This is expected to be renamed in a separate commit.

Fix issue 4394

Signed-off-by: Akihiro Suda <[email protected]>

AkihiroSuda · 2024-09-10T18:59:13Z

I think this is adequate given the circumstances, but the commit message and the code needs to be amended to explain that in this case we leak processes (maybe add a warning, too).

Done. Added a warning too.

rata

LGTM, thanks!

@kolyshkin thanks for all the related issues/PRs, it really helps to review this :)

kolyshkin

LGTM

lifubang · 2024-09-12T15:06:46Z

libcontainer/container_linux.go

@@ -388,11 +388,21 @@ func (c *Container) Signal(s os.Signal) error {
 // leftover processes. Handle this special case here.
 if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
 if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
+ if c.config.RootlessCgroups { // may not have an access to cgroup
+ logrus.WithError(err).Warn("failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)")


Should we remove .WithError(err) here? When killing a running container of this type, it will log an error: `failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation) error="container not running"`. This will confuse the user.

(I'm sorry to bring this up after the merge; I've been busy these days.)

For other types of errors it still makes sense to print the err?

AkihiroSuda added area/rootless area/cgroupv1 labels Sep 4, 2024

AkihiroSuda force-pushed the fix-4394 branch 2 times, most recently from 0cd13f7 to f35ae2c · September 4, 2024 20:24

AkihiroSuda mentioned this pull request Sep 4, 2024: [v1.2 regression] [cgroup v1 + rootless] nerdctl run -d --name=bar --pid=container:foo ; nerdctl rm -f bar hangs #4394 (Closed)

AkihiroSuda force-pushed the fix-4394 branch from f35ae2c to ca303ca · September 4, 2024 20:27

cyphar reviewed Sep 4, 2024

lifubang reviewed Sep 5, 2024

lifubang reviewed Sep 6, 2024

kolyshkin requested changes Sep 10, 2024

AkihiroSuda force-pushed the fix-4394 branch from ca303ca to 429e06a · September 10, 2024 18:58

rata approved these changes Sep 11, 2024

kolyshkin approved these changes Sep 11, 2024

kolyshkin merged commit f9f5764 into opencontainers:main Sep 11, 2024 (42 checks passed)

kolyshkin mentioned this pull request Sep 12, 2024: runc create/run: warn on rootless + shared pidns + no cgroup #4398 (Merged)

lifubang reviewed Sep 12, 2024

lifubang mentioned this pull request Sep 23, 2024: Add ErrCgroupNotExist #4410 (Merged)
