Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct: Signal: honor RootlessCgroups #4395

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Sep 4, 2024

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

When RootlessCgroups is set to true, runc just ignores the error from signalAllProcesses and may leak some processes running. (See the comments in this PR)
In the future, runc should walk the process tree to avoid such a leak.

Note that RootlessCgroups is a misnomer; it is set to false despite the name when cgroup v2 delegation is configured.
This is expected to be renamed in a separate commit.

Fix #4394

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
// leftover processes. Handle this special case here.
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
if c.config.RootlessCgroups { // may not have an access to cgroup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"RootlessCgroups" is misnomer, as this is set to false on modern rootless environments (cgroup v2 + systemd).

Should be renamed to something else in a separate PR.

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
// leftover processes. Handle this special case here.
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
if c.config.RootlessCgroups { // may not have an access to cgroup
return c.signal(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses has a cgroupv1-friendly fallback, what is missing from there? Maybe the real issue is that we do m.Exists() in signalAllProcesses (I guess that's where the error is coming from?)?

@kolyshkin wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses has a cgroupv1-friendly fallback

No when lacking an access to the cgroup.
It just returns ErrNotRunning immediately :

// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
if !m.Exists() {
return ErrNotRunning
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it seems correct that m.Exists() returns false from the perspective of m, as the cgroup actually does not exist in this case

Copy link
Member

@cyphar cyphar Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the function name doesn't imply that it's cgroup-specific (though it kind of is). Idk...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses is only called when shared pid ns is used (as otherwise it's enough to kill pid 1), and to know all the PIDs we need a dedicated cgroup. Meaning, that rootless container with shared pidns won't work, as there is no way to find all PIDs (well, theoretically, we can find all the children of container init in this case, but I am not sure it's a good idea).

Not sure how it works in runc 1.1 -- most probably it just doesn't (i.e. we leak processes).

@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
// leftover processes. Handle this special case here.
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
if c.config.RootlessCgroups { // may not have an access to cgroup
return c.signal(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, for a container with shared pid namespace, just only kill the init process can cause all other processes in this container killed? If not, these non-init processes will be leaked after we delete the container. Maybe these non-init processes are managed by downstream tools(containerd or rootlesskit)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only container init process is killed, and other processes are leaked until the pidns init process is killed.

The leaked processes are not managed by anything, so it is still highly recommended to use cgroup v2 systemd delegation.


runc run -d --console-socket "$CONSOLE_SOCKET" attached_ctr
[ "$status" -eq 0 ]
testcontainer attached_ctr running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please exec another process to this container? For example:

runc exec -d attached_ctr sleep infinity

But I think it's hard to simulate the situation that runc can't access the cgroup path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And should detect whether the second process has been killed or not after we kill the container.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second process is just leaked when cgroup is not delegated, as in v1.1.

The scope of the PR is limited to just fix the regression #4394.

It is still quite hard to handle the leaked processes in a robust way, and it should be discussed in a separate issue.
(Probably we should walk the procfs tree to track descendants of the container init process)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container init might be already gone (see #4102).

@lifubang
Copy link
Member

lifubang commented Sep 5, 2024

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

Maybe runc kill -a container_id KILL can't kill this type container either in version v1.1.*?

@AkihiroSuda
Copy link
Member Author

signalAllProcesses() depends on the cgroup and is expected to fail when runc is running in rootless without an access to the cgroup.

Maybe runc kill -a container_id KILL can't kill this type container either in version v1.1.*?

Seems so.

@@ -44,6 +44,7 @@ func destroy(c *Container) error {
// and destroy is supposed to remove all the container resources, we need
// to kill those processes here.
if !c.config.Namespaces.IsPrivate(configs.NEWPID) {
// Likely to fail when c.config.RootlessCgroups is true
_ = signalAllProcesses(c.cgroupManager, unix.SIGKILL)
}
if err := c.cgroupManager.Destroy(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, if runc has no access to the cgroup path, this cgroup destroy call will throw the error either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destroy seems to ignore ENOENT

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related:

I think this is adequate given the circumstances, but the commit message and the code needs to be amended to explain that in this case we leak processes (maybe add a warning, too).

I will open a separate PR to warn about such configuration.

`signalAllProcesses()` depends on the cgroup and is expected to fail
when runc is running in rootless without an access to the cgroup.

When `RootlessCgroups` is set to `true`, runc just ignores the error
from `signalAllProcesses` and may leak some processes running.
(See the comments in PR 4395)
In the future, runc should walk the process tree to avoid such a leak.

Note that `RootlessCgroups` is a misnomer; it is set to `false` despite
the name when cgroup v2 delegation is configured.
This is expected to be renamed in a separate commit.

Fix issue 4394

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda
Copy link
Member Author

I think this is adequate given the circumstances, but the commit message and the code needs to be amended to explain that in this case we leak processes (maybe add a warning, too).

Done. Added a warning too.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@kolyshkin thanks for all the related issues/PRs, it really helps to review this :)

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin merged commit f9f5764 into opencontainers:main Sep 11, 2024
42 checks passed
@@ -388,11 +388,21 @@ func (c *Container) Signal(s os.Signal) error {
// leftover processes. Handle this special case here.
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
if c.config.RootlessCgroups { // may not have an access to cgroup
logrus.WithError(err).Warn("failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)")
Copy link
Member

@lifubang lifubang Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuold we remove .WithError(err) here? Because for kill a such type running container, it will log with an error: `failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation) error=container not running". It will make the user confused.

(I'm sorry to say it after merged, because I'm in a busy state these days.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other types of errors it still makes sense to print the err?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifubang lifubang mentioned this pull request Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants