systemd-oomd making questionable choices #32687

Open
yshui opened this issue May 7, 2024 · 1 comment
Labels
bug 🐛 Programming errors, that need preferential fixing oomd

Comments

@yshui
Contributor

yshui commented May 7, 2024

systemd version the issue has been seen with

255.4

Used distribution

NixOS

Linux kernel version used

6.8.9

CPU architectures issue was seen on

x86_64

Component

systemd-oomd

Expected behaviour you didn't see

oomd kills only cgroups with high memory usage, and nothing else.

Unexpected behaviour you saw

oomd makes seemingly arbitrary choices. As the attached output shows, nix-daemon was using a large amount of memory and had a high Pgscan, yet systemd-resolved was killed for some reason. oomd also killed more processes than it reported (probably the same problem as #32304), and when it did kill processes in other cgroups, it failed to kill them all.

Steps to reproduce the problem

Set a memory pressure limit for systemd-oomd (80% on /system.slice in the log below), then exceed it.

Additional program output to the terminal or log subsystem illustrating the issue

May 07 14:29:23 systemd-oomd[1852]: Considered 50 cgroups for killing, top candidates were:
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/nix-daemon.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 85.87 Avg60: 73.87 Avg300: 50.24 Total: 6min 30s
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 43.3G
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 169344875
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 169314859
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/docker.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 103ms
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 31.4M
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 165280
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 165280
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/display-manager.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 4s
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 21.9M
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 63273
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 63273
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/fwupd.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 13.7M
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 2597
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 2597
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/systemd-resolved.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 49ms
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 3.1M
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 15924
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 15924
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/udisks2.service
May 07 14:29:23 systemd-oomd[1852]:                 Memory Pressure Limit: 0.00%
May 07 14:29:23 systemd-oomd[1852]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
May 07 14:29:23 systemd-oomd[1852]:                 Current Memory Usage: 2.4M
May 07 14:29:23 systemd-oomd[1852]:                 Memory Min: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Memory Low: 0B
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 3994
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 3994
May 07 14:29:23 systemd-oomd[1852]: Killed /system.slice/systemd-resolved.service due to memory pressure for /system.slice being 86.39% > 80.00% for > 20s with reclaim activity

......

May 07 14:29:23 systemd[1]: nix-daemon.service: systemd-oomd killed some process(es) in this unit.
May 07 14:29:23 systemd[1]: nix-daemon.service: Main process exited, code=killed, status=9/KILL
May 07 14:29:23 systemd[1]: nix-daemon.service: Failed with result 'signal'.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 8776 (nix-daemon) remains running after unit stopped.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 239950 (ninja) remains running after unit stopped.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 649113 (cc1plus) remains running after unit stopped.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 687905 (ld) remains running after unit stopped.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 697609 (cc1plus) remains running after unit stopped.
May 07 14:29:23 systemd[1]: nix-daemon.service: Unit process 697614 (cc1plus) remains running after unit stopped.
...... many similar lines omitted ......
May 07 14:29:23 systemd[1]: nix-daemon.service: Consumed 14h 36min 25.672s CPU time, 43.4G memory peak, 39.8G memory swap peak, read 31.8G from disk, written 86.6G to disk, no IP traffic.
May 07 14:29:23 systemd[1]: display-manager.service: systemd-oomd killed some process(es) in this unit.
May 07 14:29:23 systemd[1]: display-manager.service: Main process exited, code=killed, status=9/KILL
May 07 14:29:23 systemd[1]: display-manager.service: Failed to kill control group /system.slice/display-manager.service, ignoring: Invalid argument
May 07 14:29:23 systemd[1]: display-manager.service: Failed with result 'oom-kill'.
May 07 14:29:23 systemd[1]: display-manager.service: Consumed 57.857s CPU time, 98.2M memory peak, 33.2M memory swap peak, read 62.8M from disk, written 242.3M to disk, no IP traffic.
May 07 14:29:23 systemd[1]: fwupd.service: systemd-oomd killed some process(es) in this unit.
May 07 14:29:23 systemd[1]: fwupd.service: Killing process 718588 (.fwupd-wrapped) with signal SIGKILL.
May 07 14:29:23 systemd[1]: fwupd.service: Main process exited, code=killed, status=9/KILL
May 07 14:29:23 systemd[1]: fwupd.service: Failed with result 'oom-kill'.
May 07 14:29:23 systemd[1]: docker.service: systemd-oomd killed some process(es) in this unit.
May 07 14:29:23 systemd[1]: docker.service: Main process exited, code=killed, status=9/KILL
May 07 14:29:23 systemd[1]: docker.service: Failed with result 'signal'.
May 07 14:29:23 systemd[1]: docker.service: Consumed 3.023s CPU time, 103.2M memory peak, 0B memory swap peak, no IP traffic.
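
To make the candidate dump above easier to compare, here is a minimal sketch that pulls Path, Pgscan and Last Pgscan out of the journal lines and computes the per-cgroup difference. Using that difference as a proxy for recent reclaim activity is an assumption made for illustration, and `pgscan_deltas` is a hypothetical helper, not part of systemd. The excerpt contains three of the six candidates, copied from the log.

```python
import re

# Excerpt of the candidate dump from the log above (three of six cgroups).
LOG = """\
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/nix-daemon.service
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 169344875
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 169314859
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/docker.service
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 165280
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 165280
May 07 14:29:23 systemd-oomd[1852]:         Path: /system.slice/systemd-resolved.service
May 07 14:29:23 systemd-oomd[1852]:                 Pgscan: 15924
May 07 14:29:23 systemd-oomd[1852]:                 Last Pgscan: 15924
"""

# "Last Pgscan" is listed before "Pgscan" so the longer label wins
# when both could match on the same line.
FIELD = re.compile(r"(Path|Last Pgscan|Pgscan): (\S+)\s*$")


def pgscan_deltas(log_text: str) -> dict[str, int]:
    """Return {cgroup path: Pgscan - Last Pgscan} for each dumped candidate."""
    fields: dict[str, dict[str, int]] = {}
    current = None
    for line in log_text.splitlines():
        match = FIELD.search(line)
        if match is None:
            continue
        key, value = match.groups()
        if key == "Path":
            current = value
            fields[current] = {}
        elif current is not None:
            fields[current][key] = int(value)
    return {
        path: f["Pgscan"] - f["Last Pgscan"]
        for path, f in fields.items()
        if "Pgscan" in f and "Last Pgscan" in f
    }


deltas = pgscan_deltas(LOG)
for path, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{delta:>8}  {path}")
```

In the full dump, nix-daemon.service is the only candidate whose Pgscan differs from its Last Pgscan (by 30016), while the value for systemd-resolved.service did not change at all.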
@yshui yshui added the bug 🐛 Programming errors, that need preferential fixing label May 7, 2024
@github-actions github-actions bot added the oomd label May 7, 2024
@yshui
Contributor Author

yshui commented May 7, 2024

Hypothesis: oomd tried to kill the higher-ranked cgroup and killed some, but not all, of the processes in it. It therefore concluded that the kill had failed and moved on to the next candidate.
