fix: use Linux's value for sysctl_nr_open_max
While Lennart moaned a bit in the comments about vendoring this expression and about this value being unknowable, it hasn't changed in over 10 years. Even then, the change was a refactor to define this value statically instead of defining it via delayed execution (cf. 7f4b36f9bb930b3b2105a9a2cb0121fa7028c432 of the kernel source). So it could be argued that the value is pretty knowable, even if it isn't available through public kernel headers as it arguably ought to be.

The previous implementation of `bump_file_max_and_nr_open` isn't great: its initial value (2147483640 in the second log below) is slightly higher than the maximum the kernel allows (2147483584 on x86-64), so the write is rejected and the value is cut to roughly half of the initial value (1073741816).

We ran into this in production while trying to increase system resource limits and found that PID 1 wasn't respecting our sysctl.nr_file values. After enabling debug logs with `systemd.log_level=debug`, we observed

```
May 10 02:46:45 localhost systemd[1]: systemd 251.16 running in system mode (+PAM +AUDIT -SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 >
May 10 02:46:45 localhost systemd[1]: Detected virtualization amazon.
May 10 02:46:45 localhost systemd[1]: Detected architecture x86-64.
May 10 02:46:45 localhost systemd[1]: Detected initialized system, this is not the first boot.
May 10 02:46:45 localhost systemd[1]: Kernel version 5.15.119, our baseline is 4.15
May 10 02:46:45 localhost systemd[1]: No hostname configured, using default hostname.
May 10 02:46:45 localhost systemd[1]: Hostname set to <localhost>.
May 10 02:46:45 localhost systemd[1]: Successfully added address 127.0.0.1 to loopback interface
May 10 02:46:45 localhost systemd[1]: Successfully added address ::1 to loopback interface
May 10 02:46:45 localhost systemd[1]: Successfully brought loopback interface up
May 10 02:46:45 localhost systemd[1]: Setting '/proc/sys/fs/file-max' to '9223372036854775807'
May 10 02:46:45 localhost systemd[1]: Initial value of v is 2147483584.
May 10 02:46:45 localhost systemd[1]: Setting '/proc/sys/fs/nr_open' to '2147483584'
May 10 02:46:45 localhost systemd[1]: Successfully bumped fs.nr_open to 2147483584
May 10 02:46:45 localhost systemd[1]: No credentials passed via fw_cfg.
```

with this change, while we observed

```
May 10 03:31:37 localhost systemd[1]: systemd 251.16 running in system mode (+PAM +AUDIT -SELINUX +APPARMOR +IMA >
May 10 03:31:37 localhost systemd[1]: Detected virtualization amazon.
May 10 03:31:37 localhost systemd[1]: Detected architecture x86-64.
May 10 03:31:37 localhost systemd[1]: Detected initialized system, this is not the first boot.
May 10 03:31:37 localhost systemd[1]: Kernel version 5.15.119, our baseline is 4.15
May 10 03:31:37 localhost systemd[1]: No hostname configured, using default hostname.
May 10 03:31:37 localhost systemd[1]: Hostname set to <localhost>.
May 10 03:31:37 localhost systemd[1]: Successfully added address 127.0.0.1 to loopback interface
May 10 03:31:37 localhost systemd[1]: Successfully added address ::1 to loopback interface
May 10 03:31:37 localhost systemd[1]: Successfully brought loopback interface up
May 10 03:31:37 localhost systemd[1]: Setting '/proc/sys/fs/file-max' to '9223372036854775807'
May 10 03:31:37 localhost systemd[1]: Setting '/proc/sys/fs/nr_open' to '2147483640'
May 10 03:31:37 localhost systemd[1]: Couldn't write fs.nr_open as 2147483640, halving it.
May 10 03:31:37 localhost systemd[1]: Setting '/proc/sys/fs/nr_open' to '1073741816'
May 10 03:31:37 localhost systemd[1]: Successfully bumped fs.nr_open to 1073741816
May 10 03:31:37 localhost systemd[1]: No credentials passed via fw_cfg.
```

without this patch. We also observed

```
$ cat /proc/1/limits | grep files
Max open files            2147483584           2147483584           files
```

with this patch, but

```
$ cat /proc/1/limits | grep files
Max open files            1073741816           1073741816           files
```

without this patch.
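Below is a short program that anyone can use to determine what the value of `sysctl_nr_open_max` should be for their architecture.

```cpp
/*
sysctl_nr_open_max defines the largest value for which you can do something like

    $ sysctl -w fs.nr_open=10000
    fs.nr_open = 10000

The value is apparently architecture-dependent and is located at
https://github.com/torvalds/linux/blob/448b3fe5a0eab5b625a7e15c67c7972169e47ff8/fs/file.c#L27-L32

This little snippet, below, yields this value for different architectures.
It's mostly an experiment with using Godbolt (C++ not C) to confirm certain
Linux results.
*/
#include <climits>   // INT_MAX, CHAR_BIT
#include <cstddef>   // size_t
#include <cstdio>    // std::printf
#include <iostream>  // std::cout

// Userspace stand-ins for the kernel's BITS_PER_LONG and __const_min.
// __WORDSIZE is provided by the C library headers when available.
#ifdef __WORDSIZE
#define BITS_PER_LONG __WORDSIZE
#else
#define BITS_PER_LONG (static_cast<int>(sizeof(long) * CHAR_BIT))
#endif
#define __const_min(x, y) ((x) < (y) ? (x) : (y))

// cf. https://stackoverflow.com/a/66249936
extern "C" {
const char *getBuild() {
  // Get current architecture, detects nearly every architecture.
  // Coded by Freak. More specific variants are tested before generic ones
  // so that every branch stays reachable.
#if defined(__x86_64__) || defined(_M_X64)
  return "x86_64";
#elif defined(i386) || defined(__i386__) || defined(__i386) || defined(_M_IX86)
  return "x86_32";
#elif defined(__ARM_ARCH_2__)
  return "ARM2";
#elif defined(__ARM_ARCH_3__) || defined(__ARM_ARCH_3M__)
  return "ARM3";
#elif defined(__ARM_ARCH_4T__) || defined(__TARGET_ARM_4T)
  return "ARM4T";
#elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__)
  return "ARM5";
#elif defined(__ARM_ARCH_6T2__)
  return "ARM6T2";
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
    defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) ||  \
    defined(__ARM_ARCH_6ZK__)
  return "ARM6";
#elif defined(__ARM_ARCH_7A__)
  return "ARM7A";
#elif defined(__ARM_ARCH_7R__)
  return "ARM7R";
#elif defined(__ARM_ARCH_7M__)
  return "ARM7M";
#elif defined(__ARM_ARCH_7S__)
  return "ARM7S";
#elif defined(__ARM_ARCH_7__)
  return "ARM7";
#elif defined(__aarch64__) || defined(_M_ARM64)
  return "ARM64";
#elif defined(mips) || defined(__mips__) || defined(__mips)
  return "MIPS";
#elif defined(__sh__)
  return "SUPERH";
#elif defined(__powerpc64__) || defined(__PPC64__) || defined(__ppc64__) || \
    defined(_ARCH_PPC64)
  return "POWERPC64";
#elif defined(__powerpc) || defined(__powerpc__) || \
    defined(__POWERPC__) || defined(__ppc__) || defined(__PPC__) || \
    defined(_ARCH_PPC)
  return "POWERPC";
#elif defined(__sparc__) || defined(__sparc)
  return "SPARC";
#elif defined(__m68k__)
  return "M68K";
#else
  return "UNKNOWN";
#endif
}
}

void fn() {
  const char *system = getBuild();
  std::printf("==== Results for %s ====\n\n", system);
  std::printf("INT_MAX (%s): %x (%d)\n", system, INT_MAX, INT_MAX);

  // Largest number of pointer-sized slots addressable by a size_t.
  unsigned long native_size = ~(size_t)0 / sizeof(void *);
  std::printf("~(size_t)0/sizeof(void *): %lx (%lu)\n", native_size,
              native_size);

  unsigned long min = __const_min(INT_MAX, ~(size_t)0 / sizeof(void *));
  std::printf("Minimum of those two things: %lx (%lu)\n", min, min);

  std::cout << "BITS_PER_LONG: " << BITS_PER_LONG << std::endl;

  // Same expression as the kernel: round the minimum down to a multiple of
  // BITS_PER_LONG.
  unsigned int sysctl_nr_open_max =
      __const_min(INT_MAX, ~(size_t)0 / sizeof(void *)) & -BITS_PER_LONG;
  std::printf("sysctl_nr_open_max: %u\n", sysctl_nr_open_max);
}

int main() {
  fn();
  return 0;
}
```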