Error: mkdir /mnt/persistent/var/lib/containers/overlay/l: file exists #24900

Open
Austinpayne opened this issue Dec 23, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Issue Description

I've run into a very rare issue with an embedded system we are developing. We run various containers on the aarch64 Nvidia Jetson platform and use podman for our containerized services. After a reboot, none of our containers (orchestrated with systemd) were running, and we found that podman ps, or any other podman command, returned Error: mkdir /mnt/persistent/var/lib/containers/overlay/l: file exists. I discovered that overlay/l had become a symlink rather than the directory I expect it to be. It pointed to one of the diff layers, exactly like the symlinks that normally live inside the overlay/l directory.

Has anyone seen this before? I ran a full fsck on the drive and tested it with smartctl; both came back clean, so I don't suspect a disk failure. I was able to recover by removing the overlay/l symlink and re-running any podman command, which recreated the directory as expected and began populating it with symlinks as normal.

Note: I realize 3.4.4 is quite an old version of podman at this point. We plan to upgrade but are blocked behind some other planned upgrades (e.g. kernel 6.0+, newer version of Ubuntu). I mainly want to know whether anyone has insight into why this happened or has experienced something similar. For the short term, I have implemented a mechanism to detect and correct this situation.
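For illustration, a detect-and-correct step along these lines could look like the following. This is a minimal sketch, not the actual mechanism in use: the function name repair_overlay_l is hypothetical, and it assumes the graphroot is passed in as an argument and that no containers are running when it is called.

```shell
#!/usr/bin/env bash
# Sketch only: repair_overlay_l is a hypothetical helper, not part of podman.
# Usage: repair_overlay_l /mnt/persistent/var/lib/containers
repair_overlay_l() {
    local graphroot="$1"
    local links="$graphroot/overlay/l"

    # Test for a symlink first: [ -d ] follows symlinks, so a symlink that
    # points at a layer directory would otherwise look like a healthy directory.
    if [ -L "$links" ]; then
        echo "overlay/l is a symlink to $(readlink "$links"); removing it" >&2
        rm -f -- "$links"
    elif [ -e "$links" ] && [ ! -d "$links" ]; then
        echo "overlay/l exists but is not a directory; removing it" >&2
        rm -f -- "$links"
    else
        # A real directory, or nothing at all: nothing to repair.
        return 0
    fi
}
```

The helper only removes the bad entry and leaves the layer it pointed to untouched. It relies on the behavior described above, where the next podman command recreates overlay/l as a directory.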

Steps to reproduce the issue

This is a forced reproducer; I have only seen the problem occur organically once in the field (after a simple sudo reboot).

  1. Stop all containers
  2. sudo rm -rf /mnt/persistent/var/lib/containers/overlay/l
  3. sudo touch /mnt/persistent/var/lib/containers/overlay/l
  4. Run any podman command, e.g. sudo podman ps, and observe the error: Error: mkdir /mnt/persistent/var/lib/containers/overlay/l: file exists
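The reproducer uses a regular file, while the failure seen in the field involved a symlink. Both cases block directory creation in the same way, because mkdir(2) reports EEXIST whenever the name is already taken, whatever its type. The sketch below shows this in a throwaway directory using the coreutils mkdir; the helper name try_mkdir_over is hypothetical, and it does not show which call podman itself makes.

```shell
#!/usr/bin/env bash
# Sketch only: try_mkdir_over is a hypothetical helper for illustration.
# It works in a scratch directory and never touches container storage.
try_mkdir_over() {
    local kind="$1"            # "file" or "symlink"
    local scratch
    scratch="$(mktemp -d)" || return 2
    mkdir -p "$scratch/overlay/layer/diff"

    case "$kind" in
        file)    touch "$scratch/overlay/l" ;;
        symlink) ln -s "$scratch/overlay/layer/diff" "$scratch/overlay/l" ;;
        *)       rm -rf -- "$scratch"; return 2 ;;
    esac

    # Plain mkdir (no -p) fails because the name already exists.
    if mkdir "$scratch/overlay/l" 2>/dev/null; then
        echo "$kind: mkdir succeeded"
    else
        echo "$kind: mkdir failed"
    fi
    rm -rf -- "$scratch"
}

try_mkdir_over file
try_mkdir_over symlink
```

Both calls print "mkdir failed", which is consistent with the touch-based reproducer triggering the same error as the symlink found in the field.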

Describe the results you received

Podman appears to have corrupted the overlay/l directory by replacing it with a symlink, so that every podman command fails with Error: mkdir /mnt/persistent/var/lib/containers/overlay/l: file exists.

Describe the results you expected

I would expect podman not to replace the overlay/l directory with a symlink or a regular file. Failing that, I would expect it to detect the situation and correct it.

podman info output

host:
  arch: arm64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 8
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: redacted
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.136-tegra
  linkmode: dynamic
  logDriver: journald
  memFree: 12095455232
  memTotal: 16417763328
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 2h 2m 52.54s (Approximately 0.08 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 25
    paused: 0
    running: 13
    stopped: 12
  graphDriverName: overlay
  graphOptions:
    overlay.ignore_chown_errors: "true"
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /mnt/persistent/var/lib/containers
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 20
  runRoot: /run/containers/storage
  volumePath: /mnt/persistent/var/lib/containers/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.18.1
  OsArch: linux/arm64
  Version: 3.4.4

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

I am using a custom graphroot on a separate partition, set in /etc/containers/storage.conf:

graphroot = "/mnt/persistent/var/lib/containers"
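Because the graphroot is customized, any external check of overlay/l has to look up the path rather than assume the default. A minimal sketch of reading it straight from the config file follows; the function name read_graphroot is hypothetical, and it assumes the simple double-quoted form shown above.

```shell
#!/usr/bin/env bash
# Sketch only: read_graphroot is a hypothetical helper, not a podman command.
# It handles the simple form  graphroot = "/some/path"  and nothing fancier.
read_graphroot() {
    local conf="${1:-/etc/containers/storage.conf}"
    # Print the quoted value of the first uncommented graphroot line.
    sed -n 's/^[[:space:]]*graphroot[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$conf" | head -n 1
}
```

Reading the file directly avoids asking podman for the path, which is not possible in the failure state since every podman command returns the error.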

Additional information

I have only seen this happen once in the field, after a simple sudo reboot. I am able to induce the failure with the reproducer provided.

@Austinpayne Austinpayne added the kind/bug Categorizes issue or PR as related to a bug. label Dec 23, 2024