When dev path is changed, Instance is updated with new path but the pod using the device still sees the old path #734

Open
muvaf opened this issue Jan 6, 2025 · 0 comments · May be fixed by #733
Labels
bug Something isn't working

muvaf commented Jan 6, 2025

Describe the bug

When I plug in an iPhone, an Instance resource is created for it. The first thing my controller, which watches Instance resources, does is send a magic byte to enable the CDC NCM network interface. This is a USB config change, so the device is effectively unplugged and re-plugged, and its dev path changes.

After that, the same Instance resource has a different dev path, e.g. /dev/bus/usb/001/007 becomes /dev/bus/usb/001/009. However, the Pod that lists the Instance name under its resource requests still sees the old path as its only USB device, so reading that old file fails with "no such file" errors.

When I restart all the Akri pods, the Pod sees the new path and works. My guess is that kubelet is somehow not updated with the new path of the Instance; since a restart results in re-registration of the plugin, it goes through all the devices again and ends up registering the new path.
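The guess above can be made concrete with a toy model. This is not Akri's code: `ToyPlugin`, `DeviceSpec`, and `refresh_on_update` are hypothetical names, used only to illustrate how a plugin that captures its device specs once at registration would keep answering allocate calls with the old path, while one that refreshes on discovery updates would not.

```python
from dataclasses import dataclass


@dataclass
class DeviceSpec:
    host_path: str
    container_path: str
    permissions: str = "rwm"


@dataclass
class ToyPlugin:
    """Toy stand-in for a device plugin; NOT Akri's actual implementation."""

    specs: list  # device specs captured when the plugin was registered
    refresh_on_update: bool = False  # hypothetical switch for the two behaviours

    def on_discovery_update(self, new_specs):
        # Hypothesised bug: the Instance resource gets the new path, but the
        # plugin keeps serving whatever it captured at registration time.
        if self.refresh_on_update:
            self.specs = list(new_specs)

    def allocate(self):
        # Host paths that would be handed to kubelet for the container.
        return [spec.host_path for spec in self.specs]


old = [DeviceSpec("/dev/bus/usb/001/019", "/dev/bus/usb/001/019")]
new = [DeviceSpec("/dev/bus/usb/001/020", "/dev/bus/usb/001/020")]

stale = ToyPlugin(list(old))
stale.on_discovery_update(new)
print(stale.allocate())  # ['/dev/bus/usb/001/019']

fresh = ToyPlugin(list(old), refresh_on_update=True)
fresh.on_discovery_update(new)
print(fresh.allocate())  # ['/dev/bus/usb/001/020']
```

If the real behaviour resembles the first case, restarting the agent would "fix" it simply because registration captures the specs again.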

Output of kubectl get pods,akrii,akric -o wide

> kubectl get pods,akrii,akric -o wide -n akri-system
NAME                                              READY   STATUS    RESTARTS       AGE   IP             NODE            NOMINATED NODE   READINESS GATES
pod/akri-agent-daemonset-jc596                    1/1     Running   1 (5d1h ago)   9d    10.244.0.127   talos-aqs-r1t   <none>           <none>
pod/akri-controller-deployment-745f4bfc4c-b2fnk   1/1     Running   1 (5d1h ago)   9d    10.244.0.124   talos-aqs-r1t   <none>           <none>
pod/akri-udev-discovery-daemonset-g4q5f           1/1     Running   1 (5d1h ago)   9d    10.244.0.129   talos-aqs-r1t   <none>           <none>
pod/akri-webhook-configuration-78666f968d-pfvpk   1/1     Running   1 (5d1h ago)   9d    10.244.0.128   talos-aqs-r1t   <none>           <none>

Kubernetes Version: 1.32.0 via Talos 1.9

To Reproduce

I will try to put together a script that reproduces this cleanly, but it will require iOS hardware.

Expected behavior

When the dev path in an Instance changes as a result of a USB config change, the change should also be reflected in the Pod(s) that use the Instance, so that processes accessing the USB device see the new path and can read from and write to it.

Logs (please share snips of applicable logs)

Here is the initial Instance resource:

apiVersion: akri.sh/v0
kind: Instance
metadata:
  creationTimestamp: "2025-01-08T10:43:37Z"
  generation: 1
  name: device-pod-ios-agent-1afb6f
  namespace: mobile-device-system
  ownerReferences:
  - apiVersion: akri.sh/v0
    controller: true
    kind: Configuration
    name: device-pod-ios-agent
    uid: d68db8d0-064f-4514-b805-8888241be5d5
  resourceVersion: "2124065"
  uid: 68808b9f-1c85-451e-a8a2-03eff26b1d06
spec:
  brokerProperties:
    UDEV_DEVNODE_0: /dev/bus/usb/001/019
    UDEV_DEVPATH: /devices/pci0000:00/0000:00:14.0/usb1/1-9
  capacity: 1
  cdiName: akri.sh/device-pod-ios-agent=1afb6f
  configurationName: device-pod-ios-agent
  deviceUsage: {}
  nodes:
  - talos-aqs-r1t
  shared: false

After sending the USB config bytes, here is how it's changed by Akri:

apiVersion: akri.sh/v0
kind: Instance
metadata:
  creationTimestamp: "2025-01-08T10:43:37Z"
  finalizers:
  - talos-aqs-r1t
  generation: 3
  labels:
    attributes.platform.qawolf.com/usb-network-enabled: "true"
  name: device-pod-ios-agent-1afb6f
  namespace: mobile-device-system
  ownerReferences:
  - apiVersion: akri.sh/v0
    controller: true
    kind: Configuration
    name: device-pod-ios-agent
    uid: d68db8d0-064f-4514-b805-8888241be5d5
  resourceVersion: "2124125"
  uid: 68808b9f-1c85-451e-a8a2-03eff26b1d06
spec:
  brokerProperties:
    UDEV_DEVNODE_0: /dev/bus/usb/001/020
    UDEV_DEVPATH: /devices/pci0000:00/0000:00:14.0/usb1/1-9
  capacity: 1
  cdiName: akri.sh/device-pod-ios-agent=1afb6f
  configurationName: device-pod-ios-agent
  deviceUsage:
    device-pod-ios-agent-1afb6f-0: talos-aqs-r1t
  nodes:
  - talos-aqs-r1t
  shared: false

My controller adds the attributes.platform.qawolf.com/usb-network-enabled: "true" label only after it confirms that the device is in the correct config, i.e. after it has changed to the new dev path. A Pod is then created only if that label is present.
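For anyone trying to check this on their own node, here is a minimal sketch of one way to compare the Instance's UDEV_DEVNODE_0 with the node the kernel currently assigns. It assumes the standard Linux sysfs layout, where a USB device directory exposes `busnum` and `devnum` files, and it assumes sysfs is readable from wherever it runs. The function names are hypothetical and this is not necessarily how my controller does the check.

```python
from pathlib import Path


def current_devnode(udev_devpath, sysfs_root="/sys"):
    """Return the /dev/bus/usb node the kernel currently assigns to a USB device.

    udev_devpath is the sysfs-relative path, e.g. the Instance's UDEV_DEVPATH.
    """
    dev_dir = Path(sysfs_root) / udev_devpath.lstrip("/")
    # USB device directories in sysfs expose their bus and device numbers.
    busnum = int((dev_dir / "busnum").read_text().strip())
    devnum = int((dev_dir / "devnum").read_text().strip())
    return f"/dev/bus/usb/{busnum:03d}/{devnum:03d}"


def devnode_is_stale(broker_properties, sysfs_root="/sys"):
    """True if UDEV_DEVNODE_0 no longer matches what sysfs reports."""
    live = current_devnode(broker_properties["UDEV_DEVPATH"], sysfs_root)
    return broker_properties["UDEV_DEVNODE_0"] != live


# Example call on the node (values taken from spec.brokerProperties):
# devnode_is_stale({
#     "UDEV_DEVNODE_0": "/dev/bus/usb/001/019",
#     "UDEV_DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb1/1-9",
# })
```

UDEV_DEVPATH stays the same across the re-enumeration in this report, which is why it is used as the stable key here.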

Here is what the Pod events look like:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Normal   Scheduled         4m                   default-scheduler  Successfully assigned mobile-device-system/device-pod-ios-agent-1afb6f-1dsbg to talos-aqs-r1t
  Warning  FailedScheduling  4m10s                default-scheduler  0/1 nodes are available: 1 Insufficient akri.sh/device-pod-ios-agent-1afb6f. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  Failed            4m                   kubelet            Error: failed to generate container "660f4629cd79181d29c9e1c0e94fa9c8d2de6b859393827021dec086ceca1f65" spec: failed to apply OCI options: lstat /dev/bus/usb/001/019: no such file or directory
  Warning  Failed            3m59s                kubelet            Error: failed to generate container "61bdc29ac4676c714c866e6a8b4c866e27ddc03876b0c0f232d9d8cf4e28be29" spec: failed to apply OCI options: lstat /dev/bus/usb/001/019: no such file or directory
  Warning  Failed            3m45s                kubelet            Error: failed to generate container "ffdc0446e6bb6b701315896f085d369a435cce13c878e78f24d9a77106ff0b60" spec: failed to apply OCI options: lstat /dev/bus/usb/001/019: no such file or directory

Here is an excerpt of the agent logs:

[2025-01-08T10:43:34Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:43:34Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:43:37Z TRACE agent::discovery_handler_manager::registration_socket] Received new message from discovery handler: DiscoverResponse { devices: [Device { id: "/devices/pci0000:00/0000:00:14.0/usb1/1-9", properties: {"UDEV_DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb1/1-9", "UDEV_DEVNODE_0": "/dev/bus/usb/001/019"}, mounts: [], device_specs: [DeviceSpec { container_path: "/dev/bus/usb/001/019", host_path: "/dev/bus/usb/001/019", permissions: "rwm" }] }] }
[2025-01-08T10:43:37Z TRACE agent::discovery_handler_manager::discovery_handler_registry] Ask for reconciliation of mobile-device-system::device-pod-ios-agent
[2025-01-08T10:43:37Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:43:37Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:43:37Z INFO  agent::plugin_manager::device_plugin_runner] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/device-pod-ios-agent-1afb6f-1736333017.sock
[2025-01-08T10:43:38Z INFO  agent::plugin_manager::device_plugin_runner] register - entered for Instance akri.sh/device-pod-ios-agent-1afb6f and socket_name: device-pod-ios-agent-1afb6f-1736333017.sock
[2025-01-08T10:43:38Z TRACE agent::plugin_manager::device_plugin_runner] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2025-01-08T10:43:38Z INFO  agent::plugin_manager::device_plugin_runner] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/device-pod-ios-agent-1736333018.sock
[2025-01-08T10:43:38Z INFO  agent::plugin_manager::device_plugin_instance_controller] list_and_watch - kubelet called list_and_watch for instance device-pod-ios-agent-1afb6f
[2025-01-08T10:43:38Z TRACE agent::plugin_manager::device_plugin_instance_controller] Sending devices to kubelet: [Device { id: "device-pod-ios-agent-1afb6f-0", health: "Healthy", topology: None }]
[2025-01-08T10:43:39Z INFO  agent::plugin_manager::device_plugin_runner] register - entered for Instance akri.sh/device-pod-ios-agent and socket_name: device-pod-ios-agent-1736333018.sock
[2025-01-08T10:43:39Z TRACE agent::plugin_manager::device_plugin_runner] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2025-01-08T10:43:39Z INFO  agent::plugin_manager::device_plugin_instance_controller] list_and_watch - kubelet called list_and_watch for Configuration device-pod-ios-agent
[2025-01-08T10:43:39Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:43:44Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:43:44Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:43:46Z TRACE agent::plugin_manager::device_plugin_runner] kubelet called allocate Request { metadata: MetadataMap { headers: {"content-type": "application/grpc", "user-agent": "grpc-go/1.65.0", "te": "trailers", "grpc-accept-encoding": "gzip"} }, message: AllocateRequest { container_requests: [ContainerAllocateRequest { devices_i_ds: ["device-pod-ios-agent-1afb6f-0"] }] }, extensions: Extensions }
[2025-01-08T10:43:46Z INFO  agent::plugin_manager::device_plugin_instance_controller] allocate - kubelet called allocate for Instance device-pod-ios-agent-1afb6f
[2025-01-08T10:43:46Z TRACE agent::plugin_manager::device_plugin_instance_controller] Sending devices to kubelet: [Device { id: "device-pod-ios-agent-1afb6f-0", health: "Healthy", topology: None }]
[2025-01-08T10:43:46Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:43:46Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:43:54Z TRACE agent::discovery_handler_manager::registration_socket] Received new message from discovery handler: DiscoverResponse { devices: [Device { id: "/devices/pci0000:00/0000:00:14.0/usb1/1-9", properties: {"UDEV_DEVNODE_0": "/dev/bus/usb/001/020", "UDEV_DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb1/1-9"}, mounts: [], device_specs: [DeviceSpec { container_path: "/dev/bus/usb/001/020", host_path: "/dev/bus/usb/001/020", permissions: "rwm" }] }] }
[2025-01-08T10:43:54Z TRACE agent::discovery_handler_manager::discovery_handler_registry] Ask for reconciliation of mobile-device-system::device-pod-ios-agent
[2025-01-08T10:43:54Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:43:54Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:43:54Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:43:54Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:44:04Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:44:04Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:44:14Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:44:14Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:44:14Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:44:14Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:44:24Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:44:24Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:44:24Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:44:24Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:44:34Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:44:34Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:44:34Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:44:34Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:44:44Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:44:44Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
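To make a log like the one above easier to read, here is a small sketch that pulls out only the discovery updates and the allocate calls, in order. It relies on nothing but the message text visible in this excerpt (`DiscoverResponse`, `host_path: "..."`, and `kubelet called allocate for`); the function name `agent_timeline` is hypothetical and the patterns may need adjusting for other Akri versions.

```python
import re

HOST_PATH = re.compile(r'host_path: "([^"]+)"')
TIMESTAMP = re.compile(r"^\[(\S+)")


def agent_timeline(log_lines):
    """Return (timestamp, event, host_path) tuples in log order.

    event is "discovered" for each host_path in a DiscoverResponse line and
    "allocate" for each "kubelet called allocate for ..." line.
    """
    events = []
    for line in log_lines:
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else None
        if "DiscoverResponse" in line:
            for path in HOST_PATH.findall(line):
                events.append((stamp, "discovered", path))
        elif "kubelet called allocate for" in line:
            events.append((stamp, "allocate", None))
    return events


# Example use with a saved log file (the file name is a placeholder):
# with open("agent.log") as f:
#     for event in agent_timeline(f):
#         print(event)
```

Applied to the excerpt above, this gives the /dev/bus/usb/001/019 discovery at 10:43:37, the allocate call at 10:43:46, and the /dev/bus/usb/001/020 discovery at 10:43:54.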

Note that deleting the pod and re-creating it does not make a difference. However, if I delete the agent pod and let a new one come up, everything starts to work. Here are the agent logs after the new one comes up:

akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2025-01-08T10:50:00Z TRACE agent] akri.sh KUBERNETES_PORT found ... env_logger::init finished
[2025-01-08T10:50:00Z INFO  akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2025-01-08T10:50:00Z INFO  agent::discovery_handler_manager::registration_socket] internal_run_registration_server - entered
[2025-01-08T10:50:00Z TRACE agent::discovery_handler_manager::registration_socket] internal_run_registration_server - registration server listening on socket /var/lib/akri/agent-registration.sock
[2025-01-08T10:50:00Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:00Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:00Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:00Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:50:00Z WARN  agent::plugin_manager::device_plugin_instance_controller] Error during reconciliation of Instance Some("mobile-device-system")::device-pod-ios-agent-1afb6f, retrying in 1s: UnknownDevice("akri.sh/device-pod-ios-agent=1afb6f")
[2025-01-08T10:50:00Z WARN  agent::util::discovery_configuration_controller] Error during reconciliation for Some("mobile-device-system")::device-pod-ios-agent, retrying in 1s: DiscoveryError(NoHandler("udev"))
[2025-01-08T10:50:01Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:50:01Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:01Z WARN  agent::util::discovery_configuration_controller] Error during reconciliation for Some("mobile-device-system")::device-pod-ios-agent, retrying in 2s: DiscoveryError(NoHandler("udev"))
[2025-01-08T10:50:01Z WARN  agent::plugin_manager::device_plugin_instance_controller] Error during reconciliation of Instance Some("mobile-device-system")::device-pod-ios-agent-1afb6f, retrying in 2s: UnknownDevice("akri.sh/device-pod-ios-agent=1afb6f")
[2025-01-08T10:50:03Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:03Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:50:03Z WARN  agent::plugin_manager::device_plugin_instance_controller] Error during reconciliation of Instance Some("mobile-device-system")::device-pod-ios-agent-1afb6f, retrying in 4s: UnknownDevice("akri.sh/device-pod-ios-agent=1afb6f")
[2025-01-08T10:50:03Z WARN  agent::util::discovery_configuration_controller] Error during reconciliation for Some("mobile-device-system")::device-pod-ios-agent, retrying in 4s: DiscoveryError(NoHandler("udev"))
[2025-01-08T10:50:07Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:50:07Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:07Z WARN  agent::plugin_manager::device_plugin_instance_controller] Error during reconciliation of Instance Some("mobile-device-system")::device-pod-ios-agent-1afb6f, retrying in 8s: UnknownDevice("akri.sh/device-pod-ios-agent=1afb6f")
[2025-01-08T10:50:07Z TRACE agent::discovery_handler_manager::registration_socket] NetworkEndpoint::query - connecting to external udev discovery handler over network
[2025-01-08T10:50:07Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:10Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:10Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:15Z TRACE agent::discovery_handler_manager::registration_socket] Received new message from discovery handler: DiscoverResponse { devices: [Device { id: "/devices/pci0000:00/0000:00:14.0/usb1/1-9", properties: {"UDEV_DEVNODE_0": "/dev/bus/usb/001/020", "UDEV_DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb1/1-9"}, mounts: [], device_specs: [DeviceSpec { container_path: "/dev/bus/usb/001/020", host_path: "/dev/bus/usb/001/020", permissions: "rwm" }] }] }
[2025-01-08T10:50:15Z TRACE agent::discovery_handler_manager::discovery_handler_registry] Ask for reconciliation of mobile-device-system::device-pod-ios-agent
[2025-01-08T10:50:15Z TRACE agent::util::discovery_configuration_controller] Reconciling Some("mobile-device-system")::device-pod-ios-agent
[2025-01-08T10:50:15Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:15Z INFO  agent::plugin_manager::device_plugin_runner] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/device-pod-ios-agent-1afb6f-1736333415.sock
[2025-01-08T10:50:16Z INFO  agent::plugin_manager::device_plugin_runner] register - entered for Instance akri.sh/device-pod-ios-agent-1afb6f and socket_name: device-pod-ios-agent-1afb6f-1736333415.sock
[2025-01-08T10:50:16Z TRACE agent::plugin_manager::device_plugin_runner] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2025-01-08T10:50:16Z INFO  agent::plugin_manager::device_plugin_runner] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/device-pod-ios-agent-1736333416.sock
[2025-01-08T10:50:16Z INFO  agent::plugin_manager::device_plugin_instance_controller] list_and_watch - kubelet called list_and_watch for instance device-pod-ios-agent-1afb6f
[2025-01-08T10:50:16Z TRACE agent::plugin_manager::device_plugin_instance_controller] Sending devices to kubelet: [Device { id: "device-pod-ios-agent-1afb6f-0", health: "Healthy", topology: None }]
[2025-01-08T10:50:17Z INFO  agent::plugin_manager::device_plugin_runner] register - entered for Instance akri.sh/device-pod-ios-agent and socket_name: device-pod-ios-agent-1736333416.sock
[2025-01-08T10:50:17Z TRACE agent::plugin_manager::device_plugin_runner] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2025-01-08T10:50:17Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:17Z INFO  agent::plugin_manager::device_plugin_instance_controller] list_and_watch - kubelet called list_and_watch for Configuration device-pod-ios-agent
[2025-01-08T10:50:20Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:20Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:23Z TRACE agent::plugin_manager::device_plugin_runner] kubelet called allocate Request { metadata: MetadataMap { headers: {"content-type": "application/grpc", "user-agent": "grpc-go/1.65.0", "te": "trailers", "grpc-accept-encoding": "gzip"} }, message: AllocateRequest { container_requests: [ContainerAllocateRequest { devices_i_ds: ["device-pod-ios-agent-1afb6f-0"] }] }, extensions: Extensions }
[2025-01-08T10:50:23Z INFO  agent::plugin_manager::device_plugin_instance_controller] allocate - kubelet called allocate for Instance device-pod-ios-agent-1afb6f
[2025-01-08T10:50:23Z TRACE agent::plugin_manager::device_plugin_instance_controller] Sending devices to kubelet: [Device { id: "device-pod-ios-agent-1afb6f-0", health: "Healthy", topology: None }]
[2025-01-08T10:50:23Z TRACE agent::plugin_manager::device_plugin_instance_controller] Plugin Manager: Reconciling device-pod-ios-agent-1afb6f
[2025-01-08T10:50:30Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:30Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:40Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:40Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:50Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:50:50Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:50:50Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:50:50Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:51:00Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:51:00Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:51:00Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:51:00Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:51:10Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:51:10Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:51:10Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:51:10Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s
[2025-01-08T10:51:20Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] reclaiming unused slots - start
[2025-01-08T10:51:20Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] register - before call to register with the kubelet at socket /var/lib/kubelet/pod-resources/kubelet.sock
[2025-01-08T10:51:20Z TRACE agent::plugin_manager::device_plugin_slot_reclaimer] freeing slot: akri.sh/device-pod-ios-agent-1afb6f-0
[2025-01-08T10:51:20Z WARN  agent::plugin_manager::device_plugin_slot_reclaimer] Failed to free slot akri.sh/device-pod-ios-agent-1afb6f-0, will try again in 10s

Here are the events of the new pod that's created for the Instance:

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m20s  default-scheduler  0/1 nodes are available: 1 Insufficient akri.sh/device-pod-ios-agent-1afb6f. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         2m12s  default-scheduler  Successfully assigned mobile-device-system/device-pod-ios-agent-1afb6f-u7uze to talos-aqs-r1t

Additional context

I'm experienced in Go but have practically zero experience in Rust. Here is a draft change that I was able to write with help from the Cursor editor running Claude 3.5 Sonnet. It's hacky, but it was the only simple solution it was able to come up with. I'll update this issue and the PR once my testing is complete.

Projects
Status: Triage needed
1 participant