CPEM requires Kubernetes node name to match Equinix Metal device name #533

hh · 2024-04-17T16:40:12Z

I'm not sure where to set providerID. I don't remember setting it in the past. Any suggestions?

CPEM daemonset

kubectl -n kube-system describe ds cloud-provider-equinix-metal
Name:           cloud-provider-equinix-metal
Selector:       app=cloud-provider-equinix-metal
Node-Selector:  <none>
Labels:         app=cloud-provider-equinix-metal
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=cloud-provider-equinix-metal
  Service Account:  cloud-provider-equinix-metal
  Containers:
   cloud-provider-equinix-metal:
    Image:      quay.io/equinix-oss/cloud-provider-equinix-metal:v3.8.0
    Port:       <none>
    Host Port:  <none>
    Command:
      ./cloud-provider-equinix-metal
      --cloud-provider=equinixmetal
      --leader-elect=true
      --authentication-skip-lookup=true
      --cloud-config=/etc/cloud-sa/cloud-sa.json
    Requests:
      cpu:        100m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /etc/cloud-sa from cloud-sa-volume (ro)
  Volumes:
   cloud-sa-volume:
    Type:               Secret (a volume populated by a Secret)
    SecretName:         metal-cloud-config
    Optional:           false
  Priority Class Name:  system-cluster-critical
Events:                 <none>

cloud-sa.json

kubectl -n kube-system get secret metal-cloud-config -o json | jq '.data["cloud-sa.json"]' -r | base64 -d | jq .
{
  "apiKey": "XXXXXXXXX",
  "projectID": "82b5c425-8dd4-429e-ae0d-d32f265c63e4",
  "metro": "sv",
  "eipTag": "eip-apiserver-sharingio",
  "eipHealthCheckUseHostIP": true,
  "loadBalancer": "metallb:///metallb-system?crdConfiguration=true"
}

CPEM logs

kubectl -n kube-system logs ds/cloud-provider-equinix-metal | tail -10
Found 3 pods, using pod/cloud-provider-equinix-metal-bl7nh
I0417 16:37:29.152076       1 eip_controlplane_reconciliation.go:249] healthcheck node https://139.178.94.175:6443/healthz
E0417 16:37:29.157164       1 eip_controlplane_reconciliation.go:135] failed to handle node health check: failed to assign the control plane endpoint: providerID cannot be empty string
I0417 16:37:29.157191       1 eip_controlplane_reconciliation.go:125] handling update, node: shining-ant
I0417 16:37:29.389548       1 eip_controlplane_reconciliation.go:529] doHealthCheck(): no control plane IP assignment found, trying to assign to an available controlplane node
I0417 16:37:29.399453       1 eip_controlplane_reconciliation.go:249] healthcheck node https://139.178.94.167:6443/healthz
E0417 16:37:29.405800       1 eip_controlplane_reconciliation.go:135] failed to handle node health check: failed to assign the control plane endpoint: providerID cannot be empty string
I0417 16:37:29.405833       1 eip_controlplane_reconciliation.go:125] handling update, node: trusty-marmot
I0417 16:37:29.675037       1 eip_controlplane_reconciliation.go:529] doHealthCheck(): no control plane IP assignment found, trying to assign to an available controlplane node
I0417 16:37:29.683583       1 eip_controlplane_reconciliation.go:249] healthcheck node https://145.40.82.49:6443/healthz
E0417 16:37:29.689076       1 eip_controlplane_reconciliation.go:135] failed to handle node health check: failed to assign the control plane endpoint: providerID cannot be empty string
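
For readers following the log above, here is a minimal Go sketch of the kind of check that would produce "providerID cannot be empty string". It is not CPEM's actual code: the helper name deviceIDFromProviderID, the "equinixmetal" scheme, and the placeholder device ID are all assumptions made for illustration. The point is only that the EIP assignment cannot proceed until the node carries a providerID from which a device ID can be extracted.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// providerScheme is an assumed URI scheme for this sketch; check the CPEM
// source for the value it actually uses.
const providerScheme = "equinixmetal"

// deviceIDFromProviderID is a hypothetical helper that extracts a device ID
// from a providerID of the assumed form "<scheme>://<device-id>".
func deviceIDFromProviderID(providerID string) (string, error) {
	if providerID == "" {
		// Same message as in the log lines above.
		return "", errors.New("providerID cannot be empty string")
	}
	scheme, id, found := strings.Cut(providerID, "://")
	if !found {
		return "", fmt.Errorf("providerID %q has no scheme", providerID)
	}
	if scheme != providerScheme {
		return "", fmt.Errorf("unexpected provider scheme %q", scheme)
	}
	if id == "" {
		return "", errors.New("providerID has no device ID")
	}
	return id, nil
}

func main() {
	// The second value uses a placeholder ID, not a real device.
	for _, p := range []string{"", providerScheme + "://hypothetical-device-id"} {
		id, err := deviceIDFromProviderID(p)
		fmt.Printf("providerID=%q id=%q err=%v\n", p, id, err)
	}
}
```

A node whose spec.providerID was never populated would hit the first branch on every health check, which matches the repeating error in the log.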

cprivitere · 2024-04-17T16:44:27Z

You shouldn't be setting providerID, that's something CPEM sets for you. Why it's not setting it here though, that's the real question. Hmm.

We had this part working in the work we did before KubeCon. Do you still have access to that config? It's probably something we had to disable on the Talos side.

hh · 2024-04-17T16:46:33Z

It should be noted that it's also not clearing a taint I suspect it's responsible for:
#531

hh · 2024-04-17T16:47:25Z

I have another open issue related to the /healthz check: #519

hh · 2024-04-17T17:12:44Z

Lively conversation happening in the #support channel on the Talos / Sidero Slack: https://taloscommunity.slack.com/archives/CMARMBC4E/p1712793108556169

It seems this might be related to the deviceByName fallback, which wants the Kubernetes node names to match the Equinix Metal device names exactly.

Possibly? https://github.com/kubernetes-sigs/cloud-provider-equinix-metal/blob/main/metal/devices.go#L165-L167

hh · 2024-04-17T17:13:13Z

Going to try setting machine.kubelet.registerWithFQDN: true in the Talos configuration.

This fixes kubernetes-sigs/cloud-provider-equinix-metal#533. deviceByName returns an instance whose hostname matches the Kubernetes node.Name. It is defined here: https://github.com/kubernetes-sigs/cloud-provider-equinix-metal/blob/main/metal/devices.go#L165C1-L166C1. The fix works because the deviceByName logic in CPEM requires the Equinix Metal device name to match the Kubernetes node name before eip_controlplane_reconciliation can complete.

hh · 2024-04-17T19:54:10Z

I found a workaround, but it was a bit difficult to find.

sharingio/infra@96bff1f

This might be a one-off, but it might make sense to take some steps to raise visibility so others don't get stuck on this in the future:

The CPEM error message should clearly state why the match could not occur, possibly with a link to documentation.
The CPEM documentation should clearly state that Kubernetes node names must match Equinix Metal device names.
The Talos documentation should probably state something similar in an updated Equinix integration page.

hh mentioned this issue Apr 17, 2024

CPEM should clear node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule ? #531

Closed

hh changed the title ~~CPEM fails to handle node health check : by failing to find providerID~~ CPEM requires Kubernetes node name to match Equinix Metal device name Apr 17, 2024
