[Bug] Cluster Deletion fails with "Error: deadline surpassed waiting for AWS load balancers to be deleted" #7548

fbuchmeier-abi · 2024-02-13T14:05:56Z

What were you trying to accomplish?

I'm trying to delete an eksctl-managed cluster that contains AWS Application Load Balancers managed by the aws-lb-controller (https://kubernetes-sigs.github.io/aws-load-balancer-controller).

What happened?

Cluster deletion times out with the error below:

"cmd": [
        "eksctl",
        "delete",
        "cluster",
        "--region",
        "eu-central-1",
        "--name",
        "sandbox",
        "--wait"
    ],
}

STDOUT:

2024-02-09 20:02:45 [ℹ]  deleting EKS cluster "sandbox"
2024-02-09 20:02:46 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "sandbox"
2024-02-09 20:02:46 [ℹ]  starting parallel draining, max in-flight of 1
2024-02-09 20:02:46 [ℹ]  deleted 0 Fargate profile(s)
2024-02-09 20:02:47 [✔]  kubeconfig has been updated
2024-02-09 20:02:47 [ℹ]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress


STDERR:

Error: deadline surpassed waiting for AWS load balancers to be deleted: k8s-sharedtools-5732128751

How to reproduce it?

Deploy a new EKS cluster (I used 1.28) with eksctl >= 0.144.0 and the vpc-cni addon
Provision the aws-lb-controller as described in the docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/deploy/installation/

Set up an Ingress that references an Application Load Balancer. In my case, I am using annotations on the Ingress object:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: alb

Wait until the load balancer has been created successfully

Delete the EKS cluster

eksctl delete cluster --region eu-central-1 --name sandbox --wait

Anything else we need to know?

According to my research, the problem occurs because the AWS VPC CNI (aws-node daemonset) is deleted before the associated Kubernetes Service and Ingress objects. Once the CNI daemonset is gone, the aws-lb-controller pods can no longer process the finalizers on these objects. The objects then get stuck and cannot be deleted in Kubernetes.

As far as I can tell, the cluster deletion process runs as follows:
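To check whether objects are stuck in this state, something like the following could be used. It is a minimal sketch: the function name show_lb_object_finalizers is made up for illustration, and it assumes kubectl still points at the affected cluster. A row that shows a deletion timestamp together with a non-empty finalizer list is an object still waiting for a controller to process its finalizers.

```shell
# Hypothetical helper (the name is illustrative, not part of eksctl or kubectl).
# Prints every Ingress and Service with its deletion timestamp and finalizers.
# A row with a DELETING_SINCE value and a non-empty FINALIZERS list is an
# object that is stuck waiting for its finalizers to be processed.
show_lb_object_finalizers() {
  kubectl get ingress,service --all-namespaces \
    -o custom-columns='KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name,DELETING_SINCE:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

Calling show_lb_object_finalizers from a second terminal while eksctl delete cluster is hanging should show which objects are blocking the load balancer cleanup.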

VPC CNI gets deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L95
Shared resources get deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L105
Shared resources include AWS LB: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/delete.go#L63
AWS LB cleanup now (since PR #6389, "Clean up ALBs using spec.ingressClassName and ALB security groups") includes deletion of resources managed by the AWS LB Controller: https://github.com/aaroniscode/eksctl/blob/08bd92c91037ca21ec18c04277d9d6ba4d21d704/pkg/elb/cleanup.go#L96C2-L96C18

This issue has been happening for me since upgrading to eksctl >= 0.144.0 (https://github.com/eksctl-io/eksctl/releases/tag/v0.144.0) and was probably introduced by #6389.
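If the ordering described above is indeed the cause, a possible workaround is to delete the Ingress objects while the CNI and the aws-lb-controller are still running, and only then delete the cluster. The sketch below is untested against this bug and rests on that assumption. The function name delete_cluster_after_lb_cleanup and the 10 minute timeout are illustrative choices, and kubectl is assumed to point at the cluster being deleted. Services of type LoadBalancer would need the same treatment and are not handled here.

```shell
# Hypothetical workaround sketch (function name and timeout are illustrative).
# Usage: delete_cluster_after_lb_cleanup <region> <cluster-name>
delete_cluster_after_lb_cleanup() {
  region="$1"
  name="$2"

  # Step 1: remove all Ingress objects while the controller can still run,
  # so that it can process their finalizers and delete the load balancers.
  # kubectl waits for the objects to disappear, up to the given timeout.
  kubectl delete ingress --all --all-namespaces --timeout=10m || return 1

  # Step 2: delete the cluster with the same command used in this report.
  eksctl delete cluster --region "$region" --name "$name" --wait
}
```

For the cluster in this report the call would be delete_cluster_after_lb_cleanup eu-central-1 sandbox. If step 1 times out, the function stops before touching the cluster, which leaves room to inspect the stuck objects.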

Versions

eksctl info
eksctl version: 0.169.0
kubectl version: v1.24.10
OS: linux

Best regards,
Florian.


github-actions · 2024-03-15T01:46:11Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

fbuchmeier-abi added the kind/bug label Feb 13, 2024

github-actions bot added the stale label Mar 15, 2024

yuxiang-zhang added the needs-investigation label Mar 15, 2024

TiberiuGC removed the stale label Mar 26, 2024
