[Bug] Cluster Deletion fails with "Error: deadline surpassed waiting for AWS load balancers to be deleted" #7548

Open
fbuchmeier-abi opened this issue Feb 13, 2024 · 1 comment


fbuchmeier-abi commented Feb 13, 2024

What were you trying to accomplish?

I'm trying to delete an eksctl-managed cluster that contains AWS Application Load Balancers managed by the aws-lb-controller (https://kubernetes-sigs.github.io/aws-load-balancer-controller).

What happened?

Cluster deletion times out with the error below:

"cmd": [
        "eksctl",
        "delete",
        "cluster",
        "--region",
        "eu-central-1",
        "--name",
        "sandbox",
        "--wait"
    ],
}

STDOUT:

2024-02-09 20:02:45 [ℹ]  deleting EKS cluster "sandbox"
2024-02-09 20:02:46 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "sandbox"
2024-02-09 20:02:46 [ℹ]  starting parallel draining, max in-flight of 1
2024-02-09 20:02:46 [ℹ]  deleted 0 Fargate profile(s)
2024-02-09 20:02:47 [✔]  kubeconfig has been updated
2024-02-09 20:02:47 [ℹ]  cleaning up AWS load balancers created by Kubernetes objects of Kind Service or Ingress


STDERR:

Error: deadline surpassed waiting for AWS load balancers to be deleted: k8s-sharedtools-5732128751

How to reproduce it?

  1. Deploy a new EKS cluster (I used 1.28) with eksctl >= 0.144.0 and the vpc-cni addon

  2. Provision the aws-lb-controller as described in the docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/deploy/installation/

  3. Set up an ingress referencing an Application Loadbalancer. In my case, I am using annotations on the Ingress object:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: alb

  4. Wait until the load balancer has been successfully created

  5. Delete the EKS cluster

    eksctl delete cluster --region eu-central-1 --name sandbox --wait

Anything else we need to know?

According to my research, the problem occurs because the AWS VPC CNI (aws-node daemonset) is deleted before the associated Kubernetes Service and Ingress objects. Without the CNI daemonset, the aws-lb-controller pods can no longer process the finalizers on these objects, so the objects get stuck and cannot be deleted in Kubernetes.

As far as I can tell, the cluster deletion process runs as follows:

  1. VPC CNI gets deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L95
  2. Shared resources get deleted: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/owned.go#L105
  3. Shared resources include AWS LB: https://github.com/aaroniscode/eksctl/blob/main/pkg/actions/cluster/delete.go#L63
  4. AWS LB cleanup now includes deletion of resources managed by the AWS LB Controller (since PR #6389, "Clean up ALBs using spec.ingressClassName and ALB security groups"): https://github.com/aaroniscode/eksctl/blob/08bd92c91037ca21ec18c04277d9d6ba4d21d704/pkg/elb/cleanup.go#L96C2-L96C18

This has been happening for me since upgrading to >= 0.144 (https://github.com/eksctl-io/eksctl/releases/tag/v0.144.0) and was probably introduced with #6389.
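
If the ordering described above is indeed the cause, one possible workaround is to delete the Ingress objects while the aws-lb-controller can still process their finalizers, and only then delete the cluster. The following is a minimal sketch under that assumption, not an eksctl feature and not a confirmed fix. The `run` and `teardown` helpers and the `RUN`, `CLUSTER`, and `REGION` variables are hypothetical names introduced for illustration, and the 10 minute timeout is an arbitrary choice. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch of a possible workaround (assumption, not part of eksctl):
# remove Ingress objects first, while the controller pods still have
# working networking, then delete the cluster.
set -eu

CLUSTER="${CLUSTER:-sandbox}"
REGION="${REGION:-eu-central-1}"

# Print a command; execute it only when RUN=1 (dry run by default).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

teardown() {
  # Deleting the Ingress objects lets the controller remove the ALBs
  # and clear the finalizers before the CNI daemonset goes away.
  run kubectl delete ingress --all --all-namespaces --timeout=10m
  # Only then hand over to eksctl for the cluster itself.
  run eksctl delete cluster --region "$REGION" --name "$CLUSTER" --wait
}

teardown
```

Services of type LoadBalancer that are managed by the controller would presumably need the same treatment. They are left out here because deleting all Services across all namespaces would also remove unrelated ones.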

Versions

eksctl info
eksctl version: 0.169.0
kubectl version: v1.24.10
OS: linux

Best regards,
Florian.


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
