operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded #3955

Open
g-bohncke opened this issue Nov 26, 2024 · 4 comments

Comments

@g-bohncke

Describe the bug
A concise description of what the bug is.

When running the latest chart version 1.10.1 (app version v2.10.1), we encounter the following error:

operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded.

This seems to be related to the move to AWS SDK for Go v2, and it looks like the code ignores the vpcId and region values from the Helm chart. The docs say: "Instead of depending on IMDSv2, you can specify the AWS Region and the VPC via the controller flags --aws-region and --aws-vpc-id." However, the SDK appears to always pull the instance metadata (see cloud.go).

Steps to reproduce
install the latest version on a private cluster.

Expected outcome
A concise description of what you expected to happen.
That the service works

Environment

  • AWS Load Balancer controller version
    v2.10.1
  • Kubernetes version
    1.29
  • Using EKS (yes/no), if so version?
    Yes 1.29

Additional Context:

  • The latest policy has been applied, and we use the policy via the node (option B according to the docs).
  • We already verified that all the instances have a hop limit of 2.
@shraddhabang
Collaborator

Hey @g-bohncke, if you look here, we always take the vpc-id and region from the config first, if they are set, before we infer them from ec2metadata. So it should have worked for you. Can you tell us which Helm flags you are using to set these values?

@jsfrerot

jsfrerot commented Jan 7, 2025

Hi, I think I have the same issue and I suspect it's a configuration problem. However I can't find what it is. Maybe some guidance could help.

This is the error I see:
{"level":"error","ts":"2025-01-07T20:39:55Z","msg":"Reconciler error","controller":"service","namespace":"database","name":"yugabyted-ui-service","reconcileID":"4e4d44e6-7394-4fb3-9469-cc5085c13282","error":"operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded"}

Here is what I did.
This is a self-installed k8s cluster on AWS EC2 with Rancher. The nodes don't have public IP addresses; they reach the internet through a NAT.
I'm currently using chart version 1.9.2 (but I also had the issue with 1.11.0).
The AWS LBC is started with the following arguments:

Args:
--cluster-name=testjs2
--ingress-class=alb
--aws-region=us-east-1
--aws-vpc-id=vpc-<REDACTED>
--enable-shield=false
--enable-waf=false
--enable-wafv2=false

Shield, WAF, and WAFv2 are disabled as documented here: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/installation/#additional-requirements-for-isolated-cluster

I have enabled IMDSv2 and set the hop limit to 2:

aws ec2 describe-instances --instance-id i-<REDACTED> --query 'Reservations[].Instances[].MetadataOptions'
[
    {
        "State": "applied",
        "HttpTokens": "required",
        "HttpPutResponseHopLimit": 2,
        "HttpEndpoint": "enabled",
        "HttpProtocolIpv6": "disabled",
        "InstanceMetadataTags": "disabled"
    }
]

I have attached policies to nodes, option B from the following doc: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/installation/#option-b-attach-iam-policies-to-nodes
The policies applied for the worker nodes are the following: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json

I don't use TargetGroupBinding

I have enabled incoming TCP connection to port 9443 on worker nodes

aws ec2 describe-instances --instance-id i-<REDACTED> --query 'Reservations[].Instances[].SecurityGroups[]'
[
    {
        "GroupId": "sg-0e6cdc5a83cea1d18",
        "GroupName": "rancher-nodes"
    }
]

aws ec2 describe-security-groups --group-ids sg-0e6cdc5a83cea1d18
...
        {
          "IpProtocol": "tcp",
          "FromPort": 9443,
          "ToPort": 9443,
          "UserIdGroupPairs": [
            {
              "UserId": "503561456987",
              "GroupId": "sg-0e6cdc5a83cea1d18"
            }
          ],
          "IpRanges": [],
          "Ipv6Ranges": [],
          "PrefixListIds": []
        },
 ...

What am I missing?

@jsfrerot

jsfrerot commented Jan 9, 2025

So, I fixed my issue.
TLDR: set HttpPutResponseHopLimit to 3

I found this documentation that explains how to access metadata from an EC2 host.
Then I opened a shell in one of my pods and ran:

curl -v http://169.254.169.254/latest/meta-data/
and got, of course, a 401 Unauthorized

You have to get a token first to make the request:

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

For some reason, I got a 401 when accessing http://169.254.169.254/latest/meta-data/
but a timeout when trying to get a token from http://169.254.169.254/latest/api/token!

I changed HttpPutResponseHopLimit to 3:
aws ec2 modify-instance-metadata-options --instance-id i-1234567898abcdef0 --http-put-response-hop-limit 3

After that, I could get a token from http://169.254.169.254/latest/api/token
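
The manual check above can be wrapped in a small script, which may help when testing several pods. This is only a sketch: the function names are hypothetical, the 3-second timeout and 60-second token TTL are arbitrary choices, and reading a curl timeout (exit code 28) as a hop-limit problem is an assumption based on the behaviour described in this comment, not a guaranteed diagnosis.

```shell
#!/usr/bin/env bash
# Sketch: probe IMDSv2 from inside a pod and explain the result.
# Function names are hypothetical; the endpoint, paths and headers are the
# ones used in the curl commands above.

IMDS_URL="http://169.254.169.254"

# Turn the raw probe results into a one-line diagnosis.
#   $1: curl exit code of the token request (28 means curl timed out)
#   $2: HTTP status of the metadata request made with that token
diagnose_imds() {
  local token_exit="$1" meta_status="$2"
  if [ "$token_exit" -eq 28 ]; then
    echo "token request timed out: the hop limit is probably too low for this pod"
  elif [ "$token_exit" -ne 0 ]; then
    echo "token request failed with curl exit code $token_exit"
  elif [ "$meta_status" = "200" ]; then
    echo "IMDSv2 is reachable from this pod"
  else
    echo "token request completed, but the metadata request returned HTTP $meta_status"
  fi
}

# Run the two requests with short timeouts so a dropped response fails fast.
probe_imds() {
  local token token_exit meta_status="000"
  token=$(curl -s --max-time 3 -X PUT "$IMDS_URL/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  token_exit=$?
  if [ "$token_exit" -eq 0 ]; then
    meta_status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
      -H "X-aws-ec2-metadata-token: $token" "$IMDS_URL/latest/meta-data/")
  fi
  diagnose_imds "$token_exit" "$meta_status"
}

# Call probe_imds from a shell inside the pod; it is not invoked here.
```

Running `probe_imds` before and after changing the hop limit should show the diagnosis change from the timeout message to the reachable message, if the hop limit was indeed the cause.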

Hope this helps other folks waste less time on this!

@gxpd-jjh

gxpd-jjh commented Jan 16, 2025

This helped orient me^^ thank you.

Some findings:

  1. The default EKS node group now sets IMDSv2 to Required and Http-Put-Response-Hop-Limit to 1 (if you don't specify a launch template). You will need a custom launch template if you want to continue down this IMDSv2 path, or you can alter the launch template that gets created after the node group is created. The values are buried in the Advanced section.
    What is also interesting is that the default launch template is v1/v2 Optional, so there is a mismatch between that and the EKS node group UX.

  2. If you want to skip it and instead rely on the flags and use Helm, the parameters are region and vpcId.
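
For the second finding, a Helm invocation might look like the sketch below. The helper `lbc_helm_cmd` is hypothetical and only prints the command rather than running it. The release name, the `eks/aws-load-balancer-controller` chart reference, and the `kube-system` namespace are assumptions about a typical install; `region` and `vpcId` are the chart parameters named above, and `clusterName` is the chart's cluster name value.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) a helm command that pins region and VPC.
# lbc_helm_cmd is a hypothetical helper. The release name, chart reference
# and namespace are assumptions; adjust them to match your install.
lbc_helm_cmd() {
  local cluster="$1" region="$2" vpc_id="$3"
  echo "helm upgrade --install aws-load-balancer-controller" \
    "eks/aws-load-balancer-controller --namespace kube-system" \
    "--set clusterName=$cluster --set region=$region --set vpcId=$vpc_id"
}

# Placeholder values; substitute your own cluster name, region and VPC ID.
lbc_helm_cmd my-cluster us-east-1 vpc-EXAMPLE
```

Note that an earlier comment in this thread already had --aws-region and --aws-vpc-id set and still hit the error. That suggests these values only cover region and VPC discovery, and that credentials taken from the node role (option B) may still need a working IMDS path from the pod.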


4 participants