operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded #3955

g-bohncke · 2024-11-26T14:25:39Z

Describe the bug

When running the latest chart version (1.10.1, app version v2.10.1), we encounter the following error:

operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded.

This seems to be related to the switch to AWS SDK Go v2, and it looks like the code ignores the vpcId and region values from the Helm chart. The docs say: "Instead of depending on IMDSv2, you can specify the AWS Region and the VPC via the controller flags --aws-region and --aws-vpc-id." However, the SDK always seems to pull the metadata (see cloud.go).

Steps to reproduce
Install the latest version on a private cluster.

Expected outcome
That the service works.

Environment

AWS Load Balancer controller version
v2.10.1
Kubernetes version
1.29
Using EKS (yes/no), if so version?
Yes 1.29

Additional Context:

The latest IAM policy has been applied, and we attach it via the nodes (option B according to the docs).
We have already verified that all instances have a metadata hop limit of 2.

shraddhabang · 2024-11-27T21:12:59Z

Hey @g-bohncke, if you look here, we always take the VPC ID and region from the config, if set, before inferring them from EC2 metadata, so it should have worked for you. Which Helm values are you using to set them?
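The precedence described above (configured value first, EC2 metadata only as a fallback) can be sketched in Go as follows. `resolveValue` and `metadataLookup` are hypothetical names used for illustration. They are not the controller's code in cloud.go.

```go
package main

import (
	"errors"
	"fmt"
)

// metadataLookup is a hypothetical stand-in for an IMDS query.
type metadataLookup func(key string) (string, error)

// resolveValue prefers an explicitly configured value (e.g. from --aws-region
// or --aws-vpc-id) and only queries instance metadata when it is empty.
func resolveValue(configured, key string, imds metadataLookup) (string, error) {
	if configured != "" {
		return configured, nil
	}
	v, err := imds(key)
	if err != nil {
		return "", fmt.Errorf("infer %s from EC2 metadata: %w", key, err)
	}
	return v, nil
}

func main() {
	calls := 0
	imds := func(key string) (string, error) {
		calls++
		return "", errors.New("context deadline exceeded")
	}
	region, err := resolveValue("us-east-1", "region", imds)
	fmt.Println(region, err, calls) // prints: us-east-1 <nil> 0
}
```

With the value configured, the unreachable metadata service is never called. Only an empty value reaches the IMDS fallback.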

jsfrerot · 2025-01-07T21:16:35Z

Hi, I think I have the same issue, and I suspect it's a configuration problem, but I can't find what it is. Some guidance would help.

This is the error I see:
{"level":"error","ts":"2025-01-07T20:39:55Z","msg":"Reconciler error","controller":"service","namespace":"database","name":"yugabyted-ui-service","reconcileID":"4e4d44e6-7394-4fb3-9469-cc5085c13282","error":"operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded"}

Here is my setup:
This is a self-installed Kubernetes cluster on AWS EC2, managed with Rancher. The nodes have no public IP addresses; they reach the internet through a NAT.
I'm currently using chart version 1.9.2 (but I also had the issue with 1.11.0).
The AWS Load Balancer Controller is started with the following arguments:

Args:
--cluster-name=testjs2
--ingress-class=alb
--aws-region=us-east-1
--aws-vpc-id=vpc-<REDACTED>
--enable-shield=false
--enable-waf=false
--enable-wafv2=false

Shield, WAF, and WAFv2 are disabled as documented here: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/installation/#additional-requirements-for-isolated-cluster

I have enabled IMDSv2 and set the hop limit to 2:

aws ec2 describe-instances --instance-id i-<REDACTED> --query 'Reservations[].Instances[].MetadataOptions'
[
    {
        "State": "applied",
        "HttpTokens": "required",
        "HttpPutResponseHopLimit": 2,
        "HttpEndpoint": "enabled",
        "HttpProtocolIpv6": "disabled",
        "InstanceMetadataTags": "disabled"
    }
]

I have attached the policies to the nodes (option B in the following doc): https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/installation/#option-b-attach-iam-policies-to-nodes
The policy applied to the worker nodes is: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json

I don't use TargetGroupBinding.

I have allowed incoming TCP connections to port 9443 on the worker nodes:

aws ec2 describe-instances --instance-id i-<REDACTED> --query 'Reservations[].Instances[].SecurityGroups[]'
[
    {
        "GroupId": "sg-0e6cdc5a83cea1d18",
        "GroupName": "rancher-nodes"
    }
]

aws ec2 describe-security-groups --group-ids sg-0e6cdc5a83cea1d18
...
        {
          "IpProtocol": "tcp",
          "FromPort": 9443,
          "ToPort": 9443,
          "UserIdGroupPairs": [
            {
              "UserId": "503561456987",
              "GroupId": "sg-0e6cdc5a83cea1d18"
            }
          ],
          "IpRanges": [],
          "Ipv6Ranges": [],
          "PrefixListIds": []
        },
 ...

What am I missing?

jsfrerot · 2025-01-09T14:11:41Z

So, I fixed my issue.
TL;DR: set HttpPutResponseHopLimit to 3.

I found this documentation, which explains how to access instance metadata from an EC2 host.
Then I opened a shell in one of my pods and ran:

curl -v http://169.254.169.254/latest/meta-data/
and got, of course, a 401 Unauthorized.

You have to get a token first to make the request:

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

For some reason, I got a 401 when accessing http://169.254.169.254/latest/meta-data/
but a timeout when trying to get a token from http://169.254.169.254/latest/api/token

I changed HttpPutResponseHopLimit to 3:
aws ec2 modify-instance-metadata-options --instance-id i-1234567898abcdef0 --http-put-response-hop-limit 3

Then I could get a token from http://169.254.169.254/latest/api/token

Hope this helps other folks waste less time on this!

gxpd-jjh · 2025-01-16T00:01:12Z

This helped orient me^^ thank you.

Some findings:

The default EKS node group now sets IMDSv2 to Required and the HTTP PUT response hop limit to 1 (if you don't specify a launch template). You will need a custom launch template if you want to continue down this IMDSv2 path, or you can alter the launch template that gets created after the node group is created. The values are buried in the Advanced details section.
Interestingly, the default launch template has IMDS set to v1/v2 Optional, so there is a mismatch between the EKS node group UX and the launch template.
If you want to skip the IMDSv2 route and rely on the flags instead, the Helm parameters are region and vpcId.
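For reference, here is a Go sketch of how those Helm values relate to the controller flags used earlier in this thread. It assumes the chart adds --aws-region and --aws-vpc-id only when region and vpcId are set. `chartValues` and `controllerArgs` are hypothetical and are not the chart's template code.

```go
package main

import (
	"fmt"
	"strings"
)

// chartValues holds a hypothetical subset of the Helm values discussed here.
type chartValues struct {
	ClusterName string
	Region      string
	VpcID       string
}

// controllerArgs renders controller flags, adding --aws-region and
// --aws-vpc-id only when the corresponding values are non-empty.
func controllerArgs(v chartValues) []string {
	args := []string{"--cluster-name=" + v.ClusterName}
	if v.Region != "" {
		args = append(args, "--aws-region="+v.Region)
	}
	if v.VpcID != "" {
		args = append(args, "--aws-vpc-id="+v.VpcID)
	}
	return args
}

func main() {
	args := controllerArgs(chartValues{ClusterName: "my-cluster", Region: "us-east-1", VpcID: "vpc-0123456789abcdef0"})
	fmt.Println(strings.Join(args, "\n"))
}
```

Note that, with node IAM roles, these values only remove the region and VPC lookups. Credentials are still retrieved through IMDS, so the hop limit still matters.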

shraddhabang added the triage/needs-investigation label Nov 27, 2024
