
upgrade kubernetes version of EKSA cluster for bare metal with 2 CP nodes (1 used + 1 idle) doesn't work #7820

Open
ygao-armada opened this issue Mar 11, 2024 · 2 comments


ygao-armada commented Mar 11, 2024

What happened:
Upgrading a cluster with 4 CP nodes (3 in use + 1 idle) works for me.

However, when I try to upgrade a cluster with 2 CP nodes (1 in use + 1 idle), the upgrade gets stuck after the idle node reaches "Provisioned":

armada@admin-machine2:~/eksa/mgmt02$ kubectl get workflow -A -o wide
NAMESPACE     NAME                                                            TEMPLATE                                                        STATE
eksa-system   mgmt02-standalone2-control-plane-template-1710122858425-44xpk   mgmt02-standalone2-control-plane-template-1710122858425-44xpk   STATE_SUCCESS

armada@admin-machine2:~/eksa/mgmt02$ kubectl get node -o wide
NAME              STATUS   ROLES           AGE     VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
eksa-control-02   Ready    control-plane   5h16m   v1.26.14   10.20.22.224   <none>        Ubuntu 20.04.6 LTS   5.4.0-172-generic   containerd://1.7.10
eksa-wk-650-01    Ready    <none>          4h56m   v1.26.14   10.20.22.227   <none>        Ubuntu 20.04.6 LTS   5.4.0-172-generic   containerd://1.7.10

armada@admin-machine2:~/eksa/mgmt02$ kubectl get machines.cluster.x-k8s.io -A -o wide
NAMESPACE     NAME                                  CLUSTER              NODENAME          PROVIDERID                                      PHASE         AGE     VERSION
eksa-system   mgmt02-standalone2-b94wm              mgmt02-standalone2   eksa-control-02   tinkerbell://eksa-system/eksa-control-02        Running       4h54m   v1.26.10-eks-1-26-21
eksa-system   mgmt02-standalone2-md-0-dvxsx-ntvph   mgmt02-standalone2   eksa-wk-650-01    tinkerbell://eksa-system/eksa-wk-650-01         Running       4h54m   v1.26.10-eks-1-26-21
eksa-system   mgmt02-standalone2-p8fwl              mgmt02-standalone2                     tinkerbell://eksa-system/eksa-main-control-01   Provisioned   45m     v1.27.7-eks-1-27-15

A telling symptom after the idle node reaches "Provisioned" is that, when I run the above "kubectl get ..." commands, I sometimes see this error message:

...
E0311 07:45:31.895702  953673 memcache.go:265] couldn't get current server API group list: Get "https://10.20.22.222:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
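The mechanism behind this error can be sketched locally (an assumed explanation, not confirmed from the issue: the admin kubeconfig still trusts the original cluster CA, while the new control-plane machine serves a certificate chained to a different CA with the same "kubernetes" subject). All file names below are hypothetical:

```shell
# CA the kubeconfig trusts ("old") and a different CA ("new"), both with
# the subject CN "kubernetes" seen in the kubectl error message:
openssl req -x509 -newkey rsa:2048 -nodes -keyout old-ca.key -out old-ca.crt \
  -subj "/CN=kubernetes" -days 1
openssl req -x509 -newkey rsa:2048 -nodes -keyout new-ca.key -out new-ca.crt \
  -subj "/CN=kubernetes" -days 1

# An API-server-style serving certificate, signed by the "new" CA:
openssl req -newkey rsa:2048 -nodes -keyout srv.key -out srv.csr \
  -subj "/CN=kube-apiserver"
openssl x509 -req -in srv.csr -CA new-ca.crt -CAkey new-ca.key \
  -CAcreateserial -out srv.crt -days 1

# Verification succeeds against the CA that signed it...
openssl verify -CAfile new-ca.crt srv.crt
# ...and fails against the other CA, analogous to the kubectl error:
openssl verify -CAfile old-ca.crt srv.crt || true
```

If this is what is happening, the client-side fix would be refreshing the kubeconfig's certificate-authority-data from the upgraded cluster; whether the CA rotation itself is expected during this upgrade is the open question.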

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • EKS Anywhere Release: v0.18.2
  • EKS Distro Release: 1.26/1.27
ndeksa (Contributor) commented Mar 20, 2024

@ygao-armada, ideally there shouldn't be a difference due to the number of CPs; I'm curious whether the node is in the idle state due to a lack of resources, some event, or something else?

ygao-armada (Author) commented
@ndeksa sorry, I should have used "spare" instead of "idle".
