Skip to content

Commit

Permalink
Update trouble shooting to include the issue of etcd upgrade
Browse files Browse the repository at this point in the history
For the isse which is reported recently: kubernetes/kubeadm#2957
We'd better to provide some tips to workaround this known issue.
  • Loading branch information
chendave committed Nov 13, 2023
1 parent a83b56d commit 4140f6e
Showing 1 changed file with 99 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -431,3 +431,102 @@ See [Enabling signed kubelet serving certificates](/docs/tasks/administer-cluste
to understand how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.

Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).

## Upgrade failed due to etcd hash is not changed

Only applicable for upgrade from v1.28.0 (v1.28.1, v1.28.2) to v1.28.3 or after.

Here is the snippet of error message which may look like,
```
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0907 10:10:09.109104 3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.6:2379/ https://172.17.0.4:2379/ https://172.17.0.3:2379/]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
cmd/kubeadm/app/phases/upgrade/staticpods.go:338
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
...
````

The reason for this is that the [patch 118867](https://github.com/kubernetes/kubernetes/pull/118867) accidently imported the package of `k8s.io/kubernetes/pkg/apis/core/v1`, and leads to the generation of default attributes of etcd pod for the kubeadm patch version where the change exists,
this will result to a diff for the manifest comparison, and kubeadm will expect a change of pod hash while kubelet will never update the hash.

The effected kubeadm version is v1.28.0, v1.28.1 and v1.28.2. Kubeadm has patched the v1.28.3, see [patch 120605](https://github.com/kubernetes/kubernetes/pull/120605), so the same issue shouldn't be seen for the upgrade between the version that is after v1.28.3.

There are two way to workaround this issue if you see the same problem in your cluster.
- skip the etcd upgrade
If you hit the same issue, etcd is not expected to be upgraded as there is no change except the default attribute which should be ignored instead, so you can skip it as this,
```bash
kubeadm upgrade apply [version] --etcd-upgrade=false
```

- patch the problematic etcd manifest to remove the default attributes

```patch
diff --git a/etc/kubernetes/manifests/etcd_defaults.yaml b/etc/kubernetes/manifests/etcd_origin.yaml
index d807ccbe0aa..46b35f00e15 100644
--- a/etc/kubernetes/manifests/etcd_defaults.yaml
+++ b/etc/kubernetes/manifests/etcd_origin.yaml
@@ -43,7 +43,6 @@ spec:
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
- successThreshold: 1
timeoutSeconds: 15
name: etcd
resources:
@@ -59,26 +58,18 @@ spec:
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
- successThreshold: 1
timeoutSeconds: 15
- terminationMessagePath: /dev/termination-log
- terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
- dnsPolicy: ClusterFirst
- enableServiceLinks: true
hostNetwork: true
priority: 2000001000
priorityClassName: system-node-critical
- restartPolicy: Always
- schedulerName: default-scheduler
securityContext:
seccompProfile:
type: RuntimeDefault
- terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
```

Relevant issue can be found from here [issue 2927](https://github.com/kubernetes/kubeadm/issues/2927), you can get more details from there.

0 comments on commit 4140f6e

Please sign in to comment.