Update troubleshooting to include the issue of etcd upgrade #43902

See [Enabling signed kubelet serving certificates](/docs/tasks/administer-cluste
to understand how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.

Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).

## Upgrade fails due to etcd hash not changing

Only applicable to upgrades from kubeadm v1.28.0, v1.28.1 or v1.28.2 to v1.28.3 or later.
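
If you are not sure which versions are involved, here is a quick check (a sketch, assuming the default `kubeadm-config` ConfigMap that kubeadm writes into the cluster):
```bash
# Version of the kubeadm binary that will perform the upgrade.
kubeadm version -o short

# Kubernetes version currently recorded by kubeadm in the cluster.
kubectl -n kube-system get configmap kubeadm-config \
  -o jsonpath='{.data.ClusterConfiguration}' | grep kubernetesVersion
```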

Here is a snippet of the error message that you may see:
```
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0907 10:10:09.109104 3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.6:2379/ https://172.17.0.4:2379/ https://172.17.0.3:2379/]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
cmd/kubeadm/app/phases/upgrade/staticpods.go:338
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
...
```
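
As the log states, kubeadm rolls etcd back, so the original etcd Pods should still be running. You can verify this with a quick check (assuming the default `component=etcd` label that kubeadm puts on its etcd static Pods):
```bash
# List the etcd static Pods managed by kubeadm and confirm they are Running.
kubectl -n kube-system get pods -l component=etcd -o wide
```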

The reason for this error is that [PR 118867](https://github.com/kubernetes/kubernetes/pull/118867) accidentally imported the package `k8s.io/kubernetes/pkg/apis/core/v1`, which makes the kubeadm patch versions containing that change generate the etcd Pod manifest with defaulted attributes. This produces a diff during the manifest comparison, so kubeadm expects the Pod hash to change, while the kubelet never updates the hash.
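
If you want to observe this yourself, the hash in question is exposed through the mirror Pod that the kubelet creates for the etcd static Pod. A hedged sketch, using the node name from the example log above (replace it with your own control plane node name):
```bash
# Print the kubelet-reported config hash annotation of the etcd mirror Pod.
# During a normal upgrade this value changes; in this failure mode it does not.
NODE_NAME=kinder-upgrade-control-plane-1   # example node name; use your own
kubectl -n kube-system get pod "etcd-${NODE_NAME}" -o yaml | grep 'kubernetes.io/config.hash'
```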

The affected kubeadm versions are v1.28.0, v1.28.1 and v1.28.2. The fix is included in v1.28.3, see [PR 120605](https://github.com/kubernetes/kubernetes/pull/120605), so the same issue should not be seen when upgrading between versions v1.28.3 and later.

There are two ways to work around this issue if you see the same problem in your cluster.
- Skip the etcd upgrade. In this case etcd does not actually need to be upgraded, because nothing changes apart from the defaulted attributes, which should be ignored. You can skip the etcd upgrade like this:
```bash
kubeadm upgrade apply [version] --etcd-upgrade=false
```
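
  For example, if the target version were v1.28.3, the command would be `kubeadm upgrade apply v1.28.3 --etcd-upgrade=false` (the version here is only an illustration).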

- Patch the problematic etcd manifest to remove the defaulted attributes. The diff below shows the fields to remove:

```patch
diff --git a/etc/kubernetes/manifests/etcd_defaults.yaml b/etc/kubernetes/manifests/etcd_origin.yaml
index d807ccbe0aa..46b35f00e15 100644
--- a/etc/kubernetes/manifests/etcd_defaults.yaml
+++ b/etc/kubernetes/manifests/etcd_origin.yaml
@@ -43,7 +43,6 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
-      successThreshold: 1
       timeoutSeconds: 15
     name: etcd
     resources:
@@ -59,26 +58,18 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
-      successThreshold: 1
       timeoutSeconds: 15
-    terminationMessagePath: /dev/termination-log
-    terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/lib/etcd
       name: etcd-data
     - mountPath: /etc/kubernetes/pki/etcd
       name: etcd-certs
-  dnsPolicy: ClusterFirst
-  enableServiceLinks: true
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
-  restartPolicy: Always
-  schedulerName: default-scheduler
   securityContext:
     seccompProfile:
       type: RuntimeDefault
-  terminationGracePeriodSeconds: 30
   volumes:
   - hostPath:
       path: /etc/kubernetes/pki/etcd
```
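
A minimal sketch of applying this workaround on a control plane node, assuming the default kubeadm manifest path `/etc/kubernetes/manifests/etcd.yaml`; back the file up before editing it:
```bash
# Keep a backup of the current etcd static Pod manifest.
sudo cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak

# Edit the manifest and delete the defaulted fields shown in the diff above,
# then retry the upgrade.
sudo vim /etc/kubernetes/manifests/etcd.yaml
```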

The relevant issue is [kubeadm issue 2927](https://github.com/kubernetes/kubeadm/issues/2927); you can find more details there.