
kubeadm upgrade from v1.28.0 to v1.28.3 fails #2957

Closed
alexarefev opened this issue Nov 7, 2023 · 27 comments
Labels: area/upgrades, kind/bug, lifecycle/frozen, priority/backlog

@alexarefev

What happened?

The following command

kubeadm upgrade apply v1.28.3 -f --certificate-renewal=true --ignore-preflight-errors='CoreDNSUnsupportedPlugins,Port-6443' --patches=/etc/kubernetes/patches

fails with the following error when using kubeadm v1.28.3:

[upgrade/apply] FATAL: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node qa-fullha-master1 did not change after 5m0s: timed out waiting for the condition

kubeadm v1.28.0 upgrades the cluster successfully.

What did you expect to happen?

kubeadm v1.28.3 to upgrade the cluster successfully.

How can we reproduce it (as minimally and precisely as possible)?

Download kubeadm v1.28.3 and run an upgrade of a Kubernetes v1.28.0 cluster.

Anything else we need to know?

The issue might be fixed by the --etcd-upgrade flag.

Kubernetes version

$ kubectl version
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Cloud provider

Not applicable

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux ubuntu 5.15.0-43-generic #46-Ubuntu SMP Tue Jul 12 10:30:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alexarefev added the kind/bug label Nov 7, 2023
@k8s-ci-robot (Contributor)

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-sig and needs-triage labels Nov 7, 2023
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alexarefev changed the title from "Kubernetes upgrade fails due to kubeadm" to "Kubernetes upgrade fails due to kubeadm version" Nov 7, 2023
@neolit123 (Member)

/transfer kubeadm

@k8s-ci-robot transferred this issue from kubernetes/kubernetes Nov 7, 2023
@neolit123 added the area/upgrades and priority/awaiting-more-evidence labels and removed the needs-sig and needs-triage labels Nov 7, 2023
@neolit123 modified the milestones: v1.28, v1.29 Nov 7, 2023
@neolit123 (Member) commented Nov 7, 2023

can you share full logs here or in a github Gist maybe?
what version are you upgrading from?

The issue might be fixed by the --etcd-upgrade flag.

if the flag is set to false, you mean?

kubeadm v1.28.0 upgrades the cluster successfully.

we have our 1.27.latest -> 1.28.latest upgrade tests working fine:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-27-1-28

but etcd is not upgraded there, because the etcd version is the same between 1.27 and 1.28:

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-27-1-28/1721862425331896320/build-log.txt

[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.28.3-22+197e7579adb1bf" (timeout: 5m0s)...
I1107 12:18:53.445360    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 4 milliseconds
I1107 12:18:53.450351    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 3 milliseconds
I1107 12:18:53.455022    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 3 milliseconds
I1107 12:18:53.455712    4203 etcd.go:214] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I1107 12:18:53.461659    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane 200 OK in 5 milliseconds
I1107 12:18:53.463060    4203 etcd.go:150] etcd endpoints read from pods: https://172.17.0.2:2379,https://172.17.0.6:2379,https://172.17.0.3:2379
I1107 12:18:53.477995    4203 etcd.go:262] etcd endpoints read from etcd: https://172.17.0.2:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
I1107 12:18:53.478146    4203 etcd.go:168] update etcd endpoints: https://172.17.0.2:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
[upgrade/etcd] Upgrading to TLS for etcd
I1107 12:18:53.809153    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 4 milliseconds
I1107 12:18:53.812731    4203 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3206296204/etcd.yaml"
[upgrade/staticpods] Preparing for "etcd" upgrade
I1107 12:18:53.815034    4203 etcd.go:214] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
[upgrade/staticpods] Current and new manifests of etcd are equal, skipping upgrade
I1107 12:18:53.821659    4203 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane 200 OK in 6 milliseconds
I1107 12:18:53.823184    4203 etcd.go:150] etcd endpoints read from pods: https://172.17.0.2:2379,https://172.17.0.6:2379,https://172.17.0.3:2379
I1107 12:18:53.838740    4203 etcd.go:262] etcd endpoints read from etcd: https://172.17.0.2:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
I1107 12:18:53.838771    4203 etcd.go:168] update etcd endpoints: https://172.17.0.2:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
I1107 12:18:53.838785    4203 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.2:2379 https://172.17.0.3:2379 https://172.17.0.6:2379]) are available 1/10
[upgrade/etcd] Waiting for etcd to become available

our 1.28.latest -> 1.29.latest upgrade works as well, where an actual etcd upgrade happens:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1721875511665233920/build-log.txt

I1107 13:11:26.312401    3826 etcd.go:264] etcd endpoints read from etcd: https://172.17.0.5:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
I1107 13:11:26.312428    3826 etcd.go:170] update etcd endpoints: https://172.17.0.5:2379,https://172.17.0.3:2379,https://172.17.0.6:2379
I1107 13:11:26.565664    3826 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 3 milliseconds
[upgrade/staticpods] Preparing for "etcd" upgrade
I1107 13:11:26.567768    3826 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3178925015/etcd.yaml"
[upgrade/staticpods] Renewing etcd-server certificate
I1107 13:11:26.571109    3826 staticpods.go:225] Pod manifest files diff:
@@ -35 +35 @@
-    image: registry.k8s.io/etcd:3.5.9-0
+    image: registry.k8s.io/etcd:3.5.10-0

@alexarefev (Author)

Hello @neolit123
Thank you for the quick response.
The case is about a patch version upgrade, v1.28.0 -> v1.28.3, only.

The issue might be fixed by the --etcd-upgrade flag.

if the flag is set to false, you mean?

You are absolutely right

Here is the log:

kubeadm upgrade apply v1.28.3 -f --certificate-renewal=true --ignore-preflight-errors='Port-6443,CoreDNSUnsupportedPlugins' --patches=/etc/kubernetes/patches --v=5
I1108 07:03:22.067156   28755 apply.go:106] [upgrade/apply] verifying health of cluster
I1108 07:03:22.067237   28755 apply.go:107] [upgrade/apply] retrieving configuration from cluster
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I1108 07:03:22.093099   28755 kubelet.go:74] attempting to download the KubeletConfiguration from ConfigMap "kubelet-config"
W1108 07:03:22.114754   28755 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
I1108 07:03:22.120619   28755 common.go:186] running preflight checks
[preflight] Running pre-flight checks.
I1108 07:03:22.121543   28755 preflight.go:77] validating if there are any unsupported CoreDNS plugins in the Corefile
I1108 07:03:22.132116   28755 preflight.go:109] validating if migration can be done for the current CoreDNS release.
[upgrade] Running cluster health checks
I1108 07:03:22.147836   28755 health.go:157] Creating Job "upgrade-health-check" in the namespace "kube-system"
I1108 07:03:22.173263   28755 health.go:187] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I1108 07:03:23.178141   28755 health.go:187] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I1108 07:03:24.176902   28755 health.go:187] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I1108 07:03:25.177266   28755 health.go:187] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I1108 07:03:26.180729   28755 health.go:194] Job "upgrade-health-check" in the namespace "kube-system" completed
I1108 07:03:26.180758   28755 health.go:200] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I1108 07:03:26.202437   28755 apply.go:114] [upgrade/apply] validating requested and actual version
I1108 07:03:26.202508   28755 apply.go:130] [upgrade/version] enforcing version skew policies
[upgrade/version] You have chosen to change the cluster version to "v1.28.3"
[upgrade/versions] Cluster version: v1.28.0
[upgrade/versions] kubeadm version: v1.28.3
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
I1108 07:03:26.213743   28755 checks.go:828] using image pull policy: IfNotPresent
I1108 07:03:26.266829   28755 checks.go:846] image exists: registry.k8s.io/kube-apiserver:v1.28.3
I1108 07:03:26.296931   28755 checks.go:846] image exists: registry.k8s.io/kube-controller-manager:v1.28.3
I1108 07:03:26.330171   28755 checks.go:846] image exists: registry.k8s.io/kube-scheduler:v1.28.3
I1108 07:03:26.369544   28755 checks.go:846] image exists: registry.k8s.io/kube-proxy:v1.28.3
I1108 07:03:26.428297   28755 checks.go:846] image exists: registry.k8s.io/pause:3.9
I1108 07:03:26.454074   28755 checks.go:846] image exists: registry.k8s.io/etcd:3.5.9-0
I1108 07:03:26.490969   28755 checks.go:846] image exists: registry.k8s.io/coredns/coredns:v1.10.1
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.28.3" (timeout: 5m0s)...
I1108 07:03:26.504648   28755 etcd.go:214] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I1108 07:03:26.509765   28755 etcd.go:150] etcd endpoints read from pods: https://192.168.56.106:2379
I1108 07:03:26.524911   28755 etcd.go:262] etcd endpoints read from etcd: https://192.168.56.106:2379
I1108 07:03:26.524939   28755 etcd.go:168] update etcd endpoints: https://192.168.56.106:2379
[upgrade/etcd] Upgrading to TLS for etcd
[patches] Reading patches from path "/etc/kubernetes/patches"
[patches] Found the following patch files: [kube-apiserver.json]
I1108 07:03:26.741147   28755 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3034292124/etcd.yaml"
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
I1108 07:03:26.744394   28755 certs.go:519] validating certificate period for etcd CA certificate
I1108 07:03:26.744918   28755 certs.go:519] validating certificate period for etcd/ca certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-11-08-07-03-26/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node ubuntu did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I1108 07:08:27.617266   28755 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://192.168.56.106:2379]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node ubuntu did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
        cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
        cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
        cmd/kubeadm/app/phases/upgrade/staticpods.go:338
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
        cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
        cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
        cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
        cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
        cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1598
fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
        cmd/kubeadm/app/phases/upgrade/staticpods.go:367
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
        cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
        cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
        cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
        cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
        cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1598
[upgrade/apply] FATAL
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
        cmd/kubeadm/app/cmd/upgrade/apply.go:157
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
        cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1598

Thank you in advance

@neolit123 (Member)

thanks for the logs, i will try to reproduce this locally.
the workaround, like you mentioned, is to just skip the etcd upgrade; between 1.28.0 and 1.28.3 there is nothing to upgrade in etcd.
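
for anyone who needs it, a minimal sketch of that workaround as a full command, reusing the flags from the reporter's original command (the preflight ignores and the patches path are specific to their setup); --etcd-upgrade is a regular flag of kubeadm upgrade apply and defaults to true:

# sketch: upgrade the control plane but leave the etcd static Pod untouched
kubeadm upgrade apply v1.28.3 -f \
  --etcd-upgrade=false \
  --certificate-renewal=true \
  --ignore-preflight-errors='CoreDNSUnsupportedPlugins,Port-6443' \
  --patches=/etc/kubernetes/patches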

@neolit123 (Member)

The case is about a patch version upgrade, v1.28.0 -> v1.28.3, only.

i was unable to reproduce the bug.

here are my steps:

  • build kubeadm v1.28.0 from source
  • create a v1.28.0 single node cluster with kubeadm init ...
  • build v1.28.3 from source
  • call kubeadm upgrade apply v1.28.3 ...

relevant etcd logs from the upgrade:

I1108 11:22:31.029224   23021 etcd.go:214] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I1108 11:22:31.033943   23021 etcd.go:150] etcd endpoints read from pods: https://10.0.2.15:2379
I1108 11:22:31.046855   23021 etcd.go:262] etcd endpoints read from etcd: https://10.0.2.15:2379
I1108 11:22:31.046955   23021 etcd.go:168] update etcd endpoints: https://10.0.2.15:2379
[upgrade/etcd] Upgrading to TLS for etcd
I1108 11:22:31.237590   23021 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests938074122/etcd.yaml"
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Current and new manifests of etcd are equal, skipping upgrade
I1108 11:22:31.242635   23021 etcd.go:214] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I1108 11:22:31.255296   23021 etcd.go:150] etcd endpoints read from pods: https://10.0.2.15:2379
I1108 11:22:31.267317   23021 etcd.go:262] etcd endpoints read from etcd: https://10.0.2.15:2379
I1108 11:22:31.267808   23021 etcd.go:168] update etcd endpoints: https://10.0.2.15:2379
[upgrade/etcd] Waiting for etcd to become available
I1108 11:22:31.267818   23021 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://10.0.2.15:2379]) are available 1/10

notice that etcd is not upgraded in my case.

Current and new manifests of etcd are equal, skipping upgrade

this is expected, as the generated manifests for .0 and .3 are the same.
in your case the etcd upgrade is actually performed, but then fails because the pod hash does not change.
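
as an aside, a rough way to watch the hash the upgrade waits on, assuming it is the kubelet-set kubernetes.io/config.hash annotation on the mirror Pod; the node name "ubuntu" below is taken from the reporter's log and is only illustrative:

# print the static Pod hash reported for the etcd mirror Pod; the upgrade
# times out when this value never changes after the new manifest is written
kubectl -n kube-system get pod etcd-ubuntu \
  -o jsonpath='{.metadata.annotations.kubernetes\.io/config\.hash}{"\n"}'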

can you share your kubeadm cluster configuration?
hide any IP / DNS names if needed:
kubectl get cm -n kube-system kubeadm-config

also if you are passing --config to init, share that as well please.

@neolit123 (Member)

i was unable to reproduce the bug.

but we did have a major bug related to the etcd hash comparison not so long ago... so it might be best if more people look at this.
@chendave @SataQiu @pacoxu

#2927 (comment)

@chendave that was due to the mistaken import of internal defaulters.
hope we didn't create a huge mess for users upgrading from 1.28.patch to another 1.28.patch.

@neolit123 changed the title from "Kubernetes upgrade fails due to kubeadm version" to "kubeadm upgrade from v1.28.0 to v1.28.3 fails" Nov 8, 2023
@pacoxu (Member) commented Nov 8, 2023

IIRC, the bug is triggered when the user's cluster was already upgraded to v1.28.0 or v1.28.1 and then needs to be upgraded to v1.28.2+ or v1.29+.

For etcd, there is a new version, v3.5.10. If the next v1.28.x patch release includes it, this would be fixed in an indirect way, if I understand correctly.

@chendave (Member) commented Nov 8, 2023

hope we didn't create a huge mess for users upgrading from 1.28.patch to another 1.28.patch.

I think there is nothing we can do if the cluster is on a 1.28 patch release that already has the change (kubernetes/kubernetes#118867); skipping the etcd upgrade is the best choice here.

@neolit123 (Member)

IIRC, the bug is triggered when the user's cluster was already upgraded to v1.28.0 or v1.28.1 and then needs to be upgraded to v1.28.2+ or v1.29+.

For etcd, there is a new version, v3.5.10. If the next v1.28.x patch release includes it, this would be fixed in an indirect way, if I understand correctly.

we patched it for 1.29 here:
kubernetes/kubernetes#120561

then we backported it for 1.28 here:
kubernetes/kubernetes@0c6a0c3

that was on 14 September, and it should be part of 1.28.3 but not of 1.28.2, if i'm reading the history of the release-1.28 branch correctly:
https://github.com/kubernetes/kubernetes/commits/release-1.28?before=197e7579adb1bf180617bd3becc2aa4dcceb5291+35&branch=release-1.28&qualified_name=refs%2Fheads%2Frelease-1.28

so in theory there should be no problem for the .3 upgrade.
but if .4 includes an actual etcd upgrade then there is no way for the hash issue to surface.
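
a quick way to check whether a given kubeadm release would actually bump the etcd image (a sketch; the sample output is what 1.28.3 shows per the prepull log above):

# list the images a given kubeadm version deploys and filter for etcd
kubeadm config images list --kubernetes-version v1.28.3 | grep etcd
# registry.k8s.io/etcd:3.5.9-0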

@neolit123 (Member)

hope we didn't create a huge mess for users upgrading from 1.28.patch to another 1.28.patch.

I think there is nothing we can do if the cluster is on a 1.28 patch release that already has the change (kubernetes/kubernetes#118867); skipping the etcd upgrade is the best choice here.

it sounds like a valid workaround, but it becomes a problem when we have to direct many users to a single ticket (this one).

it's strange because it should not happen, and i confirmed that locally with .3.
yet, it could somehow be related to custom config options; that's why i asked @alexarefev to provide the config here.

@chendave (Member) commented Nov 8, 2023

the ticket here is about "v1.28.0 to v1.28.3"; the fix in kubernetes/kubernetes#120561 should be applied to both the initial and the destination version.

even though v1.28.3 has the cherry-pick included, v1.28.0 still has the problematic code, so it tends to fail.

@chendave (Member) commented Nov 8, 2023

@alexarefev another workaround is to patch your etcd.yaml and remove all the defaults there before the upgrade, see #2927 (comment).
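
in practice that manual patch looks roughly like the sketch below; the field list is taken from the manifest diff later in this thread, and editing a file under /etc/kubernetes/manifests makes the kubelet recreate the static Pod, so keep a backup outside that directory and do one control plane node at a time:

# back up the manifest outside the manifests directory, then strip the
# server-side defaults that a freshly generated kubeadm manifest never contains
cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
sed -i \
  -e '/successThreshold: 1/d' \
  -e '/terminationMessagePath:/d' \
  -e '/terminationMessagePolicy:/d' \
  -e '/dnsPolicy:/d' \
  -e '/enableServiceLinks:/d' \
  -e '/restartPolicy:/d' \
  -e '/schedulerName:/d' \
  -e '/terminationGracePeriodSeconds:/d' \
  /etc/kubernetes/manifests/etcd.yaml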

@alexarefev (Author)

can you share your kubeadm cluster configuration?
hide any IP / DNS names if needed:
kubectl get cm -n kube-system kubeadm-config

also if you are passing --config to init, share that as well please.

Hi @neolit123
Here it is.
kubeadm-config:

apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - ubuntu
      - 192.168.56.106
      extraArgs:
        audit-log-maxage: "30"
        audit-log-maxbackup: "10"
        audit-log-maxsize: "100"
        audit-log-path: /var/log/kubernetes/audit/audit.log
        audit-policy-file: /etc/kubernetes/audit-policy.yaml
        authorization-mode: Node,RBAC
        enable-admission-plugins: NodeRestriction
        profiling: "false"
      extraVolumes:
      - hostPath: /etc/kubernetes/audit-policy.yaml
        mountPath: /etc/kubernetes/audit-policy.yaml
        name: audit
        pathType: File
        readOnly: true
      - hostPath: /var/log/kubernetes/audit/
        mountPath: /var/log/kubernetes/audit/
        name: audit-log
        pathType: DirectoryOrCreate
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta3
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: all.new.local:6443
    controllerManager:
      extraArgs:
        profiling: "false"
        terminated-pod-gc-threshold: "1000"
    dns:
      imageRepository: registry.k8s.io/coredns
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.k8s.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.28.0
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.128.0.0/14
      serviceSubnet: 172.30.0.0/16
    scheduler:
      extraArgs:
        profiling: "false"
kind: ConfigMap
metadata:
  creationTimestamp: "2023-11-08T13:26:45Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "193"
  uid: 4ad93321-fad0-4021-97e9-d1501511d781

init-config:

apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
enableDebuggingHandlers: true
kind: KubeletConfiguration
podPidsLimit: 4096
protectKernelDefaults: true
readOnlyPort: 0
serializeImagePulls: false
---
apiServer:
  certSANs:
  - ubuntu
  - 192.168.56.106
  extraArgs:
    audit-log-maxage: '30'
    audit-log-maxbackup: '10'
    audit-log-maxsize: '100'
    audit-log-path: /var/log/kubernetes/audit/audit.log
    audit-policy-file: /etc/kubernetes/audit-policy.yaml
    enable-admission-plugins: NodeRestriction
    profiling: 'false'
  extraVolumes:
  - hostPath: /etc/kubernetes/audit-policy.yaml
    mountPath: /etc/kubernetes/audit-policy.yaml
    name: audit
    pathType: File
    readOnly: true
  - hostPath: /var/log/kubernetes/audit/
    mountPath: /var/log/kubernetes/audit/
    name: audit-log
    pathType: DirectoryOrCreate
    readOnly: false
apiVersion: kubeadm.k8s.io/v1beta3
controlPlaneEndpoint: all.new.local:6443
controllerManager:
  extraArgs:
    profiling: 'false'
    terminated-pod-gc-threshold: '1000'
dns:
  imageRepository: registry.k8s.io/coredns
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
networking:
  podSubnet: 10.128.0.0/14
  serviceSubnet: 172.30.0.0/16
scheduler:
  extraArgs:
    profiling: 'false'
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.106
nodeRegistration:
  criSocket: /var/run/containerd/containerd.sock
  kubeletExtraArgs:
    container-runtime-endpoint: unix:///run/containerd/containerd.sock
  taints: []
patches:
  directory: /etc/kubernetes/patches

@alexarefev (Author)

Hi @chendave!
Your suggestion is working fine. Thank you indeed

@neolit123 (Member) commented Nov 8, 2023

Here it is.
kubeadm-config:

thanks, these are just default settings for etcd from the kubeadm config.

related to:

@alexarefev another workaround is to patch your etcd.yaml and remove all the defaults there before the upgrade, see #2927 (comment).

yes, we were seeing these defaults but i can't see them with the official 1.28.0 binary.
i.e. i cannot reproduce the problem.

wget https://dl.k8s.io/v1.28.0/bin/linux/amd64/kubeadm

# chmod / install the binary

kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:20:15Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

# create node with "kubeadm init"

sudo cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.0.2.15:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.0.2.15:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://10.0.2.15:2380
    - --initial-cluster=lubo-it=https://10.0.2.15:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.2.15:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.0.2.15:2380
    - --name=lubo-it
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}


# then proceeding with upgrade to .3

wget https://dl.k8s.io/v1.28.3/bin/linux/amd64/kubeadm
# chmod / install the binary

sudo kubeadm upgrade apply v1.28.3 -v=5 -f
...
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Current and new manifests of etcd are equal, skipping upgrade
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.28.3". Enjoy!


@alexarefev what etcd.yaml are you getting if you create a new cluster with kubeadm init from v1.28.0?
can you also show the output of kubeadm version for that 1.28.0 binary?

@alexarefev (Author)

@alexarefev what etcd.yaml are you getting if you create a new cluster with kubeadm init from v1.28.0?
can you also show the output of kubeadm version for that 1.28.0 binary?

@neolit123 the following
etcd.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.56.106:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.56.106:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.56.106:2380
    - --initial-cluster=ubuntu=https://192.168.56.106:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.56.106:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.56.106:2380
    - --name=ubuntu
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

kubeadm version:

kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:20:15Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

@neolit123 (Member)

kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:20:15Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

we have exactly the same binary, but i'm getting a different etcd.yaml (minus the IP diff).
mine does not have the problematic defaults like successThreshold: 1, dnsPolicy: ClusterFirst.

we saw similar strange behavior when the bug was found.

@@ -2,7 +2,7 @@ apiVersion: v1
 kind: Pod
 metadata:
   annotations:
-    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.0.2.15:2379
+    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.56.106:2379
   creationTimestamp: null
   labels:
     component: etcd
@@ -13,19 +13,19 @@ spec:
   containers:
   - command:
     - etcd
-    - --advertise-client-urls=https://10.0.2.15:2379
+    - --advertise-client-urls=https://192.168.56.106:2379
     - --cert-file=/etc/kubernetes/pki/etcd/server.crt
     - --client-cert-auth=true
     - --data-dir=/var/lib/etcd
     - --experimental-initial-corrupt-check=true
     - --experimental-watch-progress-notify-interval=5s
-    - --initial-advertise-peer-urls=https://10.0.2.15:2380
-    - --initial-cluster=lubo-it=https://10.0.2.15:2380
+    - --initial-advertise-peer-urls=https://192.168.56.106:2380
+    - --initial-cluster=ubuntu=https://192.168.56.106:2380
     - --key-file=/etc/kubernetes/pki/etcd/server.key
-    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.2.15:2379
+    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.56.106:2379
     - --listen-metrics-urls=http://127.0.0.1:2381
-    - --listen-peer-urls=https://10.0.2.15:2380
-    - --name=lubo-it
+    - --listen-peer-urls=https://192.168.56.106:2380
+    - --name=ubuntu
     - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
     - --peer-client-cert-auth=true
     - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
@@ -43,6 +43,7 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
+      successThreshold: 1
       timeoutSeconds: 15
     name: etcd
     resources:
@@ -58,18 +59,26 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
+      successThreshold: 1
       timeoutSeconds: 15
+    terminationMessagePath: /dev/termination-log
+    terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/lib/etcd
       name: etcd-data
     - mountPath: /etc/kubernetes/pki/etcd
       name: etcd-certs
+  dnsPolicy: ClusterFirst
+  enableServiceLinks: true
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
+  restartPolicy: Always
+  schedulerName: default-scheduler
   securityContext:
     seccompProfile:
       type: RuntimeDefault
+  terminationGracePeriodSeconds: 30
   volumes:
   - hostPath:
       path: /etc/kubernetes/pki/etcd

this means the problem might happen for some 1.28.0 users, but not for others.
either way, the workarounds should be applied, and there isn't much we can do, like @chendave said.

thanks for the details, let's keep this ticket open until more users upgrade to 1.28.3.
we might have to add an entry about it in:
https://k8s-docs.netlify.app/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/

@neolit123 modified the milestones: v1.29, v1.28 Nov 8, 2023
@chendave (Member) commented Nov 9, 2023

this is really a tricky issue, multiple factors are at play here: golang version, kubeadm binary with the problematic code, os distro, dockerized env, etc. Sometimes we cannot reproduce it with the same binary on a specific env.

you can just patch your etcd.yaml to remove the defaults, reverting the file as described in #2927 (comment).

unfortunately, we can only track this issue and guide end users through the patch upgrade.

@neolit123 do you think we need to post some guide somewhere to help others work through it?

@chendave (Member) commented Nov 9, 2023

we might have to add an entry about it in:
https://k8s-docs.netlify.app/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/

I missed this, I will update the page to include this issue.

@chendave (Member) commented Nov 9, 2023

/assign

for the doc update

@chendave (Member) commented Nov 9, 2023

Just some of my thoughts on this,

Can we have the defaults included for both "init" and "dest"? the defaults alone do not warrant an upgrade anyway.

I did some testing on my side:

--- a/cmd/kubeadm/app/util/marshal.go
+++ b/cmd/kubeadm/app/util/marshal.go
@@ -33,6 +33,7 @@ import (

        kubeadmapi "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm"
        "k8s.io/kubernetes/cmd/kubeadm/app/constants"
+       v1 "k8s.io/kubernetes/pkg/apis/core/v1"
 )

 // MarshalToYaml marshals an object into yaml.
@@ -57,7 +58,7 @@ func MarshalToYamlForCodecs(obj runtime.Object, gv schema.GroupVersion, codecs s
 // UniversalUnmarshal unmarshals YAML or JSON into a runtime.Object using the universal deserializer.
 func UniversalUnmarshal(buffer []byte) (runtime.Object, error) {
        codecs := clientsetscheme.Codecs
-       decoder := codecs.UniversalDeserializer()
+       decoder := codecs.UniversalDecoder(v1.SchemeGroupVersion)
        obj, _, err := decoder.Decode(buffer, nil, nil)
        if err != nil {
                return nil, errors.Wrapf(err, "failed to decode %s into runtime.Object", buffer)

According to the doc comments for UniversalDecoder, this decoder will perform defaulting, but it doesn't actually generate any defaults.

// versions of objects to return - by default, runtime.APIVersionInternal is used. If any versions are specified,
// unrecognized groups will be returned in the version they are encoded as (no conversion). This decoder performs
// defaulting.
//
// TODO: the decoder will eventually be removed in favor of dealing with objects in their versioned form
// TODO: only accept a group versioner
func (f CodecFactory) UniversalDecoder(versions ...schema.GroupVersion) runtime.Decoder {

@liggitt Am I reading it wrong? shouldn't this codec generate defaults?

Also, I'm missing the context for removing the import of "k8s.io/kubernetes/pkg/apis/core/v1" in the first place; can anyone share with me why the import is not allowed in kubeadm? is that just because of the defaulting, as in this issue?

@liggitt (Member) commented Nov 9, 2023

@liggitt Am I reading it wrong? shouldn't this codec generate defaults?

It applies defaults it knows about, which are defaulting functions registered into the codec. Whether defaulting functions are registered or not depends on which packages are linked into the binary. The defaulting functions for core APIs are defined in k8s.io/kubernetes/... API packages and only intended for use by kube-apiserver

@neolit123 (Member)

Also, I'm missing the context for removing the import of "k8s.io/kubernetes/pkg/apis/core/v1" in the first place; can anyone share with me why the import is not allowed in kubeadm? is that just because of the defaulting, as in this issue?

TL;DR there is a plan to extract components from k/k that are considered clients / out-of-tree (kubectl, kubeadm, etc.).
such components cannot depend on k/k/pkg or other areas of k/k.

chendave added a commit to chendave/website that referenced this issue Nov 13, 2023
For the isse which is reported recently: kubernetes/kubeadm#2957
We'd better to provide some tips to workaround this known issue.
@liangyuanpeng (Contributor)

let's keep this ticket open until more users upgrade to 1.28.3.

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label Nov 23, 2023
chendave added a commit to chendave/website that referenced this issue Nov 27, 2023
For the isse which is reported recently: kubernetes/kubeadm#2957
We'd better to provide some tips to workaround this known issue.

Signed-off-by: Dave Chen <[email protected]>
@neolit123 added the priority/backlog label and removed the priority/awaiting-more-evidence label Feb 8, 2024
@neolit123 (Member)

let's keep this ticket open until more users upgrade to 1.28.3.

1.28.10 is the latest. closing until further notice.
