
upgrade-1-28-latest and upgrade-addons-before-controlplane-1-28-latest failed #2927

Closed
9 of 10 tasks
pacoxu opened this issue Sep 8, 2023 · 71 comments
Labels:
area/kinder: Issues to track work in the kinder tool
kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.

@pacoxu
Member

pacoxu commented Sep 8, 2023

https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-addons-before-controlplane-1-28-latest

Both jobs keep failing after kubernetes/release#3254.

/assign

ref #2925

See #2927 (comment) for the conclusion.

@pacoxu
Member Author

pacoxu commented Sep 8, 2023

[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0907 10:10:09.109104    3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.6:2379/ https://172.17.0.4:2379/ https://172.17.0.3:2379/]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
	cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
	cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:338
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
	cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
	cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:367
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
	cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
	cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
[upgrade/apply] FATAL
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:157
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
Error: failed to exec action kubeadm-upgrade: exit status 1
 exit status 1

@pacoxu
Member Author

pacoxu commented Sep 8, 2023

@neolit123 I suspect this is related to kubernetes/kubernetes#119156.

The etcd manifest is regenerated, but it ends up identical to the previous one.
https://github.com/kubernetes/kubernetes/blob/fba0802836d1a67f689d910c7da56516b7429e92/cmd/kubeadm/app/util/staticpod/utils.go#L354-L372

The manifest comparison logic is different from the one in the kubelet. I may sync it here. Let me check.
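
For illustration only, here is a minimal Go sketch of the kind of comparison in question (this is not the code from utils.go, and the path for the regenerated manifest is hypothetical): decode both manifests into Pod objects and check semantic equality instead of comparing the files byte for byte.

package main

import (
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/yaml"
)

// readPodManifest decodes a static Pod manifest file into a core/v1 Pod object.
func readPodManifest(path string) (*corev1.Pod, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	pod := &corev1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		return nil, err
	}
	return pod, nil
}

func main() {
	oldPod, err := readPodManifest("/etc/kubernetes/manifests/etcd.yaml")
	if err != nil {
		panic(err)
	}
	// Hypothetical location for the regenerated manifest; kubeadm stages upgraded
	// manifests in a temporary directory before moving them into place.
	newPod, err := readPodManifest("/tmp/upgraded-manifests/etcd.yaml")
	if err != nil {
		panic(err)
	}
	// Semantic equality of the decoded objects tolerates formatting and
	// field-ordering differences that a plain byte comparison would not.
	fmt.Println("manifests semantically equal:", apiequality.Semantic.DeepEqual(oldPod, newPod))
}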

@neolit123
Member

neolit123 commented Sep 8, 2023

@neolit123 I suspect this is related to kubernetes/kubernetes#119156.

The etcd manifest is regenerated, but it ends up identical to the previous one. https://github.com/kubernetes/kubernetes/blob/fba0802836d1a67f689d910c7da56516b7429e92/cmd/kubeadm/app/util/staticpod/utils.go#L354-L372

The manifest comparison logic is different from the one in the kubelet. I may sync it here. Let me check.

could be, but isn't v1beta4 still unused in our CI?
also, i tried making the extraArgs order sorted and deterministic, so that tools like diff/hash produce the same result every time.
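
As a reference for what "sorted and deterministic" means here, a small sketch (not kubeadm's actual implementation) that renders a map of extra args into a stable flag list; Go randomizes map iteration order, so the keys are sorted first:

package main

import (
	"fmt"
	"sort"
)

// buildArgList renders extra args as --key=value flags in a deterministic order,
// so a regenerated manifest is byte-identical and diff/hash tools agree across runs.
func buildArgList(extraArgs map[string]string) []string {
	keys := make([]string, 0, len(extraArgs))
	for k := range extraArgs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is randomized in Go
	args := make([]string, 0, len(keys))
	for _, k := range keys {
		args = append(args, fmt.Sprintf("--%s=%s", k, extraArgs[k]))
	}
	return args
}

func main() {
	fmt.Println(buildArgList(map[string]string{
		"snapshot-count":                      "10000",
		"experimental-initial-corrupt-check": "true",
	}))
}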

these jobs have to be run locally to test, with something like: kinder test workflow ..... --verbose --loglevel=debug.
(one trick is to call the above command with --dry-run to see the steps, and then call only the needed steps individually, because the last steps delete/reset the cluster)
i can try this today, but not sure i will have time.

looking at the testgrids, however, it seems these jobs had green runs on the 6th and 7th of Sept., which is confusing.
it means this is a flake.

@pacoxu
Member Author

pacoxu commented Sep 8, 2023

Let me test locally and give it a try.

@neolit123 neolit123 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. area/kinder Issues to track work in the kinder tool labels Sep 8, 2023
@neolit123 neolit123 added this to the v1.29 milestone Sep 8, 2023
@neolit123
Member

FTR, i was able to reproduce it locally:

I0908 15:44:43.348180    2747 round_trippers.go:553] GET https://172.17.0.2:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 5 milliseconds
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0908 15:44:43.349293    2747 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.5:2379 https://172.17.0.3:2379 https://172.17.0.7:2379]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests

i think the problem is that the constructed k8s client cannot observe an updated etcd pod generated from etcd.yaml.
it times out after 5 minutes and then shows the "hash did not change" message.

the next step is to understand why the kubelet is not starting the new pod, or, if it did start it, to check whether its logs contain errors.
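
To make the failure mode concrete, the wait that is timing out is conceptually a polling loop like the sketch below (assumptions: a client-go clientset has already been built, and the hash shown here is only illustrative, not the hash kubeadm actually derives):

package staticpodwait

import (
	"context"
	"crypto/sha256"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// mirrorPodHash fetches the mirror Pod of a static Pod from the API server and
// returns an illustrative hash of its spec.
func mirrorPodHash(ctx context.Context, client kubernetes.Interface, podName string) (string, error) {
	pod, err := client.CoreV1().Pods("kube-system").Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256([]byte(fmt.Sprintf("%#v", pod.Spec)))), nil
}

// waitForHashChange polls roughly every 0.5s (about the GET cadence visible in the
// round_trippers log lines above) and gives up after 5 minutes, matching the
// "did not change after 5m0s" message in the error.
func waitForHashChange(client kubernetes.Interface, podName, oldHash string) error {
	return wait.PollUntilContextTimeout(context.Background(), 500*time.Millisecond, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			hash, err := mirrorPodHash(ctx, client, podName)
			if err != nil {
				return false, nil // tolerate transient API errors and keep polling
			}
			return hash != oldHash, nil
		})
}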

@neolit123
Member

neolit123 commented Sep 8, 2023

dumping some logs here.

so while the CP1 upgrade was running, i tried some commands to see what is going on. i'm not seeing any problems.

error:

I0908 16:12:21.067793    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 9 milliseconds
I0908 16:12:21.563327    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 4 milliseconds
I0908 16:12:22.067745    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 6 milliseconds
I0908 16:12:22.563668    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 5 milliseconds
I0908 16:12:23.063400    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 5 milliseconds
I0908 16:12:23.568437    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 9 milliseconds
I0908 16:12:25.020262    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 49 milliseconds
I0908 16:12:25.037486    2946 round_trippers.go:553] GET https://172.17.0.4:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 16 milliseconds
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0908 16:12:25.040911    2946 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.3:2379 https://172.17.0.6:2379 https://172.17.0.2:2379]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
	cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
	cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:338
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
	cmd/kubeadm/app/phases/upgrade/staticpods.go:465
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:617
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
	cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25

kubelet log:

Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.161555     787 server.go:203] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.176238     787 server.go:467] "Kubelet version" kubeletVersion="v1.28.1-55+afc6b4eebe7d80"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.176329     787 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.176800     787 server.go:895] "Client rotation is on, will bootstrap in background"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.179031     787 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.180212     787 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.195607     787 server.go:725] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.196699     787 container_manager_linux.go:265] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.197359     787 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"systemd","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.197517     787 topology_manager.go:138] "Creating topology manager with none policy"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198079     787 container_manager_linux.go:301] "Creating device plugin manager"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198174     787 state_mem.go:36] "Initialized new in-memory state store"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198256     787 kubelet.go:393] "Attempting to sync node with API server"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198268     787 kubelet.go:298] "Adding static pod path" path="/etc/kubernetes/manifests"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198294     787 kubelet.go:309] "Adding apiserver pod source"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.198307     787 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.201909     787 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.6.9" apiVersion="v1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.206751     787 server.go:1232] "Started kubelet"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.210512     787 server.go:162] "Starting to listen" address="0.0.0.0" port=10250
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.210668     787 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.210868     787 server.go:233] "Starting to serve the podresources API" endpoint="unix:/var/lib/kubelet/pod-resources/kubelet.sock"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.213119     787 server.go:462] "Adding debug handlers to kubelet server"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.214629     787 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.231327     787 volume_manager.go:291] "Starting Kubelet Volume Manager"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.231513     787 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.231625     787 reconciler_new.go:29] "Reconciler: start to sync state"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.298479     787 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv4"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.301272     787 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv6"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.301297     787 status_manager.go:217] "Starting to sync pod status with apiserver"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.301478     787 kubelet.go:2303] "Starting kubelet main sync loop"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:19.301523     787 kubelet.go:2327] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.334814     787 kubelet_node_status.go:70] "Attempting to register node" node="kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.361944     787 kubelet_node_status.go:108] "Node was previously registered" node="kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.362605     787 kubelet_node_status.go:73] "Successfully registered node" node="kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:19.401968     787 kubelet.go:2327] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.546953     787 cpu_manager.go:214] "Starting CPU manager" policy="none"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.547374     787 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.547448     787 state_mem.go:36] "Initialized new in-memory state store"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.547747     787 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.547823     787 state_mem.go:96] "Updated CPUSet assignments" assignments={}
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.547865     787 policy_none.go:49] "None policy: Start"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.550148     787 memory_manager.go:169] "Starting memorymanager" policy="None"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.550172     787 state_mem.go:35] "Initializing new in-memory state store"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.550316     787 state_mem.go:75] "Updated machine memory state"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.572680     787 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.574179     787 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.603268     787 topology_manager.go:215] "Topology Admit Handler" podUID="c7e2db72451dd046cd13e2e47807266c" podNamespace="kube-system" podName="kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.604190     787 topology_manager.go:215] "Topology Admit Handler" podUID="b17c470792fe1a5fc2e7895ab67eb556" podNamespace="kube-system" podName="kube-scheduler-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.604303     787 topology_manager.go:215] "Topology Admit Handler" podUID="a849d1646a777cbe24902136b191dfc2" podNamespace="kube-system" podName="etcd-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.604378     787 topology_manager.go:215] "Topology Admit Handler" podUID="f8c4776a1edace8f327a9c6984430aaf" podNamespace="kube-system" podName="kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635260     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"ca-certs\" (UniqueName: \"kubernetes.io/host-path/f8c4776a1edace8f327a9c6984430aaf-ca-certs\") pod \"kube-apiserver-kinder-upgrade-control-plane-1\" (UID: \"f8c4776a1edace8f327a9c6984430aaf\") " pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635579     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etc-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/f8c4776a1edace8f327a9c6984430aaf-etc-ca-certificates\") pod \"kube-apiserver-kinder-upgrade-control-plane-1\" (UID: \"f8c4776a1edace8f327a9c6984430aaf\") " pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635603     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"usr-share-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/f8c4776a1edace8f327a9c6984430aaf-usr-share-ca-certificates\") pod \"kube-apiserver-kinder-upgrade-control-plane-1\" (UID: \"f8c4776a1edace8f327a9c6984430aaf\") " pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635618     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"k8s-certs\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-k8s-certs\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635636     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"usr-local-share-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-usr-local-share-ca-certificates\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635652     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-certs\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-certs\") pod \"etcd-kinder-upgrade-control-plane-1\" (UID: \"a849d1646a777cbe24902136b191dfc2\") " pod="kube-system/etcd-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635667     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"flexvolume-dir\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-flexvolume-dir\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635687     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"ca-certs\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-ca-certs\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635705     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"usr-share-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-usr-share-ca-certificates\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635721     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"k8s-certs\" (UniqueName: \"kubernetes.io/host-path/f8c4776a1edace8f327a9c6984430aaf-k8s-certs\") pod \"kube-apiserver-kinder-upgrade-control-plane-1\" (UID: \"f8c4776a1edace8f327a9c6984430aaf\") " pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635734     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-data\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-data\") pod \"etcd-kinder-upgrade-control-plane-1\" (UID: \"a849d1646a777cbe24902136b191dfc2\") " pod="kube-system/etcd-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635750     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"usr-local-share-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/f8c4776a1edace8f327a9c6984430aaf-usr-local-share-ca-certificates\") pod \"kube-apiserver-kinder-upgrade-control-plane-1\" (UID: \"f8c4776a1edace8f327a9c6984430aaf\") " pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635766     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etc-ca-certificates\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-etc-ca-certificates\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635784     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubeconfig\" (UniqueName: \"kubernetes.io/host-path/c7e2db72451dd046cd13e2e47807266c-kubeconfig\") pod \"kube-controller-manager-kinder-upgrade-control-plane-1\" (UID: \"c7e2db72451dd046cd13e2e47807266c\") " pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1"
Sep 08 16:03:19 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:19.635801     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubeconfig\" (UniqueName: \"kubernetes.io/host-path/b17c470792fe1a5fc2e7895ab67eb556-kubeconfig\") pod \"kube-scheduler-kinder-upgrade-control-plane-1\" (UID: \"b17c470792fe1a5fc2e7895ab67eb556\") " pod="kube-system/kube-scheduler-kinder-upgrade-control-plane-1"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.200737     787 apiserver.go:52] "Watching apiserver"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.233489     787 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.383925     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-scheduler-kinder-upgrade-control-plane-1" podStartSLOduration=1.383892164 podCreationTimestamp="2023-09-08 16:03:19 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:20.383595166 +0000 UTC m=+1.393591680" watchObservedRunningTime="2023-09-08 16:03:20.383892164 +0000 UTC m=+1.393888667"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.431352     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-controller-manager-kinder-upgrade-control-plane-1" podStartSLOduration=1.431326347 podCreationTimestamp="2023-09-08 16:03:19 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:20.40101415 +0000 UTC m=+1.411010656" watchObservedRunningTime="2023-09-08 16:03:20.431326347 +0000 UTC m=+1.441322854"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.447091     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1" podStartSLOduration=1.44706065 podCreationTimestamp="2023-09-08 16:03:19 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:20.431824177 +0000 UTC m=+1.441820689" watchObservedRunningTime="2023-09-08 16:03:20.44706065 +0000 UTC m=+1.457057156"
Sep 08 16:03:20 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:20.470815     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/etcd-kinder-upgrade-control-plane-1" podStartSLOduration=1.470656712 podCreationTimestamp="2023-09-08 16:03:19 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:20.44792205 +0000 UTC m=+1.457918564" watchObservedRunningTime="2023-09-08 16:03:20.470656712 +0000 UTC m=+1.480653226"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.316844     787 kuberuntime_manager.go:1523] "Updating runtime config through cri with podcidr" CIDR="192.168.0.0/24"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.318791     787 kubelet_network.go:61] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="192.168.0.0/24"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.568793     787 topology_manager.go:215] "Topology Admit Handler" podUID="cf9fb191-720b-458f-b69a-65a3a385c0a6" podNamespace="kube-system" podName="kube-proxy-jntx2"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.584194     787 topology_manager.go:215] "Topology Admit Handler" podUID="a67295cf-c317-458e-9aec-775d024d89a2" podNamespace="kube-system" podName="kindnet-bm9w5"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.670795     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/cf9fb191-720b-458f-b69a-65a3a385c0a6-lib-modules\") pod \"kube-proxy-jntx2\" (UID: \"cf9fb191-720b-458f-b69a-65a3a385c0a6\") " pod="kube-system/kube-proxy-jntx2"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.670938     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-dfx2t\" (UniqueName: \"kubernetes.io/projected/cf9fb191-720b-458f-b69a-65a3a385c0a6-kube-api-access-dfx2t\") pod \"kube-proxy-jntx2\" (UID: \"cf9fb191-720b-458f-b69a-65a3a385c0a6\") " pod="kube-system/kube-proxy-jntx2"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671004     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/a67295cf-c317-458e-9aec-775d024d89a2-lib-modules\") pod \"kindnet-bm9w5\" (UID: \"a67295cf-c317-458e-9aec-775d024d89a2\") " pod="kube-system/kindnet-bm9w5"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671060     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/a67295cf-c317-458e-9aec-775d024d89a2-xtables-lock\") pod \"kindnet-bm9w5\" (UID: \"a67295cf-c317-458e-9aec-775d024d89a2\") " pod="kube-system/kindnet-bm9w5"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671118     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-dpjpr\" (UniqueName: \"kubernetes.io/projected/a67295cf-c317-458e-9aec-775d024d89a2-kube-api-access-dpjpr\") pod \"kindnet-bm9w5\" (UID: \"a67295cf-c317-458e-9aec-775d024d89a2\") " pod="kube-system/kindnet-bm9w5"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671170     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/cf9fb191-720b-458f-b69a-65a3a385c0a6-xtables-lock\") pod \"kube-proxy-jntx2\" (UID: \"cf9fb191-720b-458f-b69a-65a3a385c0a6\") " pod="kube-system/kube-proxy-jntx2"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671219     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-cfg\" (UniqueName: \"kubernetes.io/host-path/a67295cf-c317-458e-9aec-775d024d89a2-cni-cfg\") pod \"kindnet-bm9w5\" (UID: \"a67295cf-c317-458e-9aec-775d024d89a2\") " pod="kube-system/kindnet-bm9w5"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:22.671306     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/cf9fb191-720b-458f-b69a-65a3a385c0a6-kube-proxy\") pod \"kube-proxy-jntx2\" (UID: \"cf9fb191-720b-458f-b69a-65a3a385c0a6\") " pod="kube-system/kube-proxy-jntx2"
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.790158     787 projected.go:292] Couldn't get configMap kube-system/kube-root-ca.crt: configmap "kube-root-ca.crt" not found
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.790250     787 projected.go:198] Error preparing data for projected volume kube-api-access-dpjpr for pod kube-system/kindnet-bm9w5: configmap "kube-root-ca.crt" not found
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.790374     787 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/a67295cf-c317-458e-9aec-775d024d89a2-kube-api-access-dpjpr podName:a67295cf-c317-458e-9aec-775d024d89a2 nodeName:}" failed. No retries permitted until 2023-09-08 16:03:23.290350415 +0000 UTC m=+4.300346921 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access-dpjpr" (UniqueName: "kubernetes.io/projected/a67295cf-c317-458e-9aec-775d024d89a2-kube-api-access-dpjpr") pod "kindnet-bm9w5" (UID: "a67295cf-c317-458e-9aec-775d024d89a2") : configmap "kube-root-ca.crt" not found
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.801248     787 projected.go:292] Couldn't get configMap kube-system/kube-root-ca.crt: configmap "kube-root-ca.crt" not found
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.801269     787 projected.go:198] Error preparing data for projected volume kube-api-access-dfx2t for pod kube-system/kube-proxy-jntx2: configmap "kube-root-ca.crt" not found
Sep 08 16:03:22 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:03:22.801305     787 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/cf9fb191-720b-458f-b69a-65a3a385c0a6-kube-api-access-dfx2t podName:cf9fb191-720b-458f-b69a-65a3a385c0a6 nodeName:}" failed. No retries permitted until 2023-09-08 16:03:23.301291631 +0000 UTC m=+4.311288135 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access-dfx2t" (UniqueName: "kubernetes.io/projected/cf9fb191-720b-458f-b69a-65a3a385c0a6-kube-api-access-dfx2t") pod "kube-proxy-jntx2" (UID: "cf9fb191-720b-458f-b69a-65a3a385c0a6") : configmap "kube-root-ca.crt" not found
Sep 08 16:03:28 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:28.758917     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kube-proxy-jntx2" podStartSLOduration=6.758884582 podCreationTimestamp="2023-09-08 16:03:22 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:25.412042145 +0000 UTC m=+6.422038660" watchObservedRunningTime="2023-09-08 16:03:28.758884582 +0000 UTC m=+9.768881094"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.746577     787 kubelet_node_status.go:493] "Fast updating node status as it just became ready"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.792325     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/kindnet-bm9w5" podStartSLOduration=3.493828085 podCreationTimestamp="2023-09-08 16:03:22 +0000 UTC" firstStartedPulling="2023-09-08 16:03:23.970000727 +0000 UTC m=+4.979997224" lastFinishedPulling="2023-09-08 16:03:41.268468076 +0000 UTC m=+22.278464578" observedRunningTime="2023-09-08 16:03:42.528548256 +0000 UTC m=+23.538544767" watchObservedRunningTime="2023-09-08 16:03:42.792295439 +0000 UTC m=+23.802291945"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.792623     787 topology_manager.go:215] "Topology Admit Handler" podUID="e5113fad-cc28-4a6c-a3b8-90d21b46e58c" podNamespace="kube-system" podName="coredns-5dd5756b68-27wkr"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.795555     787 topology_manager.go:215] "Topology Admit Handler" podUID="365d7a73-6677-48ce-a5e3-c6195299a83d" podNamespace="kube-system" podName="coredns-5dd5756b68-d57jf"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.850891     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-wzdjq\" (UniqueName: \"kubernetes.io/projected/e5113fad-cc28-4a6c-a3b8-90d21b46e58c-kube-api-access-wzdjq\") pod \"coredns-5dd5756b68-27wkr\" (UID: \"e5113fad-cc28-4a6c-a3b8-90d21b46e58c\") " pod="kube-system/coredns-5dd5756b68-27wkr"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.851150     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"config-volume\" (UniqueName: \"kubernetes.io/configmap/365d7a73-6677-48ce-a5e3-c6195299a83d-config-volume\") pod \"coredns-5dd5756b68-d57jf\" (UID: \"365d7a73-6677-48ce-a5e3-c6195299a83d\") " pod="kube-system/coredns-5dd5756b68-d57jf"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.851182     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"config-volume\" (UniqueName: \"kubernetes.io/configmap/e5113fad-cc28-4a6c-a3b8-90d21b46e58c-config-volume\") pod \"coredns-5dd5756b68-27wkr\" (UID: \"e5113fad-cc28-4a6c-a3b8-90d21b46e58c\") " pod="kube-system/coredns-5dd5756b68-27wkr"
Sep 08 16:03:42 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:42.851199     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-58ff2\" (UniqueName: \"kubernetes.io/projected/365d7a73-6677-48ce-a5e3-c6195299a83d-kube-api-access-58ff2\") pod \"coredns-5dd5756b68-d57jf\" (UID: \"365d7a73-6677-48ce-a5e3-c6195299a83d\") " pod="kube-system/coredns-5dd5756b68-d57jf"
Sep 08 16:03:45 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:45.622632     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/coredns-5dd5756b68-d57jf" podStartSLOduration=22.622596873 podCreationTimestamp="2023-09-08 16:03:23 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:44.596538279 +0000 UTC m=+25.606534792" watchObservedRunningTime="2023-09-08 16:03:45.622596873 +0000 UTC m=+26.632593386"
Sep 08 16:03:45 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:03:45.656823     787 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/coredns-5dd5756b68-27wkr" podStartSLOduration=22.65679182 podCreationTimestamp="2023-09-08 16:03:23 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2023-09-08 16:03:45.626605499 +0000 UTC m=+26.636602014" watchObservedRunningTime="2023-09-08 16:03:45.65679182 +0000 UTC m=+26.666788329"
Sep 08 16:04:07 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:04:07.005550     787 controller.go:193] "Failed to update lease" err="etcdserver: request timed out"
Sep 08 16:04:09 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:04:09.605171     787 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-kinder-upgrade-control-plane-1.1782f7f541209157", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-kinder-upgrade-control-plane-1", UID:"f8c4776a1edace8f327a9c6984430aaf", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver}"}, Reason:"Unhealthy", Message:"Liveness probe failed: HTTP probe failed with statuscode: 500", Source:v1.EventSource{Component:"kubelet", Host:"kinder-upgrade-control-plane-1"}, FirstTimestamp:time.Date(2023, time.September, 8, 16, 4, 2, 576429399, time.Local), LastTimestamp:time.Date(2023, time.September, 8, 16, 4, 2, 576429399, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"kinder-upgrade-control-plane-1"}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Sep 08 16:04:12 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:04:12.569526     787 scope.go:117] "RemoveContainer" containerID="6594e7ed24082daf571226f0d8c59ca68e222ab3f895976b5a2baab2fe8a9213"
Sep 08 16:04:13 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:04:13.548853     787 status_manager.go:853] "Failed to get status for pod" podUID="f8c4776a1edace8f327a9c6984430aaf" pod="kube-system/kube-apiserver-kinder-upgrade-control-plane-1" err="etcdserver: leader changed"
Sep 08 16:04:13 kinder-upgrade-control-plane-1 kubelet[787]: E0908 16:04:13.631120     787 controller.go:193] "Failed to update lease" err="Operation cannot be fulfilled on leases.coordination.k8s.io \"kinder-upgrade-control-plane-1\": the object has been modified; please apply your changes to the latest version and try again"
Sep 08 16:04:14 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:04:14.605151     787 scope.go:117] "RemoveContainer" containerID="d5dfe0bf606adcc0f34a7818dd9bc75879a117b3288f15c62c740a5c326b79ef"
Sep 08 16:07:25 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:25.978793     787 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"etcd-data\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-data\") pod \"a849d1646a777cbe24902136b191dfc2\" (UID: \"a849d1646a777cbe24902136b191dfc2\") "
Sep 08 16:07:25 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:25.978834     787 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"etcd-certs\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-certs\") pod \"a849d1646a777cbe24902136b191dfc2\" (UID: \"a849d1646a777cbe24902136b191dfc2\") "
Sep 08 16:07:25 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:25.978882     787 operation_generator.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-certs" (OuterVolumeSpecName: "etcd-certs") pod "a849d1646a777cbe24902136b191dfc2" (UID: "a849d1646a777cbe24902136b191dfc2"). InnerVolumeSpecName "etcd-certs". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Sep 08 16:07:25 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:25.978916     787 operation_generator.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-data" (OuterVolumeSpecName: "etcd-data") pod "a849d1646a777cbe24902136b191dfc2" (UID: "a849d1646a777cbe24902136b191dfc2"). InnerVolumeSpecName "etcd-data". PluginName "kubernetes.io/host-path", VolumeGidValue ""
Sep 08 16:07:26 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:26.079594     787 reconciler_common.go:300] "Volume detached for volume \"etcd-data\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-data\") on node \"kinder-upgrade-control-plane-1\" DevicePath \"\""
Sep 08 16:07:26 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:26.079676     787 reconciler_common.go:300] "Volume detached for volume \"etcd-certs\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-certs\") on node \"kinder-upgrade-control-plane-1\" DevicePath \"\""
Sep 08 16:07:26 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:26.761165     787 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="ea7d52e4539012926269938555951e1602f58e345f5e6e63f7838b0dbe97af51"
Sep 08 16:07:27 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:27.492653     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-data\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-data\") pod \"etcd-kinder-upgrade-control-plane-1\" (UID: \"a849d1646a777cbe24902136b191dfc2\") " pod="kube-system/etcd-kinder-upgrade-control-plane-1"
Sep 08 16:07:27 kinder-upgrade-control-plane-1 kubelet[787]: I0908 16:07:27.492697     787 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-certs\" (UniqueName: \"kubernetes.io/host-path/a849d1646a777cbe24902136b191dfc2-etcd-certs\") pod \"etcd-kinder-upgrade-control-plane-1\" (UID: \"a849d1646a777cbe24902136b191dfc2\") " pod="kube-system/etcd-kinder-upgrade-control-plane-1"
Sep 08 16:07:48 kinder-upgrade-control-plane-1 kubelet[787]: W0908 16:07:48.822617     787 conversion.go:112] Could not get instant cpu stats: cumulative stats decrease

old manifest:

root@kinder-upgrade-control-plane-1:/# cat /etc/kubernetes/manifests/etcd.yaml     
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.3:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.3:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.3:2380
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.3:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.3:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.3:2380
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

manifest after upgrade:

root@kinder-upgrade-control-plane-1:/# cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.3:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.3:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.3:2380
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.3:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.3:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.3:2380
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

the pod from the upgraded manifest is running:

root@kinder-upgrade-control-plane-1:/# kubectl --kubeconfig=/etc/kubernetes/admin.conf get po -A
NAMESPACE     NAME                                                     READY   STATUS    RESTARTS        AGE
kube-system   coredns-5dd5756b68-27wkr                                 1/1     Running   0               6m26s
kube-system   coredns-5dd5756b68-d57jf                                 1/1     Running   0               6m26s
kube-system   etcd-kinder-upgrade-control-plane-1                      1/1     Running   1 (2m24s ago)   6m30s
kube-system   etcd-kinder-upgrade-control-plane-2                      1/1     Running   0               5m43s
kube-system   etcd-kinder-upgrade-control-plane-3                      1/1     Running   0               4m59s
kube-system   kindnet-bm9w5                                            1/1     Running   0               6m27s
kube-system   kindnet-ctg6k                                            1/1     Running   0               5m51s
kube-system   kindnet-dbr9s                                            1/1     Running   0               4m59s
kube-system   kindnet-hpn5x                                            1/1     Running   0               4m
kube-system   kindnet-sgzht                                            1/1     Running   0               4m28s
kube-system   kube-apiserver-kinder-upgrade-control-plane-1            1/1     Running   0               6m30s
kube-system   kube-apiserver-kinder-upgrade-control-plane-2            1/1     Running   0               5m37s
kube-system   kube-apiserver-kinder-upgrade-control-plane-3            1/1     Running   0               4m41s
kube-system   kube-controller-manager-kinder-upgrade-control-plane-1   1/1     Running   1 (5m39s ago)   6m30s
kube-system   kube-controller-manager-kinder-upgrade-control-plane-2   1/1     Running   0               5m37s
kube-system   kube-controller-manager-kinder-upgrade-control-plane-3   1/1     Running   0               4m57s
kube-system   kube-proxy-285dc                                         1/1     Running   0               4m
kube-system   kube-proxy-7f8zx                                         1/1     Running   0               5m51s
kube-system   kube-proxy-jntx2                                         1/1     Running   0               6m27s
kube-system   kube-proxy-sdchr                                         1/1     Running   0               4m28s
kube-system   kube-proxy-sx9nk                                         1/1     Running   0               4m59s
kube-system   kube-scheduler-kinder-upgrade-control-plane-1            1/1     Running   1 (5m36s ago)   6m30s
kube-system   kube-scheduler-kinder-upgrade-control-plane-2            1/1     Running   0               5m38s
kube-system   kube-scheduler-kinder-upgrade-control-plane-3            1/1     Running   0               4m49s
root@kinder-upgrade-control-plane-1:/# kubectl --kubeconfig=/etc/kubernetes/admin.conf get po -n kube-system etcd-kinder-upgrade-control-plane-1
NAME                                  READY   STATUS    RESTARTS       AGE
etcd-kinder-upgrade-control-plane-1   1/1     Running   1 (4m1s ago)   8m7s

etcd pod logs:

root@kinder-upgrade-control-plane-1:/# kubectl --kubeconfig=/etc/kubernetes/admin.conf logs -n kube-system etcd-kinder-upgrade-control-plane-1
{"level":"warn","ts":"2023-09-08T16:07:28.15643Z","caller":"embed/config.go:673","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"info","ts":"2023-09-08T16:07:28.159958Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://172.17.0.3:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--experimental-watch-progress-notify-interval=5s","--initial-advertise-peer-urls=https://172.17.0.3:2380","--initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.3:2380","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://172.17.0.3:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://172.17.0.3:2380","--name=kinder-upgrade-control-plane-1","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
{"level":"info","ts":"2023-09-08T16:07:28.160066Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
{"level":"warn","ts":"2023-09-08T16:07:28.160091Z","caller":"embed/config.go:673","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"info","ts":"2023-09-08T16:07:28.160099Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://172.17.0.3:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.160137Z","caller":"embed/etcd.go:495","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
{"level":"info","ts":"2023-09-08T16:07:28.161002Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://172.17.0.3:2379"]}
{"level":"info","ts":"2023-09-08T16:07:28.161636Z","caller":"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.9","git-sha":"bdbbde998","go-version":"go1.19.9","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"kinder-upgrade-control-plane-1","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://172.17.0.3:2380"],"listen-peer-urls":["https://172.17.0.3:2380"],"advertise-client-urls":["https://172.17.0.3:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://172.17.0.3:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2023-09-08T16:07:28.166445Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"4.044528ms"}
{"level":"info","ts":"2023-09-08T16:07:28.176212Z","caller":"etcdserver/server.go:530","msg":"No snapshot found. Recovering WAL from scratch!"}
{"level":"info","ts":"2023-09-08T16:07:28.220214Z","caller":"etcdserver/raft.go:530","msg":"restarting local member","cluster-id":"86482fea2286a1d2","local-member-id":"b273bc7741bcb020","commit-index":1255}
{"level":"info","ts":"2023-09-08T16:07:28.22102Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"b273bc7741bcb020 switched to configuration voters=()"}
{"level":"info","ts":"2023-09-08T16:07:28.221104Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"b273bc7741bcb020 became follower at term 4"}
{"level":"info","ts":"2023-09-08T16:07:28.22165Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft b273bc7741bcb020 [peers: [], term: 4, commit: 1255, applied: 0, lastindex: 1255, lastterm: 4]"}
{"level":"warn","ts":"2023-09-08T16:07:28.226179Z","caller":"auth/store.go:1238","msg":"simple token is not cryptographically signed"}
{"level":"info","ts":"2023-09-08T16:07:28.271139Z","caller":"mvcc/kvstore.go:393","msg":"kvstore restored","current-rev":1113}
{"level":"info","ts":"2023-09-08T16:07:28.274539Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
{"level":"info","ts":"2023-09-08T16:07:28.278362Z","caller":"etcdserver/corrupt.go:95","msg":"starting initial corruption check","local-member-id":"b273bc7741bcb020","timeout":"7s"}
{"level":"info","ts":"2023-09-08T16:07:28.278839Z","caller":"etcdserver/corrupt.go:165","msg":"initial corruption checking passed; no corruption","local-member-id":"b273bc7741bcb020"}
{"level":"info","ts":"2023-09-08T16:07:28.278935Z","caller":"etcdserver/server.go:854","msg":"starting etcd server","local-member-id":"b273bc7741bcb020","local-server-version":"3.5.9","cluster-version":"to_be_decided"}
{"level":"info","ts":"2023-09-08T16:07:28.27973Z","caller":"etcdserver/server.go:754","msg":"starting initial election tick advance","election-ticks":10}
{"level":"info","ts":"2023-09-08T16:07:28.27975Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/var/lib/etcd/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
{"level":"info","ts":"2023-09-08T16:07:28.280772Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"b273bc7741bcb020 switched to configuration voters=(12858828581462913056)"}
{"level":"info","ts":"2023-09-08T16:07:28.280821Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"86482fea2286a1d2","local-member-id":"b273bc7741bcb020","added-peer-id":"b273bc7741bcb020","added-peer-peer-urls":["https://172.17.0.3:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.280743Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/var/lib/etcd/member/snap","suffix":"snap","max":5,"interval":"30s"}
{"level":"info","ts":"2023-09-08T16:07:28.280848Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/var/lib/etcd/member/wal","suffix":"wal","max":5,"interval":"30s"}
{"level":"info","ts":"2023-09-08T16:07:28.280883Z","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"86482fea2286a1d2","local-member-id":"b273bc7741bcb020","cluster-version":"3.5"}
{"level":"info","ts":"2023-09-08T16:07:28.280902Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2023-09-08T16:07:28.284921Z","caller":"embed/etcd.go:726","msg":"starting with client TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
{"level":"info","ts":"2023-09-08T16:07:28.285331Z","caller":"embed/etcd.go:278","msg":"now serving peer/client/metrics","local-member-id":"b273bc7741bcb020","initial-advertise-peer-urls":["https://172.17.0.3:2380"],"listen-peer-urls":["https://172.17.0.3:2380"],"advertise-client-urls":["https://172.17.0.3:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://172.17.0.3:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"]}
{"level":"info","ts":"2023-09-08T16:07:28.285351Z","caller":"embed/etcd.go:855","msg":"serving metrics","address":"http://127.0.0.1:2381"}
{"level":"info","ts":"2023-09-08T16:07:28.285733Z","caller":"embed/etcd.go:597","msg":"serving peer traffic","address":"172.17.0.3:2380"}
{"level":"info","ts":"2023-09-08T16:07:28.28581Z","caller":"embed/etcd.go:569","msg":"cmux::serve","address":"172.17.0.3:2380"}
{"level":"info","ts":"2023-09-08T16:07:28.286061Z","caller":"rafthttp/pipeline.go:72","msg":"started HTTP pipelining with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.286073Z","caller":"rafthttp/transport.go:286","msg":"added new remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc","remote-peer-urls":["https://172.17.0.6:2380"]}
{"level":"warn","ts":"2023-09-08T16:07:28.28608Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"b273bc7741bcb020","remote-peer-id-stream-handler":"b273bc7741bcb020","remote-peer-id-from":"b541f27e692498bc","cluster-id":"86482fea2286a1d2"}
{"level":"info","ts":"2023-09-08T16:07:28.288892Z","caller":"rafthttp/pipeline.go:72","msg":"started HTTP pipelining with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.288971Z","caller":"rafthttp/transport.go:286","msg":"added new remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb","remote-peer-urls":["https://172.17.0.2:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.28966Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: b273bc7741bcb020 elected leader f523e6af6b4f95bb at term 4"}
{"level":"warn","ts":"2023-09-08T16:07:28.289994Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"b273bc7741bcb020","remote-peer-id-stream-handler":"b273bc7741bcb020","remote-peer-id-from":"f523e6af6b4f95bb","cluster-id":"86482fea2286a1d2"}
{"level":"warn","ts":"2023-09-08T16:07:28.296821Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"b273bc7741bcb020","remote-peer-id-stream-handler":"b273bc7741bcb020","remote-peer-id-from":"f523e6af6b4f95bb","cluster-id":"86482fea2286a1d2"}
{"level":"warn","ts":"2023-09-08T16:07:28.296795Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"b273bc7741bcb020","remote-peer-id-stream-handler":"b273bc7741bcb020","remote-peer-id-from":"b541f27e692498bc","cluster-id":"86482fea2286a1d2"}
{"level":"info","ts":"2023-09-08T16:07:28.30718Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"b273bc7741bcb020 switched to configuration voters=(12858828581462913056 17664215804571260347)"}
{"level":"info","ts":"2023-09-08T16:07:28.307221Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"86482fea2286a1d2","local-member-id":"b273bc7741bcb020","added-peer-id":"f523e6af6b4f95bb","added-peer-peer-urls":["https://172.17.0.2:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.307237Z","caller":"rafthttp/peer.go:133","msg":"starting remote peer","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.307249Z","caller":"rafthttp/pipeline.go:72","msg":"started HTTP pipelining with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.316044Z","caller":"rafthttp/stream.go:169","msg":"started stream writer with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.316178Z","caller":"rafthttp/peer.go:137","msg":"started remote peer","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.316198Z","caller":"rafthttp/transport.go:317","msg":"added remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb","remote-peer-urls":["https://172.17.0.2:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.316553Z","caller":"rafthttp/stream.go:395","msg":"started stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.317501Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"b273bc7741bcb020 switched to configuration voters=(12858828581462913056 13060987019094956220 17664215804571260347)"}
{"level":"info","ts":"2023-09-08T16:07:28.317528Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"86482fea2286a1d2","local-member-id":"b273bc7741bcb020","added-peer-id":"b541f27e692498bc","added-peer-peer-urls":["https://172.17.0.6:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.317544Z","caller":"rafthttp/peer.go:133","msg":"starting remote peer","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.317561Z","caller":"rafthttp/pipeline.go:72","msg":"started HTTP pipelining with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.317938Z","caller":"rafthttp/stream.go:169","msg":"started stream writer with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.319158Z","caller":"rafthttp/stream.go:395","msg":"started stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.320743Z","caller":"rafthttp/stream.go:169","msg":"started stream writer with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.322392Z","caller":"rafthttp/stream.go:169","msg":"started stream writer with remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.324165Z","caller":"rafthttp/peer.go:137","msg":"started remote peer","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.324507Z","caller":"rafthttp/transport.go:317","msg":"added remote peer","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc","remote-peer-urls":["https://172.17.0.6:2380"]}
{"level":"info","ts":"2023-09-08T16:07:28.324714Z","caller":"rafthttp/stream.go:395","msg":"started stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.325085Z","caller":"rafthttp/stream.go:395","msg":"started stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.326389Z","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"b273bc7741bcb020","to":"f523e6af6b4f95bb","stream-type":"stream Message"}
{"level":"info","ts":"2023-09-08T16:07:28.32646Z","caller":"rafthttp/peer_status.go:53","msg":"peer became active","peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.326476Z","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"warn","ts":"2023-09-08T16:07:28.326683Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"b273bc7741bcb020","remote-peer-id-stream-handler":"b273bc7741bcb020","remote-peer-id-from":"b541f27e692498bc","cluster-id":"86482fea2286a1d2"}
{"level":"info","ts":"2023-09-08T16:07:28.334307Z","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"b273bc7741bcb020","to":"f523e6af6b4f95bb","stream-type":"stream MsgApp v2"}
{"level":"info","ts":"2023-09-08T16:07:28.334379Z","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.342945Z","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.345628Z","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"f523e6af6b4f95bb"}
{"level":"info","ts":"2023-09-08T16:07:28.346333Z","caller":"rafthttp/peer_status.go:53","msg":"peer became active","peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.346419Z","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.349287Z","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.351735Z","caller":"etcdserver/server.go:2062","msg":"published local member to cluster through raft","local-member-id":"b273bc7741bcb020","local-member-attributes":"{Name:kinder-upgrade-control-plane-1 ClientURLs:[https://172.17.0.3:2379]}","request-path":"/0/members/b273bc7741bcb020/attributes","cluster-id":"86482fea2286a1d2","publish-timeout":"7s"}
{"level":"info","ts":"2023-09-08T16:07:28.351782Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
{"level":"info","ts":"2023-09-08T16:07:28.351977Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2023-09-08T16:07:28.351989Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
{"level":"info","ts":"2023-09-08T16:07:28.351795Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
{"level":"info","ts":"2023-09-08T16:07:28.354289Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc+http","address":"172.17.0.3:2379"}
{"level":"info","ts":"2023-09-08T16:07:28.356093Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc+http","address":"127.0.0.1:2379"}
{"level":"info","ts":"2023-09-08T16:07:28.378952Z","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"b273bc7741bcb020","to":"b541f27e692498bc","stream-type":"stream MsgApp v2"}
{"level":"info","ts":"2023-09-08T16:07:28.379035Z","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:07:28.386898Z","caller":"etcdserver/server.go:777","msg":"initialized peer connections; fast-forwarding election ticks","local-member-id":"b273bc7741bcb020","forward-ticks":8,"forward-duration":"800ms","election-ticks":10,"election-timeout":"1s","active-remote-members":2}
{"level":"info","ts":"2023-09-08T16:07:28.411722Z","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"b273bc7741bcb020","to":"b541f27e692498bc","stream-type":"stream Message"}
{"level":"info","ts":"2023-09-08T16:07:28.411804Z","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"b273bc7741bcb020","remote-peer-id":"b541f27e692498bc"}
{"level":"info","ts":"2023-09-08T16:08:54.68072Z","caller":"traceutil/trace.go:171","msg":"trace[1722954285] linearizableReadLoop","detail":"{readStateIndex:1489; appliedIndex:1489; }","duration":"106.106753ms","start":"2023-09-08T16:08:54.574602Z","end":"2023-09-08T16:08:54.680709Z","steps":["trace[1722954285] 'read index received'  (duration: 106.104298ms)","trace[1722954285] 'applied index is now lower than readState.Index'  (duration: 1.846µs)"],"step_count":2}
{"level":"warn","ts":"2023-09-08T16:08:54.681205Z","caller":"etcdserver/util.go:170","msg":"apply request took too long","took":"106.600295ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/roles/\" range_end:\"/registry/roles0\" count_only:true ","response":"range_response_count:0 size:7"}
{"level":"info","ts":"2023-09-08T16:08:54.681234Z","caller":"traceutil/trace.go:171","msg":"trace[245520865] range","detail":"{range_begin:/registry/roles/; range_end:/registry/roles0; response_count:0; response_revision:1295; }","duration":"106.641776ms","start":"2023-09-08T16:08:54.574586Z","end":"2023-09-08T16:08:54.681228Z","steps":["trace[245520865] 'agreement among raft nodes before linearized reading'  (duration: 106.579299ms)"],"step_count":1}

@neolit123 (Member)

hmm, so the etcd pod actually changed:

@@ -41,11 +41,10 @@ spec:
         path: /health?exclude=NOSPACE&serializable=true
         port: 2381
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
-      successThreshold: 1
       timeoutSeconds: 15
     name: etcd
     resources:
       requests:
         cpu: 100m
@@ -57,30 +56,22 @@ spec:
         path: /health?serializable=false
         port: 2381
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
-      successThreshold: 1
       timeoutSeconds: 15
-    terminationMessagePath: /dev/termination-log
-    terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/lib/etcd
       name: etcd-data
     - mountPath: /etc/kubernetes/pki/etcd
       name: etcd-certs
-  dnsPolicy: ClusterFirst
-  enableServiceLinks: true
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
-  restartPolicy: Always
-  schedulerName: default-scheduler
   securityContext:
     seccompProfile:
       type: RuntimeDefault
-  terminationGracePeriodSeconds: 30
   volumes:
   - hostPath:
       path: /etc/kubernetes/pki/etcd
       type: DirectoryOrCreate
     name: etcd-certs

did sig node stop applying some defaults?
or maybe there is a behavior change....

@neolit123 (Member) commented Sep 8, 2023

but i think the hash change check is failing because the etcd static pod annotation kubernetes.io/config.hash never changes (missing)... that's odd.

https://github.com/kubernetes/kubernetes/blob/d7aeb7f853e822071041bdbada7fcbe09d15062d/cmd/kubeadm/app/util/apiclient/wait.go#L230
https://github.com/kubernetes/kubernetes/blob/d7aeb7f853e822071041bdbada7fcbe09d15062d/pkg/kubelet/types/pod_update.go#L32
https://github.com/kubernetes/kubernetes/blob/d7aeb7f853e822071041bdbada7fcbe09d15062d/pkg/kubelet/config/common.go#L87

i think we should look at recent kubelet changes.

the stranger part is that the kube-apiserver, scheduler and kcm also use the same "check hash" during kubeadm upgrade, but it works for them.
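
To make the discussion concrete, the check referenced above essentially polls the mirror pod of the static pod and compares the kubernetes.io/config.hash annotation with the pre-upgrade value. Below is a minimal, illustrative sketch of that idea using client-go; it is not the actual kubeadm helper, and the function names and the 5-minute timeout are just placeholders matching the log output:

// Minimal sketch of the "wait for static pod hash change" idea, assuming a
// client-go clientset. Illustrative only, not the real kubeadm code.
package staticpodwait

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// getStaticPodHash reads the kubelet-managed config hash annotation from the
// mirror pod of a static pod (e.g. "etcd-kinder-upgrade-control-plane-1").
func getStaticPodHash(client kubernetes.Interface, component, nodeName string) (string, error) {
	podName := fmt.Sprintf("%s-%s", component, nodeName)
	pod, err := client.CoreV1().Pods("kube-system").Get(context.TODO(), podName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return pod.Annotations["kubernetes.io/config.hash"], nil
}

// waitForStaticPodHashChange polls until the annotation differs from oldHash.
// If the kubelet never re-hashes the pod (the failure discussed here), this
// times out, just like the 5m0s timeout seen in the upgrade logs.
func waitForStaticPodHashChange(client kubernetes.Interface, component, nodeName, oldHash string) error {
	return wait.PollUntilContextTimeout(context.TODO(), 500*time.Millisecond, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			hash, err := getStaticPodHash(client, component, nodeName)
			if err != nil {
				return false, nil // keep retrying on transient API errors
			}
			return hash != oldHash, nil
		})
}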

@pacoxu (Member, Author) commented Sep 8, 2023

We may need to confirm whether this is a kubelet behavior change, or a pod YAML generation change that adds default values.

I had some problems running the e2e locally yesterday.

@pacoxu (Member, Author) commented Sep 9, 2023

I checked with the latest kubelet what happens if the manifest changes (e.g. a default spec attribute such as dnsPolicy: ClusterFirst is added).

The container gets recreated, but the hash kubernetes.io/config.hash: 9a96c413b99485c6398b3b41a6a04554 does not change. The behavior is the same for kubelet v1.28 and a kubelet built from the latest code.

@neolit123 (Member)

if the etcd version, flags, and these defaults don't change, the hash will remain the same. i am trying to understand why the logic is even trying to apply this upgrade. technically we shouldn't reach this point.

@pacoxu (Member, Author) commented Sep 9, 2023

if the etcd version, flags, and these defaults don't change, the hash will remain the same. i am trying to understand why the logic is even trying to apply this upgrade. technically we shouldn't reach this point.

I'd prefer to add logic in kubeadm to apply those default values.

The diff is something like

    -    DNSPolicy:                     "",
    +    DNSPolicy:                     "ClusterFirst",

@neolit123 (Member) commented Sep 9, 2023

I prefer to add a logic in kubeadm to add those default values.

we could, if we want to hardcode certain values and be opinionated about certain behavior.
but we tend to not do that. if k8s component Foo wants to change its defaults we just start using them.

The diff is something like:
-    DNSPolicy: "",
+    DNSPolicy: "ClusterFirst",

you mean this PR:
kubernetes/kubernetes#120541

confirms that our logic in kubeadm detects a diff and that's why it proceeds with the etcd.yaml upgrade?

ClusterFirst is not a default value:
https://pkg.go.dev/k8s.io/api/core/v1#DNSPolicy
"Default" is the default value which the kubelet will apply if it sees a field value of "".

so to me it seems ClusterFirst is now the kubelet default for some reason?

we don't apply DNS policy to static pods anywhere in kubeadm:

~/go/src/k8s.io/kubernetes/cmd/kubeadm$ grep DNSPolicy * -rnI
~/go/src/k8s.io/kubernetes/cmd/kubeadm$ grep dnsPolicy * -rnI
app/phases/addons/dns/dns_test.go:747:      dnsPolicy: Default
app/phases/addons/dns/dns_test.go:1013:      dnsPolicy: Default
app/phases/addons/dns/manifests.go:146:      dnsPolicy: Default
~/go/src/k8s.io/kubernetes/cmd/kubeadm$ grep ClusterFirst * -rnI
~/go/src/k8s.io/kubernetes/cmd/kubeadm$ 

EDITED

@neolit123 (Member) commented Sep 9, 2023

ClusterFirst is not a default value:
https://pkg.go.dev/k8s.io/api/core/v1#DNSPolicy
"Default" is the default value which the kubelet will apply if it sees a field value of "".

never mind, looks like it has been ClusterFirst for a long time:

https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/v1/defaults.go#L221-L226

    -    DNSPolicy:                     "",
    +    DNSPolicy:                     "ClusterFirst",

i'm confused where this defaulting comes from in 1.29-pre...

kubeadm does not consume the private types and their defaulting.
it only consumes the public v1.Pod, v1.PodSpec, etc:

https://github.com/kubernetes/api/blob/master/core/v1/types.go#L3369-L3376

without defaulting. i am confused where these default v1.Pod values are coming from.

@pacoxu (Member, Author) commented Sep 9, 2023

To clarify the problem:

  1. kubeadm generates a new etcd manifest with some new default values added, apparently for a client-side reason.
  2. kubeadm compares the hash of the pod (unmarshalled from the file) and gets a diff result.
  3. the kubelet will not change the hash key if the added attributes are only default values.

I prefer to think the kubelet works as expected here.

The second point also works as expected, since kubeadm never applied those default values to the manifest YAML before.

So now it comes down to the first point: why are those default values being added to the etcd manifest?

@neolit123 (Member) commented Sep 9, 2023

understood,

kubeadm generates a new etcd manifest with some new default values added, apparently for a client-side reason.

did you find a client-side reason?

technically to check if two static pod manifests are different we read them both from disk:
https://github.com/kubernetes/kubernetes/blob/21f7bf66fa949dda2b3bec6e3581e248e270e001/cmd/kubeadm/app/phases/upgrade/staticpods.go#L217
https://github.com/kubernetes/kubernetes/blob/21f7bf66fa949dda2b3bec6e3581e248e270e001/cmd/kubeadm/app/util/staticpod/utils.go#L354C1-L372

i.e. we don't fetch the Pod from the API server (as the mirror pod of the static pod).

so if the new static pod manifest has the defaults applied, kubeadm must be doing it somehow... which is strange.
the generation of the etcd static pod manifest should be the same in 1.28 and 1.29:
https://github.com/kubernetes/kubernetes/blob/21f7bf66fa949dda2b3bec6e3581e248e270e001/cmd/kubeadm/app/phases/etcd/local.go#L198-L233
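
The on-disk comparison described above boils down to decoding both manifest files into v1.Pod and comparing the results. A rough sketch of that idea (illustrative only, not the exact kubeadm helper); note that sigs.k8s.io/yaml applies no scheme defaulting, so defaults baked into only one of the two files show up as a difference:

// Rough sketch of comparing two static pod manifests read from disk.
package manifestdiff

import (
	"os"
	"reflect"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// podFromFile reads a static pod manifest and unmarshals it into a v1.Pod
// without any conversion or defaulting.
func podFromFile(path string) (*v1.Pod, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	pod := &v1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		return nil, err
	}
	return pod, nil
}

// manifestsAreEqual returns true when both manifests decode to the same
// object; a default-only difference (e.g. dnsPolicy: ClusterFirst present in
// one file only) makes this return false and triggers the component upgrade.
func manifestsAreEqual(oldPath, newPath string) (bool, error) {
	oldPod, err := podFromFile(oldPath)
	if err != nil {
		return false, err
	}
	newPod, err := podFromFile(newPath)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(oldPod, newPod), nil
}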

@neolit123 (Member) commented Sep 9, 2023

we do use the apimachinery codecs for v1.Pod* though:

read yaml file -> pass it to codec -> decode to a known object (go struct), e.g. v1.Pod

https://github.com/kubernetes/kubernetes/blob/21f7bf66fa949dda2b3bec6e3581e248e270e001/cmd/kubeadm/app/util/staticpod/utils.go#L234

https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/marshal.go#L58-L78

so if the decoder now has decided to apply defaults, we have a problem.

@pacoxu (Member, Author) commented Sep 9, 2023

I posted this to the #api-machinery Slack channel https://kubernetes.slack.com/archives/C0EG7JC6T/p1694264129815039. Let me run some generation tests locally.

@neolit123 (Member) commented Sep 9, 2023

reproduced it minimally, will post it on slack:

package main

import (
	"testing"

	"github.com/pkg/errors"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/runtime/serializer"
	clientsetscheme "k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/yaml"
)

var input = []byte(string(`
apiVersion: v1
kind: Pod
metadata:
  name: foo
`))

func unmarshalFromYamlForCodecs(buffer []byte, gv schema.GroupVersion, codecs serializer.CodecFactory) (runtime.Object, error) {
	const mediaType = runtime.ContentTypeYAML
	info, ok := runtime.SerializerInfoForMediaType(codecs.SupportedMediaTypes(), mediaType)
	if !ok {
		return nil, errors.Errorf("unsupported media type %q", mediaType)
	}

	decoder := codecs.DecoderToVersion(info.Serializer, gv)
	obj, err := runtime.Decode(decoder, buffer)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to decode %s into runtime.Object", buffer)
	}
	return obj, nil
}

func TestFunc(t *testing.T) {
	t.Logf("\ninput:\n%s\n", input)
	obj, err := unmarshalFromYamlForCodecs(input, v1.SchemeGroupVersion, clientsetscheme.Codecs)
	if err != nil {
		t.Fatalf("error: %v\n", err)
	}
	pod := obj.(*v1.Pod)
	// t.Logf("%+v\n", pod)

	output, err := yaml.Marshal(pod)
	if err != nil {
		t.Fatalf("error: %v\n", err)
	}
	t.Logf("\noutput:\n\n%s\n", output)
}

commit fd8f2c7fc65b467b1856dc9a83c06d12bd92e586 (HEAD -> master, origin/master, origin/HEAD)

=== RUN   TestFunc
   ~/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm_test.go:54:
        input:

        apiVersion: v1
        kind: Pod
        metadata:
          name: foo

   ~/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm_test.go:66:
        output:

        apiVersion: v1
        kind: Pod
        metadata:
          creationTimestamp: null
          name: foo
        spec:
          containers: null
          dnsPolicy: ClusterFirst
          enableServiceLinks: true
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
        status: {}

--- PASS: TestFunc (0.01s)
PASS
ok  	k8s.io/kubernetes/cmd/kubeadm	0.076s

commit 3b874af3878ec7769af3c37afba22fc4d232e57e (HEAD -> release-1.27)

Running tool: /usr/local/go/bin/go test -timeout 30s -run ^TestFunc$ k8s.io/kubernetes/cmd/kubeadm -count=1 -v

=== RUN   TestFunc
  ~/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm_test.go:38:
        input:

        apiVersion: v1
        kind: Pod
        metadata:
          name: foo

    ~/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm_test.go:51:
        output:

        apiVersion: v1
        kind: Pod
        metadata:
          creationTimestamp: null
          name: foo
        spec:
          containers: null
        status: {}

--- PASS: TestFunc (0.01s)
PASS
ok  	k8s.io/kubernetes/cmd/kubeadm	0.072s

@neolit123 (Member) commented Sep 9, 2023

so i made a mistake above: it changed after 1.27, not after 1.28.
i'm trying to bisect when it happened.

EDIT: happened between tags v1.28.0-alpha.3 and v1.28.0-alpha.4

but there is something else going on...
the strange part is that we have green runs from when this v1.Pod defaulting change was already in place:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-addons-before-controlplane-1-28-latest
e.g. all runs from 08-28 onward are green, but the output of my test code already has the defaults applied.

i have no idea why our hash function is returning a matching manifest in those cases.
we need to find out why...

@neolit123 (Member) commented Sep 9, 2023

found a commit which, according to git bisect, is the problematic one,
i.e. the first commit that ends up generating defaults like dnsPolicy: ClusterFirst

here are the test files:

output:

$ ./bisect_run.sh 
Previous HEAD position was 2e93c65eff5 Merge pull request #114849 from padlar/handle-terminating-namespaces
HEAD is now at d02d8ba635c Merge pull request #118862 from byako/batching-dra-calls
test: fail
Previous HEAD position was d02d8ba635c Merge pull request #118862 from byako/batching-dra-calls
HEAD is now at 2e93c65eff5 Merge pull request #114849 from padlar/handle-terminating-namespaces
test: ok
Bisecting: 169 revisions left to test after this (roughly 7 steps)
[d53762ec3a5490b5a0e9af2860bbfd0aeed4d025] remove unnecessary comment in pkg/scheduler/framework.QueueingHintFn
running ./test.sh
test: ok
Bisecting: 84 revisions left to test after this (roughly 6 steps)
[00b8a0a95bb942ad5bfc1bc338452cb983746030] Merge pull request #118160 from minherz/master
running ./test.sh
test: fail
Bisecting: 41 revisions left to test after this (roughly 5 steps)
[14b08b82e4f51f34191daf42e00be4f885c8f3d9] Merge pull request #118941 from neolit123/1.28-restore-test-function-for-config-migrate
running ./test.sh
test: ok
Bisecting: 20 revisions left to test after this (roughly 4 steps)
[8c33d3ef7b2f099c7bb81f340f332dbf3a959548] Merge pull request #118967 from tukwila/upgrade_npd_image_v0.8.13
running ./test.sh
test: ok
Bisecting: 11 revisions left to test after this (roughly 3 steps)
[229dd79efde26caba43feb376d0e81e705d38220] Merge pull request #117865 from aleksandra-malinowska/parallel-sts-3
running ./test.sh
test: ok
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[3be105e7eb6cea52eb403675a332781013d07755] Merge pull request #118905 from stormqueen1990/update-translations-pot
running ./test.sh
test: ok
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[dbdd861ea366af50fb74983426587dad7222cb89] chore: address review feedback
running ./test.sh
test: ok
Bisecting: 1 revision left to test after this (roughly 1 step)
[c68a6b07456301f6a8102c01e99ba8a53bc9bf4f] kubeadm: fix fuzzing for ExtraEnvs defined in the control plane component
running ./test.sh
test: ok
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[746b88c6ff442185589640885228ef100811083a] Merge pull request #118867 from chendave/env_custom
running ./test.sh
test: fail
746b88c6ff442185589640885228ef100811083a is the first bad commit
commit 746b88c6ff442185589640885228ef100811083a
Merge: 3be105e7eb6 c68a6b07456
Author: Kubernetes Prow Robot <[email protected]>
Date:   Mon Jul 3 22:02:52 2023 -0700

    Merge pull request #118867 from chendave/env_custom
    
    kubeadm: Support custom env in control plane component

 cmd/kubeadm/app/apis/kubeadm/fuzzer/fuzzer.go      |  4 ++
 cmd/kubeadm/app/apis/kubeadm/types.go              | 10 +++
 cmd/kubeadm/app/apis/kubeadm/v1beta3/conversion.go | 21 ++++++
 .../kubeadm/v1beta3/zz_generated.conversion.go     | 82 +++++++++++-----------
 cmd/kubeadm/app/apis/kubeadm/v1beta4/doc.go        |  7 +-
 cmd/kubeadm/app/apis/kubeadm/v1beta4/types.go      | 10 +++
 .../kubeadm/v1beta4/zz_generated.conversion.go     |  4 ++
 .../apis/kubeadm/v1beta4/zz_generated.deepcopy.go  | 14 ++++
 .../apis/kubeadm/v1beta4/zz_generated.defaults.go  | 35 +++++++++
 .../app/apis/kubeadm/zz_generated.deepcopy.go      | 14 ++++
 cmd/kubeadm/app/cmd/upgrade/diff.go                |  2 +-
 cmd/kubeadm/app/phases/controlplane/manifests.go   | 13 ++--
 .../app/phases/controlplane/manifests_test.go      | 15 +++-
 cmd/kubeadm/app/phases/etcd/local.go               |  1 +
 cmd/kubeadm/app/phases/etcd/local_test.go          |  8 +++
 cmd/kubeadm/app/util/env.go                        | 15 ++++
 cmd/kubeadm/app/util/env_test.go                   | 76 ++++++++++++++++++++
 17 files changed, 278 insertions(+), 53 deletions(-)
 create mode 100644 cmd/kubeadm/app/util/env_test.go
bisect run success
Previous HEAD position was 746b88c6ff4 Merge pull request #118867 from chendave/env_custom
HEAD is now at d02d8ba635c Merge pull request #118862 from byako/batching-dra-calls
reseting...
We are not bisecting.

test.sh

go test -timeout 30s -run ^TestFunc$ k8s.io/kubernetes/cmd/kubeadm -count=1 2>&1 1>/dev/null
if [ $? -eq 0 ]
then
    echo "test: ok"
    exit 0
else
    echo "test: fail"
    exit 1
fi

bisect_run.sh

# found manually the alpha.4 and alpha.3
KNOWN_BAD_COMMIT=v1.28.0-alpha.4
KNOWN_GOOD_COMMIT=v1.28.0-alpha.3

trap cleanup EXIT
cleanup()
{
    echo "reseting..."
    git bisect reset
}

git checkout $KNOWN_BAD_COMMIT
git bisect start
./test.sh
if [ $? -ne 1 ]
then
    echo "error: KNOWN_BAD_COMMIT must be a bad commit!"
    exit 1
fi
git bisect bad

git checkout $KNOWN_GOOD_COMMIT
./test.sh
if [ $? -ne 0 ]
then
    echo echo "error: KNOWN_GOOD_COMMIT= must be a good commit!"
    exit 1
fi
git bisect good

git bisect run ./test.sh
git bisect reset

cmd/kubeadm/kubeadm_test.go

package main

import (
	"testing"

	"github.com/pkg/errors"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/runtime/serializer"
	clientsetscheme "k8s.io/client-go/kubernetes/scheme"
)

var input = []byte(string(`
apiVersion: v1
kind: Pod
metadata:
  name: foo
`))

func unmarshalFromYamlForCodecs(buffer []byte, gv schema.GroupVersion, codecs serializer.CodecFactory) (runtime.Object, error) {
	const mediaType = runtime.ContentTypeYAML
	info, ok := runtime.SerializerInfoForMediaType(codecs.SupportedMediaTypes(), mediaType)
	if !ok {
		return nil, errors.Errorf("unsupported media type %q", mediaType)
	}

	decoder := codecs.DecoderToVersion(info.Serializer, gv)
	obj, err := runtime.Decode(decoder, buffer)
	if err != nil {
		return nil, errors.Wrapf(err, "failed to decode %s into runtime.Object", buffer)
	}
	return obj, nil
}

func TestFunc(t *testing.T) {
	obj, err := unmarshalFromYamlForCodecs(input, v1.SchemeGroupVersion, clientsetscheme.Codecs)
	if err != nil {
		t.Fatalf("error: %v\n", err)
	}

	pod := obj.(*v1.Pod)
	if pod.Spec.DNSPolicy != "" { // simply check if one of the values we care about was defaulted from ""
		t.Fatal("error")
	}
}

@neolit123 (Member)

kubernetes/kubernetes@746b88c
kubernetes/kubernetes#118867

according to git bisect it's @chendave's PR that changed this behavior...
this is very weird, it must be a mistake, right? because i don't see how the defaulting of the etcd static pod can change because of that PR. the flakes at testgrid could indicate some sort of non-determinism, but i don't see that in the PR.

@pacoxu (Member, Author) commented Sep 11, 2023

  2. in our function for reading a Pod manifest we should switch to using codecs.UniversalDeserializer().Decode(...)
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/marshal.go#L57-L78
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/runtime/serializer/codec_factory.go#L269
TODO

I added the change for point 2 (using the universal deserializer to decode) in kubernetes/kubernetes#120549.
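
For context, the point-2 change amounts to decoding with a non-converting, non-defaulting deserializer. A minimal sketch of that approach (illustrative only, not the exact code in the PR):

// Sketch: decode a static pod manifest with the universal deserializer, which
// performs no conversion and no defaulting during decode.
package decodesketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	clientsetscheme "k8s.io/client-go/kubernetes/scheme"
)

func decodePodManifest(data []byte) (*v1.Pod, error) {
	obj, _, err := clientsetscheme.Codecs.UniversalDeserializer().Decode(data, nil, nil)
	if err != nil {
		return nil, err
	}
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return nil, fmt.Errorf("expected *v1.Pod, got %T", obj)
	}
	// pod.Spec.DNSPolicy stays "" here unless it is set in the manifest,
	// because no defaulting function runs on this decode path.
	return pod, nil
}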

@neolit123 (Member) commented Sep 12, 2023

so we are preparing some patches for 1. and 2. (noted above) and we are going to apply them to master and then backport 1. to 1.28.

but our upgrade CI is only failing from 1.28 -> master upgrades. the master upgrade is using the 1.29-pre kubeadm binary.
this makes the whole problem very confusing.

logically it should have also failed in the 1.27 -> 1.28 upgrades, because 1.28 is when we added the problem code (extra envs).
there could be other factors at play here and we might not understand why this is happening exactly...

@neolit123 (Member) commented Sep 12, 2023

so we are preparing some patches for 1. and 2. (noted above) and we are going to apply them to master and then backport 1. to 1.28.

but our upgrade CI is only failing from 1.28 -> master upgrades. the master upgrade is using the 1.29-pre kubeadm binary. this makes the whole problem very confusing.

logically it should have also failed in the 1.27 -> 1.28 upgrades, because 1.28 is when we added the problem code (extra envs). there could be other factors at play here and we might not understand why this is happening exactly...

1 and 2 merged.
we also updated the diff tool.
https://github.com/kubernetes/kubernetes/commits/master

the e2e tests are still failing:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1701661172480086016/build-log.txt

this is before the diff tool update, but we are still getting the strange defaulter diff.
our deserializer is now not a converting one and we removed the dep on the internal packages...

[upgrade/staticpods] Renewing etcd-server certificate
I0912 18:26:16.593921    2794 staticpods.go:225] Pod manifest files diff:
  &v1.Pod{
  	TypeMeta:   {Kind: "Pod", APIVersion: "v1"},
  	ObjectMeta: {Name: "etcd", Namespace: "kube-system", Labels: {"component": "etcd", "tier": "control-plane"}, Annotations: {"kubeadm.kubernetes.io/etcd.advertise-client-urls": "https://172.17.0.3:2379"}, ...},
  	Spec: v1.PodSpec{
  		Volumes:        {{Name: "etcd-certs", VolumeSource: {HostPath: &{Path: "/etc/kubernetes/pki/etcd", Type: &"DirectoryOrCreate"}}}, {Name: "etcd-data", VolumeSource: {HostPath: &{Path: "/var/lib/etcd", Type: &"DirectoryOrCreate"}}}},
  		InitContainers: nil,
  		Containers: []v1.Container{
  			{
  				... // 11 identical fields
  				VolumeMounts:  {{Name: "etcd-data", MountPath: "/var/lib/etcd"}, {Name: "etcd-certs", MountPath: "/etc/kubernetes/pki/etcd"}},
  				VolumeDevices: nil,
  				LivenessProbe: &v1.Probe{
  					... // 2 identical fields
  					TimeoutSeconds:                15,
  					PeriodSeconds:                 10,
- 					SuccessThreshold:              0,
+ 					SuccessThreshold:              1,
  					FailureThreshold:              8,
  					TerminationGracePeriodSeconds: nil,
  				},
  				ReadinessProbe: nil,
  				StartupProbe: &v1.Probe{
  					... // 2 identical fields
  					TimeoutSeconds:                15,
  					PeriodSeconds:                 10,
- 					SuccessThreshold:              0,
+ 					SuccessThreshold:              1,
  					FailureThreshold:              24,
  					TerminationGracePeriodSeconds: nil,
  				},
  				Lifecycle:                nil,
- 				TerminationMessagePath:   "",
+ 				TerminationMessagePath:   "/dev/termination-log",
- 				TerminationMessagePolicy: "",
+ 				TerminationMessagePolicy: "File",
  				ImagePullPolicy:          "IfNotPresent",
  				SecurityContext:          nil,
  				... // 3 identical fields
  			},
  		},
  		EphemeralContainers:           nil,
- 		RestartPolicy:                 "",
+ 		RestartPolicy:                 "Always",
- 		TerminationGracePeriodSeconds: nil,
+ 		TerminationGracePeriodSeconds: &30,
  		ActiveDeadlineSeconds:         nil,
- 		DNSPolicy:                     "",
+ 		DNSPolicy:                     "ClusterFirst",
  		NodeSelector:                  nil,
  		ServiceAccountName:            "",
  		... // 10 identical fields
  		Subdomain:     "",
  		Affinity:      nil,
- 		SchedulerName: "",
+ 		SchedulerName: "default-scheduler",
  		Tolerations:   nil,
  		HostAliases:   nil,
  		... // 3 identical fields
  		ReadinessGates:     nil,
  		RuntimeClassName:   nil,
- 		EnableServiceLinks: nil,
+ 		EnableServiceLinks: &true,
  		PreemptionPolicy:   nil,
  		Overhead:           nil,
  		... // 6 identical fields
  	},
  	Status: {},
  }

I0912 18:26:16.594089    2794 certs.go:519] validating certificate period for etcd CA certificate
I0912 18:26:16.594901    2794 certs.go:519] validating certificate period for etcd/ca certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-09-12-18-26-15/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
I0912 18:26:18.742230    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 5 milliseconds
I0912 18:26:19.251245    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 6 milliseconds
I0912 18:26:19.750397    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 4 milliseconds
I0912 18:26:20.249612    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 4 milliseconds
....
...
I0912 18:31:18.249358    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 3 milliseconds
I0912 18:31:18.751030    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 6 milliseconds
I0912 18:31:18.755776    2794 round_trippers.go:553] GET https://172.17.0.7:6443/api/v1/namespaces/kube-system/pods/etcd-kinder-upgrade-control-plane-1?timeout=10s 200 OK in 3 milliseconds
I0912 18:31:18.756932    2794 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.4:2379 https://172.17.0.3:2379 https://172.17.0.2:2379]) are available 1/10
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
	cmd/kubeadm/app/phases/upgrade/staticpods.go:527
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
	cmd/kubeadm/app/phases/upgrade/staticpods.go:256
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:340
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
	cmd/kubeadm/app/phases/upgrade/staticpods.go:467
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:619
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
	cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:369
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.StaticPodControlPlane
	cmd/kubeadm/app/phases/upgrade/staticpods.go:467
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.PerformStaticPodUpgrade
	cmd/kubeadm/app/phases/upgrade/staticpods.go:619
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.PerformControlPlaneUpgrade
	cmd/kubeadm/app/cmd/upgrade/apply.go:216
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:156
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
[upgrade/apply] FATAL
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
	cmd/kubeadm/app/cmd/upgrade/apply.go:157
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.newCmdApply.func1
	cmd/kubeadm/app/cmd/upgrade/apply.go:74
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:267
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
Error: failed to exec action kubeadm-upgrade: exit status 1

@pacoxu (Member, Author) commented Sep 12, 2023

so weird😓

@SataQiu (Member) commented Sep 13, 2023

We can verify and cherry-pick patches to 1.28.

@SataQiu (Member) commented Sep 13, 2023

The latest diff result (https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1701751770478284800/build-log.txt) shows that the default values are no longer injected, which is what we expect.

I0913 00:25:24.641339    3894 staticpods.go:225] Pod manifest files diff:
@@ -46 +45,0 @@
-      successThreshold: 1
@@ -62 +60,0 @@
-      successThreshold: 1
@@ -64,2 +61,0 @@
-    terminationMessagePath: /dev/termination-log
-    terminationMessagePolicy: File
@@ -71,2 +66,0 @@
-  dnsPolicy: ClusterFirst
-  enableServiceLinks: true
@@ -76,2 +69,0 @@
-  restartPolicy: Always
-  schedulerName: default-scheduler
@@ -81 +72,0 @@
-  terminationGracePeriodSeconds: 30

Now we should fix the v1.28 version, and then the CI will be green.
Waiting for kubernetes/kubernetes#120605 to be merged.

I think after we remove the reference to k8s.io/kubernetes/pkg/apis/core/v1 from the v1.28 branch, everything will be back to normal,
because the Pod defaulter is registered into the Scheme by:
https://github.com/kubernetes/kubernetes/blob/160fe010f32fd1896917fecad680769ad0e40ca0/pkg/apis/core/v1/register.go#L29-L34

func init() {
        // We only register manually written functions here. The registration of the
        // generated functions takes place in the generated files. The separation
        // makes the code compile even when the generated files are missing.
        localSchemeBuilder.Register(addDefaultingFuncs, addConversionFuncs)
}

That's why the default values are injected...
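
A quick way to check whether those Pod defaulters actually ended up registered in a given build is to ask the client-go scheme to default an empty Pod. This is just a diagnostic sketch (not from any of the PRs), assuming the test is built inside the kubernetes repo like the TestFunc above:

// Diagnostic sketch: if the defaulting functions from
// k8s.io/kubernetes/pkg/apis/core/v1 were registered into the scheme used by
// client-go (directly or via init-order side effects), Default() fills in
// values such as DNSPolicy; otherwise the fields stay empty.
package schemecheck

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	clientsetscheme "k8s.io/client-go/kubernetes/scheme"
)

func printPodDefaults() {
	pod := &v1.Pod{}
	clientsetscheme.Scheme.Default(pod)
	// Expect "" when no Pod defaulter is registered, and "ClusterFirst" when
	// the internal defaulters leaked into the scheme.
	fmt.Printf("DNSPolicy after Default(): %q\n", pod.Spec.DNSPolicy)
}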

@neolit123 (Member) commented Sep 13, 2023

kubernetes/kubernetes#120605
is open. i will ping some folks on slack to try to merge it faster.

EDIT: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1694575171227399

@neolit123 (Member) commented Sep 13, 2023

but if currently, the 1.28 kubeadm binary is producing a different manifest during upgrade due to the internal defaulters, that would mean the 1.27->1.28 upgrade must be failing as well? instead, it's currently green.

edit: etcd version is the same:
https://github.com/kubernetes/kubernetes/blob/d8e9fb8b7f244536325100e332faefbae01cfd7b/cmd/kubeadm/app/constants/constants.go#L462

edit2: but an etcd upgrade is performed and is successful:

[upgrade/staticpods] Component "etcd" upgraded successfully!

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-27-1-28/1701746737351233536/build-log.txt

@chendave (Member)

some testing from me confirms this is really Go version related.
with golang 1.20 the defaults are always generated, so regardless of whether k8s.io/kubernetes/pkg/apis/core/v1 is imported or not, the diff is always empty because both manifests end up with the defaults.
with golang 1.21 the defaults are not generated by default, and I suspect kind updated its golang version recently, so this issue is hit.

shows that the default values are no longer injected, which is what we expect.

some testing in kinder shows that the new manifest will not have the defaults generated, but the old manifest for each of the pods does have the defaults.

@neolit123 (Member) commented Sep 13, 2023

but if currently, the 1.28 kubeadm binary is producing a different manifest during upgrade due to the internal defaulters, that would mean the 1.27->1.28 upgrade must be failing as well? instead, it's currently green.

just tested this workflow with kinder and there is a diff:

--- etcd.27.yaml	2023-09-13 18:03:01.839883253 +0300
+++ etcd.28.yaml	2023-09-13 17:55:56.203458105 +0300
@@ -32,7 +32,7 @@
     - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
     - --snapshot-count=10000
     - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
-    image: registry.k8s.io/etcd:3.5.7-0
+    image: registry.k8s.io/etcd:3.5.9-0
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8

at the time of testing kubeadm 1.27 still has the 3.5.7 etcd:
https://github.com/kubernetes/kubernetes/blob/release-1.27/cmd/kubeadm/app/constants/constants.go#L486

there is a pending backport for 3.5.9:
kubernetes/kubernetes#118079

but both the .27 and .28 etcd manifests do not have the defaults!

~/go/src/k8s.io/kubeadm/kinder$ cat etcd.28.yaml 
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.2:2380
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.2:2380
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

unclear to me if kubernetes/kubernetes#120605 will fix anything.
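
For comparisons like the manual diff above, a hedged sketch (a hypothetical helper, not existing kinder tooling) that decodes two of the saved manifests and compares the Pod specs semantically, so formatting-only differences don't count:

package main

import (
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/yaml"
)

func loadPod(path string) *corev1.Pod {
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	pod := &corev1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		panic(err)
	}
	return pod
}

func main() {
	oldPod := loadPod("etcd.27.yaml")
	newPod := loadPod("etcd.28.yaml")
	// Reports false here because of the etcd image bump; defaulted-only
	// differences in one of the files would also show up as inequality.
	fmt.Println("semantically equal:", apiequality.Semantic.DeepEqual(oldPod.Spec, newPod.Spec))
}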

@neolit123
Copy link
Member

neolit123 commented Sep 13, 2023

some testing from me confirms this is really go version related. defaults will always be generated with golang 1.20, so whether k8s.io/kubernetes/pkg/apis/core/v1 is imported or not, the diff will always be empty as both manifests are generated with defaults. defaults will not be generated with golang 1.21 by default, and I suspect kind updated its golang recently, which is why this issue is hit.

This shows that the default values are no longer injected, which is what we expect.

some testing in kinder shows the new manifest will not have the defaults generated, but the old manifest for each of the pods has the defaults in it.

i saw a go version difference as well at some point. go 1.21 was OK, go 1.20 generated defaults for the TestFunc in this ticket.

but kubeadm at the 1.28 branch is still built with 1.20:

$ docker exec kinder-regular-control-plane-1 kinder/upgrade/v1.28.1-59+d8e9fb8b7f2445/kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.1-59+d8e9fb8b7f2445", GitCommit:"d8e9fb8b7f244536325100e332faefbae01cfd7b", GitTreeState:"clean", BuildDate:"2023-09-08T19:46:29Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

and it's not generating defaults as i showed above.
so somehow i think the bugs are still at k/k master for:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest
(i.e. latest)

@pacoxu
Copy link
Member Author

pacoxu commented Sep 13, 2023

kubernetes/release#3076 is in progress.

@pacoxu pacoxu removed their assignment Sep 14, 2023
@SataQiu
Copy link
Member

SataQiu commented Sep 14, 2023

The default values will only be injected when kubeadm init is performed.
This is why the v1.27->1.28 upgrade works but v1.28 -> latest upgrade fails.

@neolit123
Copy link
Member

The default values are injected only when kubeadm init is performed. This is why the v1.27->1.28 upgrade works but v1.28 -> latest upgrade fails.

hmm, how so?
aren't both init and upgrade using the same "create manifest" logic?

@pacoxu
Copy link
Member Author

pacoxu commented Sep 14, 2023

The default values will be injected when v1.28 kubeadm init is performed.

in Kinder? IIRC, the defaults will not be injected when I init my cluster on CentOS directly.

@SataQiu
Copy link
Member

SataQiu commented Sep 14, 2023

in Kinder? IIRC, the defaults will not be injected when I init my cluster on CentOS directly.

You can try the old v1.28 version:

wget https://storage.googleapis.com/k8s-release-dev/ci/v1.28.2-1+a68748c7cd04f2/bin/linux/amd64/kubeadm
chmod +x kubeadm
kubeadm init ...

BTW, kubernetes/kubernetes#120605 is merged.

@pacoxu
Copy link
Member Author

pacoxu commented Sep 14, 2023

@pacoxu
Copy link
Member Author

pacoxu commented Sep 14, 2023

@k8s-ci-robot
Copy link
Contributor

@pacoxu: Closing this issue.

In response to this:

/close

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1702296617098416128
failed due to a recent flake

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Copy link
Member

neolit123 commented Sep 14, 2023

i don't see defaults after @chendave's cherry-pick merged:
kubernetes/kubernetes@728862d

that's 728862ded5e9d8, as seen in the kinder output below:

$ kinder build node-image-variant --base-image=kindest/base:v20221102-76f15095 --image=kindest/node:test --with-init-artifacts=v1.28.2-7+728862ded5e9d8 --loglevel=debug && kinder create cluster --name=kinder-regular --image=kindest/node:test --control-plane-nodes=1 --loglevel=debug && kinder do kubeadm-init --name=kinder-regular --loglevel=debug --kubeadm-verbosity=6

$ docker exec kinder-regular-control-plane-1 cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.2:2380
    - --initial-cluster=kinder-regular-control-plane-1=https://172.17.0.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.2:2380
    - --name=kinder-regular-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

$ docker exec kinder-regular-control-plane-1 kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.2-7+728862ded5e9d8", GitCommit:"728862ded5e9d8fc3db1555499a66c5569ad8db6", GitTreeState:"clean", BuildDate:"2023-09-14T09:48:17Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

You can try the old v1.28 version:

but i did not see defaults with the old 1.28 binary either:

$ ./kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.2-1+a68748c7cd04f2", GitCommit:"a68748c7cd04f2462352afb05ba31f06fc799595", GitTreeState:"clean", BuildDate:"2023-09-13T09:54:55Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

$ sudo ./kubeadm init phase etcd local
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

$ sudo cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.0.2.15:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.0.2.15:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://10.0.2.15:2380
    - --initial-cluster=lubo-it=https://10.0.2.15:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.2.15:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.0.2.15:2380
    - --name=lubo-it
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

@SataQiu
Copy link
Member

SataQiu commented Sep 14, 2023

but i did not see defaults with the old 1.28 binary too:

@neolit123 @pacoxu You can try to reproduce it by:

docker pull kindest/base:v20221102-76f15095
           
kinder build node-image-variant --base-image=kindest/base:v20221102-76f15095 --image=kindest/node:test --with-init-artifacts=v1.28.2-1+a68748c7cd04f2 --with-upgrade-artifacts=v1.29.0-alpha.0.802+a68093a3ffb552 --loglevel=debug

kinder create cluster --name=kinder-upgrade --image=kindest/node:test --control-plane-nodes=1 --worker-nodes=1 --loglevel=debug
kinder do kubeadm-init --name=kinder-upgrade --copy-certs=auto --loglevel=debug --kubeadm-verbosity=6
echo "---------------------------------------------"
echo "----Old v1.28 kubeadm generated etcd.yaml----"
echo "---------------------------------------------"
docker exec kinder-upgrade-control-plane-1  cat /etc/kubernetes/manifests/etcd.yaml
kinder delete cluster --name=kinder-upgrade

kinder build node-image-variant --base-image=kindest/base:v20221102-76f15095 --image=kindest/node:test --with-init-artifacts=v1.28.2-7+728862ded5e9d8 --with-upgrade-artifacts=v1.29.0-alpha.0.806+4abf29c5c86349 --loglevel=debug

kinder create cluster --name=kinder-upgrade --image=kindest/node:test --control-plane-nodes=1 --worker-nodes=1 --loglevel=debug
kinder do kubeadm-init --name=kinder-upgrade --copy-certs=auto --loglevel=debug --kubeadm-verbosity=6
echo "---------------------------------------------"
echo "----New v1.28 kubeadm generated etcd.yaml----"
echo "---------------------------------------------"
docker exec kinder-upgrade-control-plane-1  cat /etc/kubernetes/manifests/etcd.yaml
kinder delete cluster --name=kinder-upgrade           

The output is as follows:

...
---------------------------------------------
----Old v1.28 kubeadm generated etcd.yaml----
---------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.3:2379/
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.3:2379/
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.3:2380/
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.3:2380/
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379/,https://172.17.0.3:2379/
    - --listen-metrics-urls=http://127.0.0.1:2381/
    - --listen-peer-urls=https://172.17.0.3:2380/
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

...

---------------------------------------------
----New v1.28 kubeadm generated etcd.yaml----
---------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.4:2379/
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.4:2379/
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.4:2380/
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.4:2380/
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379/,https://172.17.0.4:2379/
    - --listen-metrics-urls=http://127.0.0.1:2381/
    - --listen-peer-urls=https://172.17.0.4:2380/
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

@SataQiu
Copy link
Member

SataQiu commented Sep 14, 2023

Therefore, if a user initialized the cluster with an old v1.28 kubeadm (< 1.28.2), they may encounter problems when upgrading to v1.29.
However, if we can bump the etcd version number in v1.29, this will not be a problem.
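
A rough way to check whether an existing control plane was initialized with one of the affected binaries (an illustrative check, not an official tool) is to look for serialized defaults such as terminationMessagePath in the on-disk static pod manifests:

package main

import (
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	data, err := os.ReadFile("/etc/kubernetes/manifests/etcd.yaml")
	if err != nil {
		panic(err)
	}
	pod := &corev1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		panic(err)
	}
	for _, c := range pod.Spec.Containers {
		// A non-empty terminationMessagePath in the serialized manifest means the
		// defaults were written out, as in the "old v1.28" output above.
		fmt.Printf("container %q: terminationMessagePath=%q\n", c.Name, c.TerminationMessagePath)
	}
}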

@neolit123
Copy link
Member

neolit123 commented Sep 14, 2023

strange, in my test here:
#2927 (comment)

i ran against the older v1.28.2-1+a68748c7cd04f2 and did not get defaults.

--with-init-artifacts=v1.28.2-1+a68748c7cd04f2

you are running the same "old" version but getting a different etcd.yaml.

@SataQiu
Copy link
Member

SataQiu commented Sep 15, 2023

@neolit123 Emm... I don't really understand.

./kubeadm init phase etcd local will NOT get defaults.

But ./kubeadm init --ignore-preflight-errors=Swap,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables --config=/kind/kubeadm.conf --v=6 --upload-certs will get defaults.

kubeadm version:

# ./kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.2-1+a68748c7cd04f2", GitCommit:"a68748c7cd04f2462352afb05ba31f06fc799595", GitTreeState:"clean", BuildDate:"2023-09-13T09:54:55Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
root@kinder-upgrade-control-plane-1:/# 

Perhaps related to the execution path?

@neolit123
Copy link
Member

no idea. init is technically calling the same phase.
