Error upgrading to 1.29.x with external CA #3055

Closed
NorthFuture opened this issue May 2, 2024 · 9 comments
Labels
area/pki PKI and certificate related issues kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@NorthFuture

What happened?

Our clusters, currently at 1.28.9, are configured with an external CA (no ca.key on the filesystem), and all certificates are generated by an external system.

During the upgrade from 1.28.9 to 1.29.4 with the following command:

kubeadm --kubeconfig /root/.kube/config --certificate-renewal=false upgrade apply v1.29.4

we get the following error:

the CA files do not exist, please run kubeadm init phase certs ca to generate it: failed to load key: couldn't load the private key file /etc/kubernetes/pki/ca.key: open /etc/kubernetes/pki/ca.key: no such file or directory
[upgrade/postupgrade] FATAL post-upgrade error

The /root/.kube/config file is an external kubeconfig with short-lived super-admin certificates.

After a bit of digging, I found this

https://github.com/kubernetes/kubernetes/blob/d138c022d7fb3436add1c97b07004cf10319fb42/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L75

It seems it's not possible to upgrade to 1.29 with an external CA.

What did you expect to happen?

The upgrade to 1.29 should succeed on a cluster with an external CA.

How can we reproduce it (as minimally and precisely as possible)?

Try to upgrade a cluster that has no ca.key in the /etc/kubernetes/pki folder.
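
To confirm that a node is in this state before attempting the upgrade, a minimal sketch follows. It assumes the default PKI directory from the error message above; the function name `is_external_ca` is hypothetical, and the check is much simpler than whatever kubeadm itself does to detect an external CA.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm): succeeds when the PKI directory
# contains ca.crt but no ca.key, i.e. the layout described in this report.
is_external_ca() {
    pki_dir="${1:-/etc/kubernetes/pki}"
    [ -f "$pki_dir/ca.crt" ] && [ ! -e "$pki_dir/ca.key" ]
}

# Report the layout of the default directory (or of $PKI_DIR if set).
if is_external_ca "${PKI_DIR:-/etc/kubernetes/pki}"; then
    echo "external CA layout: ca.crt present, ca.key missing"
else
    echo "not an external CA layout"
fi
```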

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.28.9
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4

Cloud provider

on premise, vanilla version

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ uname -a
Linux xxx 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@NorthFuture NorthFuture added the kind/bug Categorizes issue or PR as related to a bug. label May 2, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 2, 2024
@k8s-ci-robot
Contributor

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 2, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@neolit123
Member

/transfer kubeadm

@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/kubernetes May 2, 2024
@neolit123
Member

neolit123 commented May 2, 2024

looks like this is something we did not cover with e2e tests
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-external-ca-1-29
(TODO: we need to include upgrades)

workaround: is it an option for you to temporarily copy the "ca.key" to the node where "kubeadm upgrade apply" is called?
after the upgrade, "ca.key" can be deleted.
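
The temporary-copy workaround above can be sketched roughly as follows, for setups where the key can be exported at all. The wrapper name `with_temp_ca_key` and the source path `/secure/ca.key` are hypothetical; the upgrade command in the usage comment is the one from the report.

```shell
#!/bin/sh
# Hypothetical wrapper: with_temp_ca_key SRC_KEY PKI_DIR COMMAND [ARGS...]
# Copies the CA key into PKI_DIR, runs COMMAND, then removes the copy again,
# whether or not COMMAND succeeded.
with_temp_ca_key() {
    src_key="$1"
    pki_dir="$2"
    shift 2
    # Refuse to touch a key that is already there, so it is never deleted.
    if [ -e "$pki_dir/ca.key" ]; then
        echo "ca.key already exists in $pki_dir" >&2
        return 1
    fi
    install -m 600 "$src_key" "$pki_dir/ca.key" || return 1
    status=0
    "$@" || status=$?
    # Plain rm, not a secure erase.
    rm -f "$pki_dir/ca.key"
    return "$status"
}

# Usage (/secure/ca.key is a placeholder):
# with_temp_ca_key /secure/ca.key /etc/kubernetes/pki \
#     kubeadm --kubeconfig /root/.kube/config --certificate-renewal=false upgrade apply v1.29.4
```

The wrapper returns the exit status of the wrapped command, so a failed upgrade is still reported after the key has been removed.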

@neolit123 neolit123 added this to the v1.29 milestone May 2, 2024
@neolit123
Copy link
Member

neolit123 commented May 2, 2024

https://github.com/kubernetes/kubernetes/blob/d138c022d7fb3436add1c97b07004cf10319fb42/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L75

this function call migrates the admin.conf on the node so that it no longer uses the super-user group "system:masters", and generates a new super-admin.conf file that does.

we could skip this process for external CA users; later, when they manually renew "admin.conf", they can pick the user they want.

only 1.29 is affected, as 1.30 removed this function. it is a one-release migration patch.
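
To see which group a given kubeconfig's client certificate carries, the subject of the embedded certificate can be inspected; Kubernetes maps the certificate's Organization (O) field to groups. This is a minimal sketch, assuming the file uses inline `client-certificate-data` and that `base64` and `openssl` are installed. The function name is hypothetical and the file paths in the usage comment are assumed defaults.

```shell
#!/bin/sh
# Hypothetical helper: print the subject of the first client certificate
# embedded in a kubeconfig file passed as $1.
kubeconfig_cert_subject() {
    sed -n 's/^[[:space:]]*client-certificate-data:[[:space:]]*//p' "$1" \
        | head -n 1 \
        | base64 -d \
        | openssl x509 -noout -subject
}

# Usage (paths are assumed defaults):
# kubeconfig_cert_subject /etc/kubernetes/admin.conf
# kubeconfig_cert_subject /etc/kubernetes/super-admin.conf
```

An O value of system:masters in the printed subject means the file still carries the super-user group.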

@neolit123 neolit123 self-assigned this May 2, 2024
@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. area/pki PKI and certificate related issues and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2024
@neolit123
Member

neolit123 commented May 2, 2024

the fix for the next 1.29 patch release (1.29.5) is here:
kubernetes/kubernetes#124682
i think the next release is in mid-May.

@NorthFuture
Author

Thank you for the prompt response. We'll wait for the next release, since the intermediate CA key is sealed in our vault and can't be extracted. Otherwise we would have to issue a new temporary intermediate CA with an external private key for each cluster, which isn't straightforward since our root CA is air-gapped 😀 Again, thank you for the fix.

@neolit123
Member

e2e addition for the upgrade scenario
#305

@neolit123
Member

fixed in 1.29.5
