Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: support rotation of etcd certs even if etcd is not upgraded #2989

Closed
KlavsKlavsen opened this issue Jan 5, 2024 · 10 comments · Fixed by kubernetes/kubernetes#124688
Labels
area/upgrades help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone

Comments

@KlavsKlavsen
Copy link

What happened?

our kube-apiserver just started failing to start.. The log from containerd only says:
2023-12-16T07:35:01.426019865+01:00 stderr F }. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-12-16T06:35:01Z is after 2023-12-03T03:37:19Z"

we checked apiserver certs etc. -and they were all fine.
We finally checked etcd endpoint - and that had an old cert (which for some reason was not updated with kubeadm update

kubeadm certs check-expiration had the details - but IMHO kubeadm k8s update SHOULD look at certs expiration and renew them all - instead of only renewing etcd certs IF there's an update for it.

What did you expect to happen?

a logmessage that told me which endpoint had the bad cert and details about cert.

We had updated using kubeadm and it renewed certs - but appearently it does not renew etcd certs - if there is no update of that component - which surprised us - as we thought it renewed all certs for cluster on upgrades :(

How can we reproduce it (as minimally and precisely as possible)?

update via kubeadm so etcd certs are not being renewed and "adjust OS time" - so etcd certs are expired but rest are not.

Anything else we need to know?

No response

Kubernetes version

Server Version: v1.26.4

we'll update soon (above might have been fixed in newer version?)

Cloud provider

Hetzner physical server

OS version

Ubuntu 22.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@KlavsKlavsen KlavsKlavsen added the kind/bug Categorizes issue or PR as related to a bug. label Jan 5, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@KlavsKlavsen
Copy link
Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 5, 2024
@neolit123
Copy link
Member

/transfer kubeadm

@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/kubernetes Jan 5, 2024
@neolit123 neolit123 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024
@neolit123
Copy link
Member

neolit123 commented Jan 5, 2024

kubeadm certs check-expiration had the details - but IMHO kubeadm k8s update SHOULD look at certs expiration and renew them all - instead of only renewing etcd certs IF there's an update for it.

yes, kubeadm upgrade does not rotate etcd certs if etcd is not upgraded.
what versions of k8s are you upgrading from -> to? do they use the same etcd version?

you could manually force renewal with kubeadm certs renew ...

EDIT: also if you upgrade more often, etcd certs will be rotated. k8s releases every 4 months and etcd versions are definitely upgraded them. etcd PATCH versions are also sometimes updated on k8s PATCH version.

@neolit123 neolit123 added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jan 5, 2024
@neolit123 neolit123 added this to the v1.30 milestone Jan 5, 2024
@neolit123
Copy link
Member

i think this is the ~3rd time i see confusing around the kubeadm upgrade behavior of etcd and certs rotation, so i'm tagging this as a feature request.

@neolit123
Copy link
Member

neolit123 commented Jan 5, 2024

our kube-apiserver just started failing to start.. The log from containerd only says:
2023-12-16T07:35:01.426019865+01:00 stderr F }. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-12-16T06:35:01Z is after 2023-12-03T03:37:19Z"

hm, i probably should not have transferred this to the kubeadm repository as the log problem is indeed an api-server / api-machinery one. you could log the k/k ticket again, but copying your description and we can repurpose this ticket here to investigate if we can rotate etcd certs on upgrade if other certs are rotated, or ... just update docs why this happens.

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

@neolit123 neolit123 changed the title apiserver connection error does not give useful details kubeadm: support rotation of etcd certs even if etcd is not upgraded Jan 5, 2024
@neolit123
Copy link
Member

kubeadm: support rotation of etcd certs even if etcd is not upgraded

cc @SataQiu @pacoxu
WDYT about A) rotating etcd certs every time k8s version upgrades vs B) just documenting that cert rotation for etcd is skipped if there is no etcd upgrade.

@pacoxu
Copy link
Member

pacoxu commented Jan 8, 2024

WDYT about A) rotating etcd certs every time k8s version upgrades vs B) just documenting that cert rotation for etcd is skipped if there is no etcd upgrade.

  1. With the default etcd certificate expiration date, if their cluster is upgraded more frequently than once a year, it will never expire.
  2. We may not take manually change etcd version into account(but it is a corner case).

A) rotating etcd certs every time k8s version upgrades
+1 as this is a low cost thing.

  1. In https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#automatic-certificate-renewal,

kubeadm renews all the certificates during control plane upgrade.

+1 for A) as we mentioned this in the doc 😄

Another choice may be adding a preflight check(error or warning) for cert expire when you run upgrade if the cert will be expire in 4 months.(a release cycle duration). It can make users do the renew manually if needed.

BTW, do we have a suggestion about how many months before expire users should renew certs?

@neolit123
Copy link
Member

neolit123 commented Jan 8, 2024

With the default etcd certificate expiration date, if their cluster is upgraded more frequently than once a year, it will never expire

exactly, this tells us that @KlavsKlavsen did no upgrade their cluster on time - i.e. at least once a year.
of course, there is the wider k8s problem here, that MINOR version upgrades can scare people if there are deprecations and removals of core APIs.

+1 for A) as we mentioned this in the doc 😄

i was under the impression that we have docs that etcd cert rotation is skipped if etcd upgrade is skipped.
but i can't find it, so apparently not.

Another choice may be adding a preflight check(error or warning) for cert expire when you run upgrade if the cert will be expire in 4 months.(a release cycle duration). It can make users do the renew manually if needed.

it could be added as something that "upgrade plan" does.
but i also vote for A), because that's the simpler UX. manual cert rotation is a bit advanced for the regular user and needs orchestration of component restarts.

BTW, do we have a suggestion about how many months before expire users should renew certs?

the kubelet also has one year client certs (by default, with 1024 byte keys, RSA), that it rotates if ~80% of the time in one year has passed - i.e. ~ 0.8 * 365 days

@neolit123
Copy link
Member

we seem to have some agreement that this is a desired change in kubeadm.
i will add the help-wanted label but the change might not be so trivial.

@neolit123 neolit123 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Jan 17, 2024
@neolit123 neolit123 modified the milestones: v1.30, v1.31 Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/upgrades help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants