
kubeadm: keep dns resourceRequirements on upgrade #3033

Closed
maxl99 opened this issue Feb 29, 2024 · 14 comments
Labels: area/coredns, kind/feature, priority/backlog
Milestone: v1.31

Comments

maxl99 commented Feb 29, 2024

What happened?

After an upgrade with kubeadm, the resourceRequirements of the CoreDNS deployment are reset.
Currently the memory limit is hardcoded to 170Mi in the manifest template.

In our big prod cluster (1,000 nodes, 5k services, 20k pods) CoreDNS needs more than 170Mi,
so after an outage (all CoreDNS pods got OOM-killed) we increased the memory limit in the deployment.
After kubeadm upgrade, the deployment gets reset to 170Mi.
-> As a result, all CoreDNS pods restart (during startup the probes are successful) and get OOM-killed again after loading/caching all DNS entries.

What did you expect to happen?

Keep the customized resourceRequirements after kubeadm upgrade.

How can we reproduce it (as minimally and precisely as possible)?

  1. Change the coredns deployment (increase the resourceRequirements)
  2. Run kubeadm upgrade (a sketch of both steps follows below)
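
Assuming a kubeadm-managed cluster where CoreDNS is deployed by kubeadm, the steps above might look like this; the upgrade target version is only a placeholder:

$ # raise the CoreDNS memory limit above the kubeadm default
$ kubectl -n kube-system set resources deployment coredns --limits=memory=512Mi
$ # upgrade the control plane
$ kubeadm upgrade apply v1.29.2
$ # the limit is back at the hardcoded template default (170Mi)
$ kubectl -n kube-system get deployment coredns \
    -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'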

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

maxl99 added the kind/bug label Feb 29, 2024
k8s-ci-robot added the needs-sig label Feb 29, 2024
@k8s-ci-robot (Contributor)

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label Feb 29, 2024
@neolit123 (Member)

/transfer kubeadm

k8s-ci-robot transferred this issue from kubernetes/kubernetes Feb 29, 2024
neolit123 added the priority/awaiting-more-evidence and area/coredns labels and removed the needs-sig and needs-triage labels Feb 29, 2024
@neolit123 (Member)

@maxl99 the unwritten kubeadm policy about coredns is roughly this: if you'd like customizations, you can deploy your own coredns and manage it yourself. That said, we did add a feature to preserve the deployment replica count, so maybe we can also preserve other aspects of the existing deployment, but not ones that would block the coredns migrator code (like "image", IIRC).

cc @pacoxu @SataQiu WDYT?

Your PR comes 6 days before the code freeze for 1.30, so I think it's a problem to merge it without discussion:
kubernetes/kubernetes#123586

@neolit123 (Member)

/remove-kind bug
/kind feature
(not a bug)

k8s-ci-robot added the kind/feature label and removed the kind/bug label Feb 29, 2024
maxl99 (Author) commented Mar 4, 2024

@neolit123 In my opinion it would be nice to preserve some specs in the deployment, i.e. read them from the existing Kubernetes deployment. It is really bad to hardcode memory limits like coredns's 170Mi without an option to change them.
This caused an outage in our main prod cluster... DNS always has a big impact :D
This would happen again and again with every kubeadm upgrade.

@neolit123 (Member)

Can we understand better what else should be preserved from the deployment, and handle it all in a single PR?

maxl99 (Author) commented Mar 4, 2024

In our env we touched only the resourceRequirements and the replicas (the latter is already handled by kubernetes/kubernetes#85837).
So for me it would be fine to do the same for the resourceRequirements.

But I imagine other people have additional fields they would want preserved.

SataQiu (Member) commented Mar 5, 2024

> That said, we did add a feature to preserve the deployment replica count, so maybe we can also preserve other aspects of the existing deployment, but not ones that would block the coredns migrator code (like "image", IIRC).

IMO, if we supported every field (replicas, resources, labels, annotations, ...) this way, it would make the code difficult to maintain.
Perhaps we should support applying custom patches to addons in the v1beta4 API, even during the upgrade process.

pacoxu (Member) commented Mar 5, 2024

Can we support kube-proxy and coredns patches, like the control plane component patches in https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-Patches?

BTW, can we skip the DNS upgrade, like we can for etcd?

      --etcd-upgrade                       Perform the upgrade of etcd. (default true)
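
For illustration, a sketch of skipping the etcd upgrade with that existing flag (the version is only a placeholder):

$ kubeadm upgrade apply v1.30.0 --etcd-upgrade=false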

neolit123 (Member) commented Mar 5, 2024

> Can we support kube-proxy and coredns patches, like the control plane component patches in https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-Patches?
> BTW, can we skip the DNS upgrade, like we can for etcd?

kube-proxy is a DaemonSet, so it's controlled by a single config (which we create on init); arguably it doesn't need patches.
The addons are skipped during upgrade if they are not found (we also have e2e tests for that), so users can already use addons with different ConfigMap names today.

> IMO, if we supported every field (replicas, resources, labels, annotations, ...) this way, it would make the code difficult to maintain.
> Perhaps we should support applying custom patches to addons in the v1beta4 API, even during the upgrade process.

Sadly, our dns code is already a mess because of the coredns migration integration.
I was thinking that a DeepCopy of the active Deployment plus some edits might not be a lot of code.
I generally agree that a patch might be cleaner and less code for us to own.

Is "coredns" OK as a patch target, or should it be something like "corednsdeployment"?

@maxl99 we are discussing allowing users to patch coredns instead of auto-preserving options:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#patches
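
For context, the patches mechanism linked above reads files from a directory whose names encode the target object and, optionally, the patch type (strategic is the default). A minimal sketch for an already-supported target such as kube-apiserver; the cpu value here is arbitrary:

$ mkdir patches
$ cat ./patches/kube-apiserver+strategic.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: 500m
$ kubeadm init --patches ./patches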

neolit123 added this to the v1.31 milestone Apr 4, 2024
neolit123 (Member) commented Apr 18, 2024

@SataQiu @pacoxu

Any concerns about adding this patch target?
1.31 will open soon and I think I can find time for it.

> Is "coredns" OK as a patch target, or should it be something like "corednsdeployment"?

It's best to be explicit about the object type, corednsdeployment, because patches target a single object, and this leaves room if we later want to add e.g. a corednsservice target.

neolit123 added the priority/backlog label and removed the priority/awaiting-more-evidence label Apr 29, 2024
neolit123 self-assigned this Apr 29, 2024
SataQiu (Member) commented May 11, 2024

With kubernetes/kubernetes#124820, you can configure coredns resource requirements in the following way:

$ mkdir patches
$ vim ./patches/corednsdeployment+strategic.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: coredns
        resources:
          limits:
            memory: 2Gi
            cpu: 2
          requests:
            memory: 1Gi
            cpu: 1
$ kubeadm upgrade apply <version> --patches ./patches
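
As a usage note, for kubeadm init and join the same patches directory can also be declared in the kubeadm config file instead of via the --patches flag; a minimal sketch with the v1beta3 API (the directory path is a placeholder):

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
patches:
  directory: /etc/kubernetes/patches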

@neolit123 (Member)

Closing as completed. 1.31 has the new feature to patch the corednsdeployment target.

neolit123 removed their assignment Jun 4, 2024