From 1204ef308bf1c1f6b404ce3fe4eeee48f1b27cb2 Mon Sep 17 00:00:00 2001 From: James Sturtevant Date: Mon, 30 Sep 2024 14:15:11 -0700 Subject: [PATCH 1/4] Add Windows cpu and memory affinity Signed-off-by: James Sturtevant --- keps/prod-readiness/sig-windows/4885.yaml | 6 + .../README.md | 999 ++++++++++++++++++ .../kep.yaml | 46 + 3 files changed, 1051 insertions(+) create mode 100644 keps/prod-readiness/sig-windows/4885.yaml create mode 100644 keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md create mode 100644 keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml diff --git a/keps/prod-readiness/sig-windows/4885.yaml b/keps/prod-readiness/sig-windows/4885.yaml new file mode 100644 index 00000000000..677242775f3 --- /dev/null +++ b/keps/prod-readiness/sig-windows/4885.yaml @@ -0,0 +1,6 @@ +# The KEP must have an approver from the +# "prod-readiness-approvers" group +# of http://git.k8s.io/enhancements/OWNERS_ALIASES +kep-number: 4885 +alpha: + approver: "@johnbelamaric" diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md new file mode 100644 index 00000000000..bb170ed59c2 --- /dev/null +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md @@ -0,0 +1,999 @@ + +# KEP-NNNN: Your short, descriptive title + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Windows CPU Discovery](#windows-cpu-discovery) + - [Windows Memory considerations](#windows-memory-considerations) + - [Kubelet memory management](#kubelet-memory-management) + - 
[Windows Topology manager considerations](#windows-topology-manager-considerations) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Deprecation](#deprecation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. 
+ +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [x] (R) Design details are appropriately documented +- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [x] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + + + +## Motivation + + +Add support for CPU and memory affinity on windows by enabling the cpu, memory and topology managers for Windows, which are currently not enabled. + +Enables Low latency workloads co-hosted on the same nodes in Windows Server show noisy neighbor behavior preventing them to achieve their performance goals. 
This feature is needed to add the necessary isolation to accomplish both high performance and co-hosting efficiency. The feature is enabled and available in Linux and Windows users are asking for the same features on Windows. + +### Goals + + +- Enable CPU manager for Windows allowing for CPU affinity +- Enable Memory Manager for Windows allowing for Memory Affinity +- Enable Topology Manager for Windows allowing for coordination of Memory and CPU affinity + +### Non-Goals + + + +- We do not wish to create new managers and instead re-use the existing logic provided +- Modify or bypass any existing feature gated features. Existing Policy features gates will still be used to progress specific policies related to the managers. + +## Proposal + + + +The proposal requires very little changes to the code for the managers and instead extends the [Windows](https://learn.microsoft.com/en-us/windows/win32/procthread/processor-groups) concepts to a CAdvisor mapping to enable the [topology structure in kubelet](https://github.com/kubernetes/kubernetes/blob/cede96336a809a67546ca08df0748e4253ec270d/pkg/kubelet/cm/cpumanager/topology/topology.go#L34-L39). + +There are no plans to change the core logic for selecting CPU's and NUMA nodes in the CPU/Memory/Tolopology managers from the existing KEPS ([memory-manager](keps/sig-node/1769-memory-manager)/[cpu-manager](keps/sig-node/3570-cpu-manager)/[topology-manager](keps/sig-node/693-topology-manager")). The logic is currently in platform agnostic +structure so the selection process is does not require changes to adapt for Windows. The Windows specific considerations for each of the managers will be covered in separate sections in this document. 
+ + +### User Stories (Optional) + + + +#### Story 1 + +#### Story 2 + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +The technical risks are the same from existing keps: + - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3570-cpumanager#risks-and-mitigations + - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1769-memory-manager#risks-and-mitigations + - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/693-topology-manager#risks-and-mitigations + +For sig-windows, we also see a risk to enabling a feature that has already Stable or fully featured on Linux. To mitigate this risk we have opted to create a +separate KEP with a feature flag so we can communicate our status effectively. + +Another risk is the testing implementation for these features is mostly in e2e_node which doesn't currently support Windows. As a mitigation there was [some exploration ](https://github.com/jsturtevant/kubernetes/tree/e2e_node-windows) to see if these tests could be enabled on Windows so we can progress this feature with confidence in the testing suite. + +## Design Details + + + + +### Windows CPU Discovery + +The Windows Kubelet provides an implementation for the [cadvisor api](https://github.com/kubernetes/kubernetes/blob/fbaf9b0353a61c146632ac195dfeb1fbaffcca1e/pkg/kubelet/cadvisor/cadvisor_windows.go#L50) +in order to provide Windows stats to other components without modification. +The ability to provide the `cadvisorapi.MachineInfo` api is already partially mapped +in on the Windows client. By mapping the Windows specific topology API's to +cadvisor API, no changes are required to the CPU Manager. 
+ +The [Windows concepts](https://learn.microsoft.com/windows/win32/procthread/processor-groups) are mapped to [Linux concepts](https://github.com/kubernetes/kubernetes/blob/cede96336a809a67546ca08df0748e4253ec270d/pkg/kubelet/cm/cpumanager/topology/topology.go#L34-L39) as follows: + +| Kubelet Term | Description | Cadvisor term | Windows term | +| --- | --- | --- | --- | +| CPU | logical CPU | thread | Logical processor | +| Core | physical CPU | Core | Core | +| Socket | socket | Socket | Physical Processor | +| NUMA Node | NUMA cell | Node | NUMA node | + +This mapping gives the following output from the CPU manager after the conversion into kubelet's in-memory structure: + +```json +"Detected CPU topology" +topology={"NumCPUs":8,"NumCores":4,"NumSockets":1,"NumNUMANodes":1,"CPUDetails":{ +"0":{"NUMANodeID":0,"SocketID":1,"CoreID":0}, +"1":{"NUMANodeID":0,"SocketID":1,"CoreID":0}, +"2":{"NUMANodeID":0,"SocketID":1,"CoreID":2}, +"3":{"NUMANodeID":0,"SocketID":1,"CoreID":2}, +"4":{"NUMANodeID":0,"SocketID":1,"CoreID":4}, +"5":{"NUMANodeID":0,"SocketID":1,"CoreID":4}, +"6":{"NUMANodeID":0,"SocketID":1,"CoreID":6}, +"7":{"NUMANodeID":0,"SocketID":1,"CoreID":6}}} +``` + +The Windows APIs used will be: +- [getlogicalprocessorinformationex](https://learn.microsoft.com/windows/win32/api/sysinfoapi/nf-sysinfoapi-getlogicalprocessorinformationex) +- [nf-winbase-getnumaavailablememorynodeex](https://learn.microsoft.com/windows/win32/api/winbase/nf-winbase-getnumaavailablememorynodeex) + +One difference between the Windows API and Linux is the concept of [Processor groups](https://learn.microsoft.com/windows/win32/procthread/processor-groups). +On Windows systems with more than 64 logical processors the CPUs are split into groups, and +each processor is identified by its group number and its group-relative processor number. 
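
As a rough illustration of how a (group number, group-relative processor number) pair could be flattened into a single processor id and folded back again, here is a minimal Go sketch. It assumes every group reserves 64 ids even when it holds fewer processors. The `groupAffinity` type and the `flatIDs` and `groupMasks` helpers are hypothetical names used only for this example; they are not taken from kubelet or from the Windows API.

```go
package main

import (
	"fmt"
	"sort"
)

// groupAffinity is a hypothetical stand-in for one processor group
// together with a 64-bit mask that is relative to that group.
type groupAffinity struct {
	Group uint32
	Mask  uint64
}

// processorsPerGroup is an assumption of this sketch: each group
// reserves 64 ids, even if it contains fewer logical processors.
const processorsPerGroup = 64

// flatIDs expands a group-relative mask into flat processor ids,
// so group 0 yields 0-63, group 1 yields 64-127, and so on.
func flatIDs(a groupAffinity) []int {
	var ids []int
	for i := 0; i < processorsPerGroup; i++ {
		if a.Mask&(uint64(1)<<uint(i)) != 0 {
			ids = append(ids, int(a.Group)*processorsPerGroup+i)
		}
	}
	return ids
}

// groupMasks folds non-negative flat ids back into one mask per group,
// returned in ascending group order.
func groupMasks(ids []int) []groupAffinity {
	masks := map[uint32]uint64{}
	for _, id := range ids {
		g := uint32(id / processorsPerGroup)
		masks[g] |= uint64(1) << uint(id%processorsPerGroup)
	}
	out := make([]groupAffinity, 0, len(masks))
	for g, m := range masks {
		out = append(out, groupAffinity{Group: g, Mask: m})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Group < out[j].Group })
	return out
}

func main() {
	ids := flatIDs(groupAffinity{Group: 1, Mask: 0b101})
	fmt.Println(ids)             // [64 66]
	fmt.Println(groupMasks(ids)) // [{1 5}]
}
```

Keeping the flat ids on one side and the per-group masks on the other means the existing selection logic can keep working with plain integers, while anything that talks in Windows terms can still receive a group plus mask.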
+ +In Cri we will add the following structure to the `WindowsContainerResources` in CRI: + +```golang +message WindowsCpuGroupAffinity { + // CPU mask relative to this CPU group. + uint64 cpu_mask = 1; + // CPU group that this CPU belongs to. + uint32 cpu_group = 2; +} +``` + +Since the Kubelet API's are looking for a distinct ProcessorId, the processorid's will be calculated by looping +through the mask and calculating the ids with `(group *64) + procesorid` resulting in unique processor id's from `group 0` as `0-63` and +processor Id's from `group 1` as `64-127` and so on. This translation will be done only in kubelet, the `cpu_mask` will be used when +communicating with the container runtime. + +```golang +for i := 0; i < 64; i++ { + if GROUP_AFFINITY.Mask&(1< + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +The testing plan is to enable basic tests in [Windows testing folder](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/windows) in Alpha. This will enable us to progress to a state we in Alpha that will allow our end users to test and give feedback in real world scenarios. + +We we also work to enable e2e_node test suite to run on Windows and enable the applicable [CPU](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/container_manager_test.go)/[Memory](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/memory_manager_test.go)/[Topology](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/topology_manager_test.go) Manager tests for Beta. The goal will be to enable as many of those tests as possible while recognizing some may not be applicable to Windows. Where we find gaps we will fill them with Windows specific tests. 
+ +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- pkg/kubelet/cm/container_manager_windows.go +- pkg/kubelet/cm/internal_container_lifecycle_windows.go +- pkg/kubelet/winstats/cpu_topology_test.go + +##### Integration tests + + + + + + +##### e2e tests + + + +- e2e_node will need to be enabled for windows to add + +### Graduation Criteria + + +#### Alpha + +- Feature implemented behind a feature flag +- Initial basic e2e tests in Windows e2e suite are added + +#### Beta + +- Gather feedback from developers +- e2e_node tests are in Testgrid and linked in KEP + +#### GA + +- 2 examples of real-world usage +- Allowing time for feedback + +**Note:** Generally we also wait at least two releases between beta and +GA/stable, because there's no opportunity for user feedback, or even bug reports, +in back-to-back releases. + +**For non-optional features moving to GA, the graduation criteria must include +[conformance tests].** + +[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md + +#### Deprecation + +- Announce deprecation and support policy of the existing flag +- Two versions passed since introducing the functionality that deprecates the flag (to address version skew) +- Address feedback on usage/changed behavior, provided on GitHub issues +- Deprecate the flag + + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +N/A + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: WindowsCPUAndMemoryAffinity + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + No + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? 
+ Yes it uses a feature gate. Memory manager also has a state file that requires cleanup. + +###### Does enabling the feature change any default behavior? + + + +No, Additional settings are required to enable the features. The default policies for CPU/Memory manager will be `None`, meaning that they will not interact with running of pods. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +Yes. Restarting of the pods will be required to remove the CPU/Memory affinity. + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +Impact is node local, and doesn't affect rest of the cluster. + +It is possible that the state file from the memory/cpu manager will have inconsistent data during the rollout, because of the kubelet restart, but you can easily to fix it by removing memory manager state file and run kubelet restart. It should not affect any running workloads. + + +###### What specific metrics should inform a rollback? + + + +The pod may fail with the admission error because the kubelet can not provide all resources. You can see the error messages under the pod events. + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +We will use the existing Metrics provided by CPU/Memory Manager. 
+ +https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3570-cpumanager#monitoring-requirements +https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1769-memory-manager#monitoring-requirements + +###### How can an operator determine if the feature is in use by workloads? + + + +The memory/cpu manager will be under the pod resources API. + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +n/a + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +These will be the same as cpu/memory/topology manager. + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +This will require changes to CRI and containerd Windows agents. + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +No + +###### Will enabling / using this feature result in introducing new API types? + + + +No + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +No + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +No + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +No + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +We will monitor for cpu consumption to query the CPU topology. If required we may wish to implement a caching strategy. 
+ +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +N/a + +###### What are other known failure modes? + + + +The failure modes for pods on the node are the same as in CPU/Memory/topology Manager + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml new file mode 100644 index 00000000000..e689b350bbe --- /dev/null +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml @@ -0,0 +1,46 @@ +title: Windows CPU and Memory Affinity +kep-number: 4885 +authors: + - "@jsturtevant" +owning-sig: sig-windows +participating-sigs: + - sig-node +status: implementable +creation-date: 2024-09-03 +reviewers: + - TBD + - "@marosset" +approvers: + - TBD + +see-also: + - "keps/sig-node/1769-memory-manager" + - "keps/sig-node/3570-cpu-manager" + - "keps/sig-node/693-topology-manager" +replaces: + + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.32" + +# The milestone at which this feature was, or is targeted to be, at each stage. 
+milestone: + alpha: "v1.32" + beta: "v1.33" + stable: "v1.34" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: WindowsCPUAndMemoryAffinity + components: + - kubelet +disable-supported: true + +# The following PRR answers are required at beta release +metrics: From eef698f086df4f2816d22c5ab26b5117397a86bc Mon Sep 17 00:00:00 2001 From: James Sturtevant Date: Mon, 30 Sep 2024 16:43:52 -0700 Subject: [PATCH 2/4] Respond to feedback Signed-off-by: James Sturtevant --- .../README.md | 33 +++++++++++-------- .../kep.yaml | 4 +-- 2 files changed, 22 insertions(+), 15 deletions(-) diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md index bb170ed59c2..921d9de192e 100644 --- a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md @@ -84,8 +84,6 @@ tags, and then generate with `hack/update-toc.sh`. - [Non-Goals](#non-goals) - [Proposal](#proposal) - [User Stories (Optional)](#user-stories-optional) - - [Story 1](#story-1) - - [Story 2](#story-2) - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) @@ -136,7 +134,7 @@ checklist items _must_ be updated for the enhancement to be released. Items marked with (R) are required *prior to targeting to a milestone / release*. 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) - [ ] (R) KEP approvers have approved the KEP status as `implementable` - [x] (R) Design details are appropriately documented - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) @@ -191,9 +189,14 @@ demonstrate the interest in a KEP within the wider Kubernetes community. [experience reports]: https://github.com/golang/go/wiki/ExperienceReports --> -Add support for CPU and memory affinity on windows by enabling the cpu, memory and topology managers for Windows, which are currently not enabled. -Enables Low latency workloads co-hosted on the same nodes in Windows Server show noisy neighbor behavior preventing them to achieve their performance goals. This feature is needed to add the necessary isolation to accomplish both high performance and co-hosting efficiency. The feature is enabled and available in Linux and Windows users are asking for the same features on Windows. +Add support for CPU and memory affinity for Windows nodes by enabling the cpu, memory and topology managers for Windows, +which are currently not enabled. + +Enabling Low latency workloads co-hosted on the same nodes in Windows Server show noisy neighbor behaviors +preventing them from achieving their expected performance goals. +This feature is needed to add the necessary isolation to accomplish both high performance and co-hosting efficiency. +The feature is enabled and available in Linux and Windows users are asking for the same features on Windows. ### Goals @@ -201,9 +204,9 @@ Enables Low latency workloads co-hosted on the same nodes in Windows Server show List the specific goals of the KEP. What is it trying to achieve? How will we know that this has succeeded? 
--> -- Enable CPU manager for Windows allowing for CPU affinity -- Enable Memory Manager for Windows allowing for Memory Affinity -- Enable Topology Manager for Windows allowing for coordination of Memory and CPU affinity +- Enable CPU manager for Windows allowing for CPU affinity for configured pods +- Enable Memory Manager for Windows allowing for memory affinity for configured pods +- Enable Topology Manager for Windows allowing for coordination of Memory and CPU affinity at the node level for scheduled pods ### Non-Goals @@ -229,7 +232,7 @@ nitty-gritty. The proposal requires very little changes to the code for the managers and instead extends the [Windows](https://learn.microsoft.com/en-us/windows/win32/procthread/processor-groups) concepts to a CAdvisor mapping to enable the [topology structure in kubelet](https://github.com/kubernetes/kubernetes/blob/cede96336a809a67546ca08df0748e4253ec270d/pkg/kubelet/cm/cpumanager/topology/topology.go#L34-L39). There are no plans to change the core logic for selecting CPU's and NUMA nodes in the CPU/Memory/Tolopology managers from the existing KEPS ([memory-manager](keps/sig-node/1769-memory-manager)/[cpu-manager](keps/sig-node/3570-cpu-manager)/[topology-manager](keps/sig-node/693-topology-manager")). The logic is currently in platform agnostic -structure so the selection process is does not require changes to adapt for Windows. The Windows specific considerations for each of the managers will be covered in separate sections in this document. +structures so the selection process is does not require changes for adoption on Windows. The Windows specific considerations for each of the managers will be covered in separate sections in this document. ### User Stories (Optional) @@ -241,9 +244,11 @@ the system. The goal here is to make this feel real for users without getting bogged down. 
--> -#### Story 1 +The User stories on Windows are similar to Linux: -#### Story 2 +https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3570-cpumanager#user-stories-optional +https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1769-memory-manager#user-stories +https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/693-topology-manager#user-stories-optional ### Notes/Constraints/Caveats (Optional) @@ -330,7 +335,7 @@ each processor is identified by its group number and its group-relative processo In Cri we will add the following structure to the `WindowsContainerResources` in CRI: -```golang +```protobuf message WindowsCpuGroupAffinity { // CPU mask relative to this CPU group. uint64 cpu_mask = 1; @@ -477,6 +482,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for https://storage.googleapis.com/k8s-triage/index.html --> +Integration tests do not run on Windows. Functionality will be covered by unit and e2e tests. ##### e2e tests @@ -490,7 +496,7 @@ https://storage.googleapis.com/k8s-triage/index.html We expect no non-infra related flakes in the last month as a GA graduation criteria. 
--> -- e2e_node will need to be enabled for windows to add +- e2e_node will need to be enabled for windows to add coverage ### Graduation Criteria @@ -525,6 +531,7 @@ Below are some examples to consider, in addition to the aforementioned [maturity - Feature implemented behind a feature flag - Initial basic e2e tests in Windows e2e suite are added +- unit tests for Windows specific components are added #### Beta diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml index e689b350bbe..531f9ddeff3 100644 --- a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml @@ -31,8 +31,8 @@ latest-milestone: "v1.32" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: alpha: "v1.32" - beta: "v1.33" - stable: "v1.34" + beta: "" + stable: "" # The following PRR answers are required at alpha release # List the feature gate name and the components for which it must be enabled From 2626c678b2c62b6814c0a73ce4c0e276dcdbed1e Mon Sep 17 00:00:00 2001 From: James Sturtevant Date: Tue, 1 Oct 2024 10:35:31 -0700 Subject: [PATCH 3/4] Update reviewers/approvers Signed-off-by: James Sturtevant --- .../4885-windows-cpu-and-memory-affinity/kep.yaml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml index 531f9ddeff3..72a2256fe12 100644 --- a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/kep.yaml @@ -8,10 +8,12 @@ participating-sigs: status: implementable creation-date: 2024-09-03 reviewers: - - TBD - - "@marosset" + - "@ffromani" + - "@aravindhp" + - "@kiashok" approvers: - - TBD + - "@mrunalp" + - "@marosset" see-also: - "keps/sig-node/1769-memory-manager" From 
5e9668712f2242e9913023824afd8ccdec7d917d Mon Sep 17 00:00:00 2001 From: James Sturtevant Date: Wed, 2 Oct 2024 10:10:49 -0700 Subject: [PATCH 4/4] Address sig-node feedback Signed-off-by: James Sturtevant --- .../README.md | 492 ++---------------- 1 file changed, 40 insertions(+), 452 deletions(-) diff --git a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md index 921d9de192e..07af5428f36 100644 --- a/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md +++ b/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md @@ -1,80 +1,4 @@ - -# KEP-NNNN: Your short, descriptive title - - - - +# KEP-4885: Windows CPU and Memory Affinity - [Release Signoff Checklist](#release-signoff-checklist) @@ -118,20 +42,6 @@ tags, and then generate with `hack/update-toc.sh`. ## Release Signoff Checklist - - Items marked with (R) are required *prior to targeting to a milestone / release*. - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) @@ -149,10 +59,6 @@ Items marked with (R) are required *prior to targeting to a milestone / release* - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes - - [kubernetes.io]: https://kubernetes.io/ [kubernetes/enhancements]: https://git.k8s.io/enhancements [kubernetes/kubernetes]: https://git.k8s.io/kubernetes @@ -160,75 +66,32 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary - +This kep outlines how to add support for the CPU, Memory and Topology Managers in kubelet for Windows. 
+The Managers are already available and supported in kubelet on Linux, and there have been requests to sig-windows +to add support on Windows to help with workloads that are co-located on the same nodes. The goal of the KEP is to +add Windows support without significant changes to the Managers' logic while providing the same feature sets available +on Linux today. ## Motivation - - -Add support for CPU and memory affinity for Windows nodes by enabling the cpu, memory and topology managers for Windows, -which are currently not enabled. - -Enabling Low latency workloads co-hosted on the same nodes in Windows Server show noisy neighbor behaviors +Currently, low latency workloads co-hosted on the same nodes in Windows Server exhibit noisy neighbor behaviors preventing them from achieving their expected performance goals. -This feature is needed to add the necessary isolation to accomplish both high performance and co-hosting efficiency. +The CPU, Memory and Topology Managers are needed to add the necessary isolation to accomplish both high performance and co-hosting efficiency. The feature is enabled and available in Linux and Windows users are asking for the same features on Windows. ### Goals - - Enable CPU manager for Windows allowing for CPU affinity for configured pods - Enable Memory Manager for Windows allowing for memory affinity for configured pods - Enable Topology Manager for Windows allowing for coordination of Memory and CPU affinity at the node level for scheduled pods ### Non-Goals - - - We do not wish to create new managers and instead re-use the existing logic provided - Modify or bypass any existing feature gated features. Existing Policy features gates will still be used to progress specific policies related to the managers. 
## Proposal - - The proposal requires very little changes to the code for the managers and instead extends the [Windows](https://learn.microsoft.com/en-us/windows/win32/procthread/processor-groups) concepts to a CAdvisor mapping to enable the [topology structure in kubelet](https://github.com/kubernetes/kubernetes/blob/cede96336a809a67546ca08df0748e4253ec270d/pkg/kubelet/cm/cpumanager/topology/topology.go#L34-L39). There are no plans to change the core logic for selecting CPU's and NUMA nodes in the CPU/Memory/Tolopology managers from the existing KEPS ([memory-manager](keps/sig-node/1769-memory-manager)/[cpu-manager](keps/sig-node/3570-cpu-manager)/[topology-manager](keps/sig-node/693-topology-manager")). The logic is currently in platform agnostic @@ -237,13 +100,6 @@ structures so the selection process is does not require changes for adoption on ### User Stories (Optional) - - The User stories on Windows are similar to Linux: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3570-cpumanager#user-stories-optional @@ -252,28 +108,12 @@ https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/693-topolog ### Notes/Constraints/Caveats (Optional) - +Windows does not have an API to constrain workloads to a specific NUMA node. This is addressed in the Memory Manager section below. ### Risks and Mitigations - -The technical risks are the same from existing keps: +The technical risks are the same as those of the existing KEPs: - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3570-cpumanager#risks-and-mitigations - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1769-memory-manager#risks-and-mitigations - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/693-topology-manager#risks-and-mitigations @@ -281,18 +121,10 @@ The technical risks are the same from existing keps: For sig-windows, we also see a risk to enabling a feature that has already Stable or fully featured on Linux. 
 To mitigate this risk we have opted to create a separate KEP with a feature flag so we can communicate our status effectively.

-Another risk is the testing implementation for these features is mostly in e2e_node which doesn't currently support Windows. As a mitigation there was [some exploration ](https://github.com/jsturtevant/kubernetes/tree/e2e_node-windows) to see if these tests could be enabled on Windows so we can progress this feature with confidence in the testing suite.
+Another risk is the testing implementation for these features is mostly in e2e_node which doesn't currently support Windows. As a mitigation there was [some exploration](https://github.com/jsturtevant/kubernetes/tree/e2e_node-windows) to see if these tests could be enabled on Windows so we can progress this feature with confidence in the testing suite.

 ## Design Details

-
-
-
 ### Windows CPU Discovery

 The Windows Kubelet provides an implementation for the [cadvisor api](https://github.com/kubernetes/kubernetes/blob/fbaf9b0353a61c146632ac195dfeb1fbaffcca1e/pkg/kubelet/cadvisor/cadvisor_windows.go#L50)
@@ -351,7 +183,7 @@ communicating with the container runtime.

 ```golang
 for i := 0; i < 64; i++ {
-    if GROUP_AFFINITY.Mask&(1<
- [x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
@@ -435,98 +256,22 @@ We we also work to enable e2e_node test suite to run on Windows and enable the a

 ##### Prerequisite testing updates

-
-
 ##### Unit tests

-
-
-
-
 - pkg/kubelet/cm/container_manager_windows.go
 - pkg/kubelet/cm/internal_container_lifecycle_windows.go
 - pkg/kubelet/winstats/cpu_topology_test.go

 ##### Integration tests

-
-
-
-
 Integration tests do not run on Windows. Functionality will be covered by unit and e2e tests.
 ##### e2e tests

-
-
 - e2e_node will need to be enabled for windows to add coverage

 ### Graduation Criteria

-
 #### Alpha

 - Feature implemented behind a feature flag
@@ -554,67 +299,16 @@ in back-to-back releases.

 #### Deprecation

-- Announce deprecation and support policy of the existing flag
-- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
-- Address feedback on usage/changed behavior, provided on GitHub issues
-- Deprecate the flag
-
+N/A

 ### Upgrade / Downgrade Strategy

-
-
 ### Version Skew Strategy

-
-
 N/A

 ## Production Readiness Review Questionnaire

-
-
 ### Feature Enablement and Rollback

-
 No, Additional settings are required to enable the features. The default policies for CPU/Memory manager will be `None`, meaning that they will not interact with running of pods.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -718,6 +407,25 @@ that might indicate a serious problem?

 The pod may fail with the admission error because the kubelet can not provide all resources.
 You can see the error messages under the pod events.
+There are existing metrics provided by Managers that can be monitored:
+
+```golang
+// Metrics to track the CPU manager behavior
+CPUManagerPinningRequestsTotalKey = "cpu_manager_pinning_requests_total"
+CPUManagerPinningErrorsTotalKey = "cpu_manager_pinning_errors_total"
+CPUManagerSharedPoolSizeMilliCoresKey = "cpu_manager_shared_pool_size_millicores"
+CPUManagerExclusiveCPUsAllocationCountKey = "cpu_manager_exclusive_cpu_allocation_count"
+
+// Metrics to track the Memory manager behavior
+MemoryManagerPinningRequestsTotalKey = "memory_manager_pinning_requests_total"
+MemoryManagerPinningErrorsTotalKey = "memory_manager_pinning_errors_total"
+
+// Metrics to track the Topology manager behavior
+TopologyManagerAdmissionRequestsTotalKey = "topology_manager_admission_requests_total"
+TopologyManagerAdmissionErrorsTotalKey = "topology_manager_admission_errors_total"
+TopologyManagerAdmissionDurationKey = "topology_manager_admission_duration_ms"
+```
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-The memory/cpu manager will be under the pod resources API.
+The memory/cpu manager will be under the pod resources API. And there are proposed metrics to improve this in [kubernetes/kubernetes#127155](https://github.com/kubernetes/kubernetes/pull/127155)

 ###### How can someone using this feature know that it is working for their instance?

-
-
-- [ ] Events
+- [x] Events
   - Event Reason:
 - [ ] API .status
   - Condition name:
@@ -777,61 +476,22 @@ Recall that end users cannot usually observe component logs or access metrics.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-
-
 n/a

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-
-
 These will be the same as cpu/memory/topology manager.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
-
+Since the CPU/Memory/Topology manager are already implemented most of the metrics are implemented. If we find missing
+metrics on Windows we will address as we move to Beta/Stable.

 ### Dependencies

-
 ###### Does this feature depend on any specific services running in the cluster?

-
-
 This will require changes to CRI and containerd Windows agents.

 ### Scalability
@@ -848,91 +508,32 @@ previous answers based on experience in the field.

 ###### Will enabling / using this feature result in any new API calls?

-
-
 No

 ###### Will enabling / using this feature result in introducing new API types?

-
-
 No

 ###### Will enabling / using this feature result in any new calls to the cloud provider?

-
-
 No

 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?

-
-
 No

 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

-
-
 No

 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

-
-
-We will monitor for cpu consumption to query the CPU topology. If required we may wish to implement a caching strategy.
+We will monitor for cpu consumption to query the CPU topology. If required we may wish to implement a caching strategy while also
+supporting any new support for dynamic node resizing.

 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

-
+Memory and CPU's could be exhausted resulting in Pods not being scheduled.

 ### Troubleshooting
@@ -953,19 +554,6 @@ N/a

 ###### What are other known failure modes?

-
-
 The failure modes for pods on the node are the same as in CPU/Memory/topology Manager

 ###### What steps should be taken if SLOs are not being met to determine the problem?