Update based on feedback
Signed-off-by: James Sturtevant <[email protected]>
jsturtevant committed Jun 28, 2024
1 parent d07f1a6 commit 6a12567
Showing 3 changed files with 18 additions and 13 deletions.
19 changes: 12 additions & 7 deletions keps/sig-node/1769-memory-manager/README.md
@@ -782,13 +782,18 @@ The Memory Manager sets and enforces cgroup memory limit for ("on behalf of") a

### Windows considerations

Numa nodes can not be guaranteed via the Windows API, instead an [ideal Numa](https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support#numa-support-on-systems-with-more-than-64-logical-processors) node can be
configured via the [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute).
Using Memory manager's internal mapping this should provide the desired behavior in most cases. It is possible that a CPU could access memory from a different Numa Node than it is currently in, resulting in decreased performance. For this reason,
we will add documentation in addition to a log warning message in kubelet to help raise awareness.
If state is undesirable then `single-numa-node` and the CPU manager should be configured in the Topology Manager policy setting
which would force Kubelet to only select a numa node if it will have enough memory and CPU's available. In the future, in the case of workloads that span multiple Numa nodes, it may be desirable for
Topology manager to have a new policy specific for Windows.
NUMA nodes can not be directly assigned or guaranteed via the Windows API. Another limitation of the Windows APIs is that [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute) cannot be set on the Job object (i.e. the container) and only supports setting a single NUMA node.
The `PROC_THREAD_ATTRIBUTE_PREFERRED_NODE` attribute works by assigning the workload to a NUMA node via CPU affinity: the API finds all processors associated with the NUMA node and applies CPU affinity to those processors, which results in memory being allocated from that NUMA node.
To support multiple NUMA nodes and to apply NUMA affinity to Job objects, the container runtime will be expected to mimic
the behavior of [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute)
by finding the CPUs associated with the NUMA nodes passed via the CRI API and setting the preferred CPU affinity on the Job object.
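
As a rough illustration of that approach (not the actual runtime implementation), the sketch below looks up the `GROUP_AFFINITY` for each requested NUMA node with `GetNumaNodeProcessorMaskEx` and applies the result to a Job object via `SetInformationJobObject` with the `JobObjectGroupInformationEx` information class. The function names and the assumption that the runtime already holds the Job object handle are illustrative only.

```go
// Illustrative only: not the actual container runtime implementation.
package numaaffinity

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

// groupAffinity mirrors the Win32 GROUP_AFFINITY structure.
type groupAffinity struct {
	Mask     uintptr
	Group    uint16
	Reserved [3]uint16
}

var (
	kernel32                       = windows.NewLazySystemDLL("kernel32.dll")
	procGetNumaNodeProcessorMaskEx = kernel32.NewProc("GetNumaNodeProcessorMaskEx")
	procSetInformationJobObject    = kernel32.NewProc("SetInformationJobObject")
)

// JobObjectGroupInformationEx accepts an array of GROUP_AFFINITY structures.
const jobObjectGroupInformationEx = 14

// affinitiesForNumaNodes finds the processors that belong to each NUMA node
// requested via the CRI API, the same lookup PROC_THREAD_ATTRIBUTE_PREFERRED_NODE
// performs internally for a single node.
func affinitiesForNumaNodes(nodes []uint16) ([]groupAffinity, error) {
	affinities := make([]groupAffinity, 0, len(nodes))
	for _, node := range nodes {
		var ga groupAffinity
		ret, _, err := procGetNumaNodeProcessorMaskEx.Call(
			uintptr(node), uintptr(unsafe.Pointer(&ga)))
		if ret == 0 {
			return nil, fmt.Errorf("GetNumaNodeProcessorMaskEx(%d): %w", node, err)
		}
		affinities = append(affinities, ga)
	}
	return affinities, nil
}

// applyToJobObject sets the combined CPU affinity on the container's Job
// object so that every process in the container prefers those NUMA nodes.
func applyToJobObject(job windows.Handle, affinities []groupAffinity) error {
	if len(affinities) == 0 {
		return nil
	}
	ret, _, err := procSetInformationJobObject.Call(
		uintptr(job),
		jobObjectGroupInformationEx,
		uintptr(unsafe.Pointer(&affinities[0])),
		uintptr(len(affinities))*unsafe.Sizeof(affinities[0]))
	if ret == 0 {
		return fmt.Errorf("SetInformationJobObject: %w", err)
	}
	return nil
}
```

Applying the affinity at the Job object level covers every process in the container, which is the piece that `PROC_THREAD_ATTRIBUTE_PREFERRED_NODE` alone cannot provide.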

Using the Memory Manager's internal mapping, this should provide the desired behavior in most cases. It is possible that a CPU could access memory from a different NUMA
node than the one it is currently in, resulting in decreased performance. For this reason, we will add documentation, a log warning message in kubelet, and a warning event
to help raise awareness of this possibility. If access from CPUs outside the assigned NUMA node is undesirable, then `single-numa-node` and the CPU manager should be
configured alongside the Topology Manager policy setting, which would force the kubelet to only select a NUMA node if it will have enough memory and CPUs available
(see the configuration sketch below). In the future, for workloads that span multiple NUMA nodes, it may be desirable for the Topology Manager to have a new policy
specific to Windows. This would require a separate KEP to add a new policy.
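
For illustration only (this KEP does not introduce new configuration), a kubelet configuration along the following lines would enable the static CPU manager and `Static` memory manager together with the `single-numa-node` Topology Manager policy. The `reservedMemory` values below are placeholders and would need to match the node's actual memory reservations.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
memoryManagerPolicy: Static
topologyManagerPolicy: single-numa-node
# The Static memory manager requires pre-reserved memory per NUMA node;
# the node and quantity below are placeholder values.
reservedMemory:
  - numaNode: 0
    limits:
      memory: 1Gi
```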

#### Kubelet memory management

4 changes: 2 additions & 2 deletions keps/sig-node/3570-cpumanager/README.md
@@ -246,8 +246,8 @@ message WindowsCpuGroupAffinity {
```

Since the Kubelet APIs are looking for a distinct ProcessorId, the id will be calculated by:
`(group *64) + procesorid` resulting in unique process id's from `group 0` as `1-64` and
process Id's from `group 1` as `65-128` and so on. When converting back to the Windows
`(group * 64) + processorId` resulting in unique processor IDs from `group 0` as `1-64` and
processor IDs from `group 1` as `65-128` and so on. When converting back to the Windows
group affinity, we will divide the ID by 64 (integer division) to determine the group ID; the
remainder is the processor index within that group.
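
A minimal sketch of this mapping, assuming the reverse conversion is plain integer division and modulo by 64 (the inverse of the forward formula); the function names are illustrative and not part of any existing kubelet API:

```go
package cpuid

const processorsPerGroup = 64

// flatID converts a Windows (group, processor-within-group) pair into the
// single processor ID the kubelet APIs expect: (group * 64) + processorID.
func flatID(group, processorInGroup int) int {
	return group*processorsPerGroup + processorInGroup
}

// groupAndProcessor converts the flat ID back into the Windows processor
// group and the processor index within that group.
func groupAndProcessor(id int) (group, processorInGroup int) {
	return id / processorsPerGroup, id % processorsPerGroup
}
```

For example, processor `3` in `group 1` maps to flat ID `67`, and `67` maps back to group `1`, processor `3`.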

8 changes: 4 additions & 4 deletions keps/sig-node/693-topology-manager/README.md
@@ -933,8 +933,8 @@ harware was [added](https://github.com/kubernetes/test-infra/pull/28369) in Kube

## Windows considerations

Topology manager is already enabled on Windows in order to support the device manager. The same configuration options
and PRR applies to Windows. The CPU manager and Memory Manager can independently be enabled to support advance configuration
where affinity is applied. In the future a new Policy maybe required to address unique Numa Memory Management as described in the
Windows Section on the Memory Manager KEP.
The Topology Manager is already enabled on Windows in order to support the device manager. Since there are no changes to the
Topology Manager, the answers in the [Production Readiness Review](#production-readiness-review-questionnaire) section also apply to Windows when the CPU Manager and Memory Manager are
added as hint providers. The CPU Manager and Memory Manager can independently be enabled or disabled to support cases where the features need to be turned off.
In the future, a new policy (and a new KEP) for the Topology Manager may be required to address the unique Windows NUMA memory management requirements described in the Windows section of the Memory Manager KEP.
