Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-1769/KEP-3570/KEP693: Adding Windows Kubelet Manager implementation details #4738

Closed

Conversation

jsturtevant
Copy link
Contributor

  • One-line PR description:
    Adding Windows Kubelet Manager implementation details
  • Issue link:

Related PR:
CRI changes: kubernetes/kubernetes#124285
Kubelet changes: kubernetes/kubernetes#125296

/sig node
/sig windows

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 18, 2024
Signed-off-by: James Sturtevant <[email protected]>
@jsturtevant
Copy link
Contributor Author

/cc @marosset @aravindhp @knabben @kiashok
For sig-windows

/cc @haircommander @mrunalp
For sig-node

keps/sig-node/3570-cpumanager/README.md Outdated Show resolved Hide resolved
keps/sig-node/3570-cpumanager/README.md Outdated Show resolved Hide resolved
keps/sig-node/1769-memory-manager/README.md Outdated Show resolved Hide resolved
Signed-off-by: James Sturtevant <[email protected]>
keps/sig-node/3570-cpumanager/README.md Outdated Show resolved Hide resolved
keps/sig-node/3570-cpumanager/README.md Outdated Show resolved Hide resolved
## Windows considerations

Topology manager is already enabled on Windows in order to support the device manager. The same configuration options
and PRR applies to Windows. The CPU manager and Memory Manager can independently be enabled to support advance configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is PRR in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the same Production Readiness Review answers, in the sense that these features can be set to None to disable if required to roll back do to errors. I updated to clarify

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 28, 2024
Signed-off-by: James Sturtevant <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2024
Comment on lines 787 to 789
In order to support multiple numa nodes and be able to apply the Numa affinity to job objects the container runtime will be expected to mimic
the behavior of [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute)
by finding the associated CPU's for the Numa Nodes that are passed via the Cri API and setting the preferred affinity for the job object.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that setting PROC_THREAD_ATTRIBUTE_PREFERRED_NODE is exactly equivalent to setting the job object's affinity to the CPUs on that node? Or could there be underlying differences?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the wording here. We don't know for certain if there are underlying differences, but will use the same general idea to work around the single Numa node limitation of this API.

Signed-off-by: James Sturtevant <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 2, 2024
Signed-off-by: James Sturtevant <[email protected]>
@kiashok
Copy link
Contributor

kiashok commented Sep 5, 2024

lgtm

Copy link
Contributor

@marosset marosset left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 11, 2024
@jsturtevant
Copy link
Contributor Author

@kevpar @haircommander We've finalized this from a sig-windows perspective. I've also been able to get the proto-type working in kubernetes/kubernetes#125296 with the proposed changes to containerd @kiashok provided.

Could you take another look?

Comment on lines +793 to +795
- cpu manager selects fewer CPU's than Numa nodes and CPU's fall with in Numa node: Kubelet will only set only the CPU's selected by the cpu-manager as the memory from the memory manager will be used by default.
- cpu manager selects more CPU's than Numa nodes and CPU's fall within/or outside Numa node: kubelet will set selected only CPU's from cpu-manager
- cpu manager selects few CPU's than Numa nodes and CPU's fall outside the Numa Node: Kubelet would set the CPU's by cpu-manager plus all the CPU's associated with the Numa node.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume by "cpu manager selects fewer CPU's than Numa nodes" what we really mean is "cpu manager selects fewer CPU's than the CPU's associated with the Numa nodes selected by the memory manager"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can update the wording to be clearer

@ffromani
Copy link
Contributor

/cc

@SergeyKanzhelev
Copy link
Member

I would expect this to:

  1. Add new feature gates
  2. Add new graduation criteria for those KEPs that are not GA-d yet (ex KEP-1769: Graduate Memory Manager to GA #4251)
  3. Have a test cases section updated

@jsturtevant
Copy link
Contributor Author

I would expect this to:

  1. Add new feature gates
  2. Add new graduation criteria for those KEPs that are not GA-d yet (ex KEP-1769: Graduate Memory Manager to GA #4251)

This was discussed with sig-node awhile back and was agreed that since this can be turned off with cpuManagerPolicy: none we could make the updates here without feature gates. Did this change? @haircommander

kubernetes/kubernetes#124285 (comment)

Have a test cases section updated

I can update the test section with some details as appropriate

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jsturtevant, kevpar, marosset
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jsturtevant
Copy link
Contributor Author

After discussions with sig-node last week, we will break this out into a separate KEP with its own feature flag, this way we can communicate to users that the features are not Stable for windows as well as make it possible for users to opt into the feature.

@jsturtevant
Copy link
Contributor Author

superseded by #4888

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

10 participants