Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SharedLister for NodeInfo #115

Merged
merged 3 commits into from
Sep 19, 2024
Merged

Support SharedLister for NodeInfo #115

merged 3 commits into from
Sep 19, 2024

Conversation

sanposhiho
Copy link
Member

@sanposhiho sanposhiho commented Aug 22, 2024

What type of PR is this?

/kind feature
/kind cleanup

What this PR does / why we need it:

  • Improve how the guest caches nodes for a better efficiency.
    • And, wire SharedLister().NodeInfos() and cached Nodes.
  • Support SharedLister for NodeInfo.
    • Only the basic node's data and ImageStates is supported for now. We can support other fields later.
  • Add ImageLocality plugin, ported from the in-tree plugin.

Which issue(s) this PR fixes:

Fixes #70
Part of #73

Special notes for your reviewer:

Does this PR introduce a user-facing change?

SharedLister for NodeInfo is supported.

What are the benchmark results of this change?

                                       │  before.txt  │              after.txt               │
                                       │    sec/op    │    sec/op      vs base               │
Example_NodeNumber/Simple/New-12         563.2m ±  2%    584.6m ±  3%   +3.82% (p=0.015 n=6)
Example_NodeNumber/Simple/Run-12         93.41µ ± 26%   108.65µ ± 12%        ~ (p=0.093 n=6)
Example_NodeNumber/Simple_Log/New-12     552.2m ±  2%    614.9m ± 16%  +11.35% (p=0.002 n=6)
Example_NodeNumber/Simple_Log/Run-12     109.4µ ±  8%    108.9µ ± 28%        ~ (p=0.485 n=6)
Example_NodeNumber/Advanced/New-12       633.0m ±  3%    660.1m ±  3%   +4.28% (p=0.002 n=6)
Example_NodeNumber/Advanced/Run-12       37.80µ ± 25%    38.30µ ±  4%        ~ (p=0.589 n=6)
Example_NodeNumber/Advanced_Log/New-12   649.9m ± 12%    648.1m ±  5%        ~ (p=0.937 n=6)
Example_NodeNumber/Advanced_Log/Run-12   42.80µ ±  3%    44.93µ ± 10%   +4.98% (p=0.002 n=6)
geomean                                  6.175m          6.486m         +5.03%

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 22, 2024
@sanposhiho sanposhiho marked this pull request as draft August 22, 2024 05:13
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 22, 2024
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 23, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 8, 2024
@sanposhiho sanposhiho marked this pull request as ready for review September 8, 2024 08:25
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2024
@sanposhiho
Copy link
Member Author

/cc @Gekko0114 @utam0k

Copy link
Member

@Gekko0114 Gekko0114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Though I haven't finished reading all the code, I left several minor comments and questions :)

examples/imagelocality/image_locality.go Outdated Show resolved Hide resolved
package api

// ImageStateSummary provides summarized information about the state of an image.
type ImageStateSummary struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment:
Nodes field is not necessary for now?
How about leaving the comment why this field is not implemented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes field is not necessary for now?

Yes, we don't use it (actually nothing uses it in upstream too).

How about leaving the comment why this field is not implemented?

I don't want to leave comments on everything unsupported. (We have too many unsupported things.

guest/api/framework.go Outdated Show resolved Hide resolved
)

type SharedLister interface {
NodeInfos() guestapi.NodeInfoList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:
How about adding the comment that StorageInfos are not implemented yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, I don't want to leave comments on everything unsupported.

@@ -49,6 +49,8 @@ func _prefilter() uint32 { //nolint
// This function begins a new scheduling cycle: zero out any cycle state.
currentPod = nil
currentCycleState = map[string]any{}
currentNodeInfoList = nil
isFullNodeInfoList = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this isFullNodeInfoList logic work? Is it because the Node used in the scheduler's step always goes through the prefilter process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we call preFilter regardless of whether an user implements preFilter or not, so that we can reset all stuff the start of the scheduling cycle.

type nodeList struct {
items []proto.Node
type filteredNodeInfoList struct {
// It only contains the Nodes that passed the filtering phase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you define filteredNodeInfoList even though sharedLister has nodeinfos? Is this because wasm's prescore step is sometimes called without calling previous steps like prefilter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, as documented at this line, we have to know which nodes have gone thru the filtering phase.

As you can see in filteredNodeInfoList.lazyItems, actually filteredNodeInfoList utilizes nodes in sharedLister too to use the cached nodes there.
Also, we don't want to put this "listing nodes after filtering" logic into sharedLister because we want to keep sharedLister pure, and "listing nodes that have gone thru the filtering phase" is specific to PreScore.

@sanposhiho
Copy link
Member Author

@Gekko0114 Addressed or replied to your comments.

Copy link
Member

@Gekko0114 Gekko0114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but leaves lgtm label for another reviewer.
Also left several minor comments.

examples/imagelocality/image_locality.go Show resolved Hide resolved
bufLimit := bufLimit(stack[3])

var nodeName string
if b, ok := mod.Memory().Read(nodename, nodenameLen); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but at first, I couldn't understand this code.
This code retrieves the node name from the wasm module and returns nodeinfo to the wasm module, right?
Additionally, I couldn't understand why this function is necessary even though k8sSchedulerCurrentNodeNameFn exists. (I understand it now)

Could you leave a brief comment explaining this function?

Copy link
Member Author

@sanposhiho sanposhiho Sep 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment though, what's unclear? (Does 2f5a3eb clarify it?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm 😀

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gekko0114, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sanposhiho
Copy link
Member Author

Merging the PR.

@Gekko0114 (fyi @utam0k )
This repo is under development anyway and doesn't need to be conservative. So, let's not wait for other people's approval like k/k.
I mean not only PRs from me, but also ones from anyone. I know people besides me cannot /approve though, I'll basically just put the approval for PRs that either of you put the stamp.

@sanposhiho sanposhiho merged commit a5b575b into main Sep 19, 2024
5 of 6 checks passed
@sanposhiho sanposhiho deleted the sharedsnap branch September 19, 2024 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cache Node data via single unified Node data
3 participants