Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them #105631

Merged

Contributor

@klueska klueska commented Oct 12, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR implements the logic described in KEP-2902.

It adds a new CPUManager policy option called distribute-cpus-across-numa to the static CPUManager policy. When enabled, this will trigger the CPUManager to evenly distribute CPUs across NUMA nodes in cases where more than one NUMA node is required to satisfy the allocation.
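To make the difference between packing and distributing concrete, here is a minimal sketch. It is not the kubelet implementation: `packCPUs` and `distributeCPUs` are hypothetical helpers that only count CPUs per NUMA node, and the sketch assumes the first nodes in the list are the ones chosen.

```go
package main

import "fmt"

// packCPUs (hypothetical) fills NUMA nodes in order, taking as many free
// CPUs as possible from each node before moving on to the next one.
func packCPUs(free []int, want int) []int {
	out := make([]int, len(free))
	for i, f := range free {
		take := f
		if want < take {
			take = want
		}
		out[i] = take
		want -= take
	}
	return out
}

// distributeCPUs (hypothetical) finds the smallest prefix of nodes that can
// satisfy the request and splits the request evenly across that prefix.
// It reports false if the request cannot be satisfied this way.
func distributeCPUs(free []int, want int) ([]int, bool) {
	total := 0
	for k := 1; k <= len(free); k++ {
		total += free[k-1]
		if total < want {
			continue // k nodes are not enough yet
		}
		out := make([]int, len(free))
		base, extra := want/k, want%k
		for i := 0; i < k; i++ {
			out[i] = base
			if i < extra {
				out[i]++ // spread the remainder over the first nodes
			}
			if out[i] > free[i] {
				return nil, false // uneven free capacity: the sketch gives up
			}
		}
		return out, true
	}
	return nil, false
}

func main() {
	free := []int{8, 8} // two NUMA nodes with 8 free CPUs each

	fmt.Println(packCPUs(free, 10))       // [8 2]
	fmt.Println(distributeCPUs(free, 10)) // [5 5] true
	fmt.Println(distributeCPUs(free, 4))  // [4 0] true
}
```

A request that fits on a single NUMA node comes out the same either way; the two strategies differ only once more than one node is needed.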

Does this PR introduce a user-facing change?

Add a new `distribute-cpus-across-numa` option to the static `CPUManager` policy. When enabled, this will trigger the `CPUManager` to evenly distribute CPUs across NUMA nodes in cases where more than one NUMA node is required to satisfy the allocation.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

@k8s-ci-robot k8s-ci-robot added the release-note, do-not-merge/work-in-progress, size/XL, kind/feature, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-priority labels Oct 12, 2021
@klueska
Contributor Author

klueska commented Oct 12, 2021

/sig node
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold and sig/node labels and removed the do-not-merge/needs-sig label Oct 12, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Oct 12, 2021
@klueska
Contributor Author

klueska commented Oct 12, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted label and removed the needs-triage label Oct 12, 2021
@klueska
Contributor Author

klueska commented Oct 12, 2021

/cc @fromanirh @swatisehgal

The core algorithm hasn't been implemented yet, but the base infrastructure is now in place.

Please take note of:

This PR currently includes changes from #102015 which have not been merged yet. Please ignore the overlapping commits from this PR when reviewing.

@klueska
Contributor Author

klueska commented Oct 13, 2021

@fromanirh @swatisehgal
Added logic to implement the algorithm itself. Tests to come next.

@klueska klueska force-pushed the upstream-distribute-cpus-across-numa branch 5 times, most recently from 357ba1d to 6f155d0 Compare October 13, 2021 12:27
@k8s-ci-robot k8s-ci-robot removed the size/XL label Oct 13, 2021
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
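As a rough illustration of the 'cpuGroupSize' constraint described in the commit message above, the sketch below distributes whole groups rather than individual CPUs. `distributeGroups` is a hypothetical helper, not the function added in this PR, and it works only with per-node counts.

```go
package main

import (
	"errors"
	"fmt"
)

// distributeGroups (hypothetical) splits numCPUs across numNodes NUMA nodes
// so that every node receives a multiple of cpuGroupSize CPUs. With
// cpuGroupSize set to the number of hyperthreads per core, no core is split.
func distributeGroups(numCPUs, numNodes, cpuGroupSize int) ([]int, error) {
	if numCPUs <= 0 || numNodes <= 0 || cpuGroupSize <= 0 {
		return nil, errors.New("all arguments must be positive")
	}
	if numCPUs%cpuGroupSize != 0 {
		return nil, fmt.Errorf("%d CPUs is not a multiple of group size %d", numCPUs, cpuGroupSize)
	}

	groups := numCPUs / cpuGroupSize
	base, extra := groups/numNodes, groups%numNodes

	out := make([]int, numNodes)
	for i := range out {
		n := base
		if i < extra {
			n++ // leftover groups go to the first nodes
		}
		out[i] = n * cpuGroupSize
	}
	return out, nil
}

func main() {
	fmt.Println(distributeGroups(12, 2, 2)) // [6 6] <nil>
	fmt.Println(distributeGroups(10, 2, 2)) // [6 4] <nil>
	_, err := distributeGroups(5, 2, 2)
	fmt.Println(err != nil) // true
}
```

Note that 10 CPUs with a group size of 2 come out as 6 and 4 rather than 5 and 5, because a 5/5 split would break up a core.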
@klueska klueska force-pushed the upstream-distribute-cpus-across-numa branch 7 times, most recently from 002a080 to d9725e6 Compare October 17, 2021 14:23
Contributor

@ffromani ffromani left a comment

again a batch of minor comments, mostly for future cleanups (when we move to beta or so). The algorithm itself is nicely written and very nicely commented. I want to review the tests carefully today/tomorrow, and if everything goes as I expect, I'll add my LGTM.

@@ -264,6 +308,25 @@ func (a *cpuAccumulator) isFailed() bool {
	return a.numCPUsNeeded > a.details.CPUs().Size()
}

func (a *cpuAccumulator) iterateCombinations(n []int, k int, f func([]int)) {
Contributor

nit: this can (and probably should) be a free function as well, because it doesn't use cpuAccumulator fields at all. Not worth changing now; let's just note it down for the future refactoring.
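For reference, a free-function version could look roughly like the sketch below. It reuses the signature shown in the diff, but the body is an assumption written for illustration, not the code under review.

```go
package main

import "fmt"

// iterateCombinations calls f once for every k-element combination of n,
// in the order the elements appear in n. It needs no receiver because it
// only touches its arguments.
func iterateCombinations(n []int, k int, f func([]int)) {
	if k < 0 || k > len(n) {
		return
	}

	var helper func(start int, accum []int)
	helper = func(start int, accum []int) {
		if len(accum) == k {
			// Hand out a copy so the callback may keep the slice.
			c := make([]int, k)
			copy(c, accum)
			f(c)
			return
		}
		// Stop early enough that the remaining slots can still be filled.
		for i := start; i <= len(n)-(k-len(accum)); i++ {
			helper(i+1, append(accum, n[i]))
		}
	}
	helper(0, make([]int, 0, k))
}

func main() {
	numaNodes := []int{0, 1, 2, 3}
	iterateCombinations(numaNodes, 2, func(combo []int) {
		fmt.Println(combo)
	})
}
```

With four NUMA node IDs and k = 2, the callback runs six times, starting with `[0 1]` and ending with `[2 3]`.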

type LoopControl bool

const (
	Continue LoopControl = true
Contributor

minor: using bool here raises a flag. Could we use a naked bool, maybe? Or should we just move to an enum using an integer?

Contributor Author

I'm fine with changing this to an enum. It's only a bool, because my first iteration of this didn't use constants at all -- I had the functions directly returning a bool, with true for continue and false for break.

Contributor

This can totally wait for the refactoring, but it seems to suggest a plain bool could be simpler and equally readable

Contributor Author

updated
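The integer-enum alternative raised in this thread could be sketched as follows. The thread does not say which form the update settled on, so this is only an illustration; `forEachUntil` is a hypothetical helper.

```go
package main

import "fmt"

// LoopControl tells an iterating helper whether to keep going.
// An integer enum avoids giving meaning to bare true/false values.
type LoopControl int

const (
	Continue LoopControl = iota
	Break
)

// forEachUntil (hypothetical) visits items in order and stops as soon as
// the callback returns Break.
func forEachUntil(items []int, f func(int) LoopControl) {
	for _, it := range items {
		if f(it) == Break {
			return
		}
	}
}

func main() {
	forEachUntil([]int{1, 3, 4, 5}, func(v int) LoopControl {
		fmt.Println("visiting", v)
		if v%2 == 0 {
			return Break // first even value found, stop iterating
		}
		return Continue
	})
}
```

The plain-bool variant would drop the type and constants and have the callback return `true` to continue and `false` to break, which is shorter but relies on the caller remembering the convention.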

@klueska
Contributor Author

klueska commented Oct 18, 2021

@fromanirh

> nit: this can (and probably should) be a free function as well, because it doesn't use cpuAccumulator fields at all. Not worth changing now; let's just note it down for the future refactoring.

Yes, as I pointed out here, I struggled with where to put this because I wanted it namespaced somehow. It will be pulled out to a top-level function once we move to a cpassignment package.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
@ffromani
Contributor

> @fromanirh
>
> > nit: this can (and probably should) be a free function as well, because it doesn't use cpuAccumulator fields at all. Not worth changing now; let's just note it down for the future refactoring.
>
> Yes, as I pointed out here, I struggled with where to put this because I wanted it namespaced somehow. It will be pulled out to a top-level function once we move to a cpassignment package.

Sure, I was recording them all. I understand the issue at hand and I'm actually pushing (slightly :) ) to wait and address all these nits in the future cleanup work.

@klueska klueska force-pushed the upstream-distribute-cpus-across-numa branch from d9725e6 to 86f9c26 Compare October 18, 2021 08:54
@klueska
Contributor Author

klueska commented Oct 18, 2021

Based on the feedback above, I am removing the hold on this PR. Just need an lgtm from here then.
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Oct 18, 2021
@ffromani
Contributor

/test pull-kubernetes-integration

@swatisehgal
Contributor

/lgtm
/hold
@fromanirh please remove hold when you are happy with the PR

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Oct 19, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm label Oct 19, 2021
@ffromani
Contributor

ffromani commented Oct 19, 2021

/lgtm
/hold cancel
This PR is in great shape, much better than expected for alpha level. Thanks @klueska !

Labels
approved, area/kubelet, cncf-cla: yes, kind/feature, lgtm, priority/important-soon, release-note, sig/node, size/XL, triage/accepted
6 participants