New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for consuming whole NUMA nodes in CPUManager CPU assignments #102015
Add support for consuming whole NUMA nodes in CPUManager CPU assignments #102015
Conversation
/cc @nolancon |
@klueska: GitHub didn't allow me to request PR reviews from the following users: ranchothu. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/hold to prevent inadvertent merge due to premature LGTM |
/triage accepted |
983f915
to
13724eb
Compare
/assign @fromanirh |
bee4620
to
4f572d7
Compare
Signed-off-by: Kevin Klues <kklues@nvidia.com>
/test pull-kubernetes-integration |
4f572d7
to
c03038f
Compare
@fromanirh just rebased on master to see if that helps. |
/test pull-kubernetes-integration |
This allows us to get rid of the check for determining which one is higher all throughout the code. Now we just check once and instantiate an interface of the appropriate type that makes sure the ordering in the hierarchy is preserved through the appropriate calls. Signed-off-by: Kevin Klues <kklues@nvidia.com>
User the `cmp.Diff` package in the unit tests, moving away from `reflect.DeepEqual`. This gives us a clearer picture of the differences when the tests fail. Signed-off-by: Francesco Romani <fromani@redhat.com>
The exisiting unit tests where performing subtests without actually using the full features of the testing package (https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks) Update them with fairly minimal changes. The patch is deceptively large because we need to move the code inside a new block. Signed-off-by: Francesco Romani <fromani@redhat.com>
This batch of tests adds a real topology on which each physical socket has multiple NUMA zones. Taken by a real dual xeon 6320 gold. Signed-off-by: Francesco Romani <fromani@redhat.com>
This batch of tests adds a fake topology on which each numa node has multiple sockets. We didn't find yet a real HW topology in the wild like this, but we need one to fully exercise the code. So, until we find a HW topology, we add a fake one flipping the NUMA/socket config of the existing xeon dual gold 6320. Signed-off-by: Francesco Romani <fromani@redhat.com>
c03038f
to
4bae656
Compare
This now /lgtm from my perspective. |
@@ -26,20 +26,133 @@ import ( | |||
"k8s.io/kubernetes/pkg/kubelet/cm/cpuset" | |||
) | |||
|
|||
type numaOrSocketsFirstFuncs interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very nice solution, I like the direction.
nit: should be rename this interface like coresHierarchy
or cpuHierararchy
or so? Again I can't find a half-decent suggestion, so very minor atm (we can improve later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or memoryHierarchy
even?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel too strongly about the name of the interface, but I like the name numaOrSocketsFirst
for the name of the field in the cpuAccumulator
, because it lets me make calls like:
acc.numaOrSocketsFirst.takeFullFirstLevel()
acc.numaOrSocketsFirst.takeFullSecondLevel()
Which I think makes it very clear what it is doing. I just named the interface as an extension of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this makes it clear and the name nicer. Let's keep this name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is very nice indeed! I think we can leave the interface name as numaOrSocketsFirstFuncs
for now.
code changes /lgtm (again minor/nit comment inside), natural and clean extension of the existing codebase |
/test pull-kubernetes-e2e-gce-ubuntu-containerd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the ability to support systems where it is incorrect to assume the 1:1 mapping between sockets and NUMA nodes and allocating CPUs taking both NUMA nodes and CPUs sockets into consideration is a great addition to the code base.
Thanks @klueska @fromanirh for this improvement!
/lgtm
@@ -26,20 +26,133 @@ import ( | |||
"k8s.io/kubernetes/pkg/kubelet/cm/cpuset" | |||
) | |||
|
|||
type numaOrSocketsFirstFuncs interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is very nice indeed! I think we can leave the interface name as numaOrSocketsFirstFuncs
for now.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: klueska, swatisehgal The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
The algorithm used to make CPU allocations in the CPUManager currently assumes that sockets have a 1:1 mapping with NUMA nodes. This is not true in the general case, however, and this PR updates the algorithm to account for this.
NUMA is a little different than the existing
socket --> core --> cpu
relationship, in that NUMA can fit into this hierarchy in one of 2 places, either asnuma --> socket --> core --> cpu
orsocket --> numa --> core --> cpu
. This PR takes both of these situations into account and makes sure to perform its CPU allocations appropriately.Release notes: