New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use "Capacity" instead of "Allocatable" for an accurate node memory total size #102917
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Welcome @bysnupy! |
Hi @bysnupy. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I signed it |
/kind bug |
/assign @soltysh |
@lauchokyip Great ! Thank you for your kind information. I didn't realized the previous one. Even if it's no problem on other feature, 100+% is confused sometimes on usual users. If possible we should notify this value can be more than 100% in the docs, or if the metrics is more than 100%, it should set as 100% is, not 100+%. e.g.> If actual calculation result is 125%, it should show users as 100%. How about your thought ? |
@bysnupy , I am not familiar with this area, this is what @soltysh responded and you might help it helpful |
@lauchokyip No problem. I've also created other PR here #103253 to fix this issue which is based on another standpoint. Could please TAL ? @soltysh I'd appreciate any help you can offer. Thank you ! |
@soltysh in https://github.com/kubernetes/kubernetes/pull/100222/files/7b43880b4c95e2928c59a3c2b72382aec8894e25#diff-c3a7260dd14d315707b6043e9ed76a19bfcdd233f465d7cd5b95dd2aaadcec6f you explain that:
If I've understood correctly, this PR claims the problem is that the numerator and the denominator in the calculation don't have the same logical basis: we're dividing a number that includes reserved capacity (actual usage) by a number that excludes it (allocatable), yielding a result that doesn't make sense. Is that true based on your knowledge of the underlying metrics? If it is, this PR's solution makes more sense to me than #103253, which essentially rounds down to hide the values when they're visibly incorrect but does not fix them otherwise. I'm going to change the reviewers on that PR to match this one, since they are alternatives. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ok-to-test
/triage accepted
/priority backlog
/lgtm
/approve
/sig cli
@bysnupy you'll need to update the release note in the initial comment to get this merged
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bysnupy, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/sig cli |
/retest |
@bysnupy: You must be an org member to edit the release note. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Add an example for the show-capacity flag. If you specified this flag, show metrics based on capacity which is the total amount of resources that a Node has. * References: kubernetes#102917
What type of PR is this?
/kind bug
/sig cli
What this PR does / why we need it:
"Allocatable" is just a logical total memory size based on
[Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds]
.In contrast, the node memory usage collected by Metrics API(metrics.k8s.io) is based on real usage constantly on the node host. So if
system-reserved
orhard-eviction
threshold is configured bigger, theMEMORY%
can be overflow than100%
.We focus the "Allocatable" value is useful to schedule pods to a node, at that time scheduler consider the pod memory requests size based on logical value. All values and calculation is under logical assumption.
But the real node total memory usage that is "MEMORY%" of "kubectl top node" column value is different.
It should be based on the actual hard threshold size
(MemTotal in /proc/meminfo)
, because the collected metrics usage is also based on real usage on the node host.We should not mix the logical and real values to calculate some value accurately, so we should replace "Allocatable" with "Capacity" which is the same with the
MemTotal
in "MEMORY%" calculation for accurate total node memory usage.Which issue(s) this PR fixes:
Fixes #86499
Special notes for your reviewer:
Does this PR introduce a user-facing change?