New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: Admission must exclude completed pods and avoid races #104577
kubelet: Admission must exclude completed pods and avoid races #104577
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/triage accepted |
Still needs a unit test to exercise the admission filter here. Can add the node e2e here or separately for completed pod (although for regression purposes I'd be ok with that separate) |
/hold for the test or top level approver |
Fixes two issues with how the pod worker refactor calculated the pods that admission could see (GetActivePods() and filterOutTerminatedPods()) First, completed pods must be filtered from the "desired" state for admission, which arguably should be happening earlier in config. Exclude the two terminal pods states from GetActivePods() Second, the previous check introduced with the pod worker lifecycle ownership changes was subtly wrong for the admission use case. Admission has to include pods that haven't yet hit the pod worker, which CouldHaveRunningContainers was filtering out (because the pod worker hasn't seen them). Introduce a weaker check - IsPodKnownTerminated() - that returns true only if the pod is in a known terminated state (no running containers AND known to pod worker). This weaker check may only be called from components that need admitted pods, not other kubelet subsystems. This commit does not fix the long standing bug that force deleted pods are omitted from admission checks, which must be fixed by having GetActivePods() also include pods "still terminating".
9ed9b5f
to
a2ca66d
Compare
Thanks @smarterclayton for the quick investigation and proposed fix. I validated your PR against the minimal repro test created in #104560 (comment) and can confirm the PR here results in pods no longer being rejected by kubelet.
Test passes - build log - https://gist.github.com/c59e092d6befdb74fab35caeed2571c1 |
if kl.podWorkers.IsPodKnownTerminated(p.UID) { | ||
continue | ||
} | ||
if p.Status.Phase == v1.PodSucceeded || p.Status.Phase == v1.PodFailed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious for my own understanding, why is it necessary to also check the pod phase in addition to IsPodKnownTerminated
?
Is it not guaranteed that if status.IsTerminated()
for the pod worker then the pod is in succeeded or failed phase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod worker is not guaranteed to track terminal pods - once a pod is terminal, the worker shuts down and then the metadata is eventually removed.
@bobbypage I ported your test case over into #104611. Any other changes you would like to see? |
@smarterclayton @bobbypage @rphillips I was able to write a node e2e to test this PR - #104658 |
/lgtm Tagging this and then we can pick one of the e2e PRs on top. |
Thanks @harche for creating node e2e test. I suggest we proceed with merging this PR as is and then either pick up @harche 's node e2e test as followup or the original one I made as @rphillips suggested in #104611. |
/lgtm |
/hold cancel |
…04577-upstream-release-1.22 Automated cherry pick of #104577: kubelet: Admission must exclude completed pods and avoid
@bobbypage I closed the with node e2e because it wasn't really reproducing the scenario correctly, I would vote for your original test case which went through scheduler which is now part of this PR, #104611 |
Fixes two issues with how the pod worker refactor calculated the
pods that admission could see (GetActivePods() and
filterOutTerminatedPods())
First, completed pods must be filtered from the "desired" state
for admission, which arguably should be happening earlier in
config. Exclude the two terminal pods states from GetActivePods()
Second, the previous check introduced with the pod worker lifecycle
ownership changes was subtly wrong for the admission use case.
Admission has to include pods that haven't yet hit the pod worker,
which CouldHaveRunningContainers was filtering out (because the
pod worker hasn't seen them). Introduce a weaker check -
IsPodKnownTerminated() - that returns true only if the pod is in
a known terminated state (no running containers AND known to pod
worker). This weaker check may only be called from components that
need admitted pods, not other kubelet subsystems.
This commit does not fix the long standing bug that force deleted
pods are omitted from admission checks, which must be fixed by
having GetActivePods() also include pods "still terminating".
/kind bug
/kind regression
Fixes #104560
Needs additional e2es to prevent future regressions.