Updating kube-proxy to ignore unready endpoints for Topology Hints #106507
/triage accepted
The events failures will be fixed by #106508
/retest
No, unready endpoints get filtered out correctly later. The bug is just that if a zone has only unready endpoints, we make the wrong decision about whether to use topology. It should say "there are no usable endpoints for this zone, so ignore topology and use endpoints from other zones as well". Instead, it decides that topology is usable and then filters out the unready endpoints, so the nodes in that zone think the service has no endpoints and install a reject rule for the service.
Eventually, #106510 will turn these changes into no-ops, right? But I guess we still need this for now to deal with kube-proxy / kube-controller-manager version skew.
Yep, this is also a much smaller/more contained change, so it should be easier to cherry-pick this one if we can only choose one. It's also possible that other controllers are generating EndpointSlices with hints, so it's good to have the logic here.
Good catch, I've updated the PR description to be more accurate.
if !endpoint.IsReady() {
	continue
}
do we need to do the same in the loop of L96 in this file?
I don't think so. TrafficPolicy=Local doesn't have a similar failover mechanism; if there are no local endpoints, we shouldn't fall back to all endpoints like we're doing here. Also, with the related work around terminating endpoints, we will send traffic to non-ready terminating local endpoints if no others are available.
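A minimal sketch of that difference, not the actual kube-proxy code: `Endpoint` and `filterLocal` are hypothetical names, and the `Local`, `Ready`, and `Terminating` fields are simplifying assumptions. It only shows that the local path prefers ready local endpoints, then non-ready terminating local ones, and never widens to the rest of the cluster.

```go
package main

// Endpoint is a hypothetical, simplified endpoint record for illustration.
type Endpoint struct {
	IP          string
	Local       bool // endpoint runs on this node
	Ready       bool
	Terminating bool
}

// filterLocal sketches the TrafficPolicy=Local behaviour described above.
// It returns ready local endpoints if there are any. Otherwise it returns
// non-ready terminating local endpoints. It never falls back to endpoints
// on other nodes, so the result may be empty.
func filterLocal(endpoints []Endpoint) []Endpoint {
	ready := []Endpoint{}
	terminating := []Endpoint{}
	for _, ep := range endpoints {
		if !ep.Local {
			continue // remote endpoints are never candidates here
		}
		switch {
		case ep.Ready:
			ready = append(ready, ep)
		case ep.Terminating:
			terminating = append(terminating, ep)
		}
	}
	if len(ready) > 0 {
		return ready
	}
	return terminating
}
```

An empty result here is intentional, which is why the readiness check added for topology hints has no counterpart in this loop.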
ic, 👍
d8c8a15 to 1983f41
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, robscott
/milestone v1.23
What type of PR is this?
/kind bug
What this PR does / why we need it:
This fixes a bug identified by @danwinship in #106497 (comment). If hints were assigned to only unready endpoints in a zone, kube-proxy would still select those endpoints when filtering instead of falling back to the full list of endpoints.
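The corrected decision can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Endpoint`, `ZoneHints`, and `filterByTopologyHints` are hypothetical names, and the real kube-proxy types differ. The point is that unready endpoints are skipped before deciding whether the node's zone has any usable hinted endpoints.

```go
package main

// Endpoint is a hypothetical, simplified endpoint record for illustration.
type Endpoint struct {
	IP        string
	Ready     bool
	ZoneHints map[string]bool // zones this endpoint is hinted for; empty means no hints
}

// filterByTopologyHints returns the endpoints a node in nodeZone should
// consider. It falls back to the full list whenever topology cannot be
// used safely. Unready endpoints in a fallback result are assumed to be
// filtered out by a later step.
func filterByTopologyHints(endpoints []Endpoint, nodeZone string) []Endpoint {
	if nodeZone == "" {
		return endpoints // node zone unknown: topology is not usable
	}
	filtered := []Endpoint{}
	for _, ep := range endpoints {
		if !ep.Ready {
			// The fix: unready endpoints must not count toward the
			// decision that this zone has usable endpoints.
			continue
		}
		if len(ep.ZoneHints) == 0 {
			// A ready endpoint without hints: ignore topology entirely.
			return endpoints
		}
		if ep.ZoneHints[nodeZone] {
			filtered = append(filtered, ep)
		}
	}
	if len(filtered) == 0 {
		// No ready endpoint is hinted for this zone: use all endpoints
		// rather than leaving the zone with none.
		return endpoints
	}
	return filtered
}
```

Without the readiness check, a zone whose only hinted endpoint is unready would get a non-empty filtered list, and the later readiness filter would then leave that zone with no endpoints at all.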
A follow-up bug fix will be coming soon that will update the controller to exclude unready endpoints from hint calculations. Update: That follow-up fix has now been filed as #106510.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @aojea @danwinship
/sig network
/priority important-soon