New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not attempt to overwrite higher system (sysctl) values #103174
Conversation
@Napsty: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Welcome @Napsty! |
Hi @Napsty. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
ping @andrewsykim and @dcbw . Not really sure who else to ping. Let me know if someone else needs to do something first so this gets rolling. thx |
Might this be sig-node since it's kubelet sysctl stuff? Worked around in downstream projects: |
With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values. The code change was mistakenly created as PR in the k3s project (see k3s-io/k3s#3505). A real life use case is described in Rancher issue rancher/rancher#33360. When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite the current value with the previously known one. This result in: ``` I0624 07:38:23.053960 54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288 F0624 07:38:23.053999 54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied ``` However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it. Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com
ping @andrewsykim and @dcbw , who should be assigned to this? |
/assign @khenidak is it about running proxy with lower permission? or preventing the proxy from setting a value that might have impact outside lxc container? |
It's about the kernel no longer allowing these sysctls to be set within non-root namespaces. |
@khenidak as @brandond said, but to be set within non "net init" namespaces to be more precise. Which is true for all started containers (LXC, Docker, ...). |
@Napsty ACK. can you add release note? |
/retest |
I'm sorry, but what exactly is meant with release note? I read https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md but I still don't understand whether this involves an additional file or just a comment in the commit? Do you have an example at hand or point to another PR for comparison? Thank you! |
@Napsty there's a bit in the PR template where it says |
@brandond correct. I understood "user facing" as something requiring user input - which is not the case here. In fact, the PR is without any user interaction. We could still mention the different behavior when Kernel sys values are already set higher than the kube-proxy expected value. But I fail to understand where this needs to be done. I read the contributors release notes twice now and I'm still none the wiser ;-) A bit of help/guidance for a first timer please :-) |
It doesn't need to be something that the user has to take action on, just something that they should know about when upgrading. Think of it from a user or administrator's perspective - what would you like to know about this change? Would you like to know that you will no longer have to manually set sysctls before starting kube-proxy? Here's an example of a PR with an information changelog entry: |
/retest The |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: khenidak, Napsty The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
In a containerised environment (docker/LXC/LXD...) with a non net-init namespace it is not possible to adjust conntrack settings. kube-proxy attempts to do this and fails to start unless it is configured not to do so. The previous logic which detected the use of a docker type ansible connection plugin is converted to an overridable variable to allow a deployment tool to specify that conntrack should not be adjusted. See kubernetes/kubernetes#103174
@@ -96,7 +96,7 @@ func (realConntracker) setIntSysCtl(name string, value int) error { | |||
entry := "net/netfilter/" + name | |||
|
|||
sys := sysctl.New() | |||
if val, _ := sys.GetSysctl(entry); val != value { | |||
if val, _ := sys.GetSysctl(entry); val != value && val < value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have doubts this is correct
However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it.
I do not agree with this PR, kube-proxy tries to set a sysctl and the system is read only hence fails, this is a workaround based on the fact that the variable set is higher than the configured one, that does not apply to all sysctl variables, that may be boolean or have different values like |
I guess simply configuring |
What type of PR is this?
/kind bug
What this PR does / why we need it:
With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.
When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite the current value with the previously known one. This result in:
However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it.
Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com
Which issue(s) this PR fixes:
Fixes rancher/rancher#33360
Special notes for your reviewer:
The code change was mistakenly created as PR in the k3s project (see k3s-io/k3s#3505).
A real life use case is described in Rancher issue rancher/rancher#33360.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: