Do not attempt to overwrite higher system (sysctl) values #103174

Napsty · 2021-06-25T05:29:02Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.

When Kubernetes runs on a Node which itself is a container (e.g. LXC) and the value is changed on the (LXC) host, kube-proxy fails at the next start: it does not recognize the current value and attempts to overwrite it with the previously known one. This results in:

I0624 07:38:23.053960      54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999      54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't bother kube-proxy; it should just carry on with the higher value.
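The rule described above can be sketched as a small comparison. This is only an illustration: `needsWrite` is a hypothetical name, not a function in kube-proxy, and the sample values other than 524288 (taken from the log above) are made up.

```go
package main

import "fmt"

// needsWrite is a hypothetical helper (not part of kube-proxy). It applies the
// rule from the description: only write the sysctl when the current value is
// lower than the value kube-proxy expects.
func needsWrite(current, desired int) bool {
	return current < desired
}

func main() {
	desired := 524288 // the value kube-proxy tried to set in the log above

	// Illustrative current values: lower than, equal to, and higher than desired.
	for _, current := range []int{262144, 524288, 1048576} {
		fmt.Printf("current=%d desired=%d write=%t\n",
			current, desired, needsWrite(current, desired))
	}
}
```

Only the first case (current value below the expected one) would lead to a write attempt; the equal and higher cases are left untouched.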

Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com

Which issue(s) this PR fixes:

Fixes rancher/rancher#33360

Special notes for your reviewer:

The code change was mistakenly created as a PR in the k3s project (see k3s-io/k3s#3505).
A real life use case is described in Rancher issue rancher/rancher#33360.

Does this PR introduce a user-facing change?

Changes the startup behaviour of kube-proxy: it no longer attempts to set specific sysctl values (which recent kernel versions no longer permit in non-init namespaces) when the current sysctl values are already higher.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2021-06-25T05:29:09Z

@Napsty: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2021-06-25T05:29:10Z

Welcome @Napsty!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2021-06-25T05:29:10Z

Hi @Napsty. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Napsty · 2021-07-23T07:44:17Z

ping @andrewsykim and @dcbw . Not really sure who else to ping. Let me know if someone else needs to do something first so this gets rolling. thx

brandond · 2021-07-30T19:15:23Z

Might this be sig-node since it's kubelet sysctl stuff?

Worked around in downstream projects:

With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.

The code change was mistakenly created as PR in the k3s project (see k3s-io/k3s#3505). A real life use case is described in Rancher issue rancher/rancher#33360.

When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite the current value with the previously known one. This results in:

```
I0624 07:38:23.053960      54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999      54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
```

However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it.

Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com

Napsty · 2021-08-25T13:57:48Z

ping @andrewsykim and @dcbw , who should be assigned to this?

khenidak · 2021-09-03T01:28:49Z

/assign @khenidak

is it about running proxy with lower permission? or preventing the proxy from setting a value that might have impact outside lxc container?

brandond · 2021-09-03T04:42:23Z

It's about the kernel no longer allowing these sysctls to be set within non-root namespaces.

Napsty · 2021-09-03T05:08:55Z

@khenidak as @brandond said, or, to be more precise, within non "net init" namespaces, which is the case for all started containers (LXC, Docker, ...).
The PR does not solve this 100%, but it allows a workaround: the server admin can set certain sysctl values high enough that kube-proxy accepts them (which is actually already the case on e.g. Ubuntu 20.04). In the current situation kube-proxy tries to set sysctl values to a certain pre-defined value, even if it is smaller than the current sysctl value.

khenidak · 2021-09-15T13:05:17Z

@Napsty ACK. can you add release note?

khenidak · 2021-09-15T13:05:27Z

/retest

Napsty · 2021-09-15T13:15:47Z

@khenidak

I'm sorry, but what exactly is meant by "release note"? I read https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md but I still don't understand whether this involves an additional file or just a comment in the commit. Do you have an example at hand, or can you point to another PR for comparison? Thank you!

brandond · 2021-09-15T16:22:56Z

@Napsty there's a bit in the PR template where it says Does this PR introduce a user-facing change and you've responded with NONE. You should replace this with the actual user-facing change, as it will be worded in the release notes. You can look at pretty much any other PR for an example.

Napsty · 2021-09-15T16:30:41Z

@brandond correct. I understood "user facing" as something requiring user input - which is not the case here. In fact, the PR is without any user interaction.

We could still mention the different behavior when Kernel sys values are already set higher than the kube-proxy expected value. But I fail to understand where this needs to be done. I read the contributors release notes twice now and I'm still none the wiser ;-)

A bit of help/guidance for a first timer please :-)

brandond · 2021-09-15T17:49:00Z

It doesn't need to be something that the user has to take action on, just something that they should know about when upgrading. Think of it from a user or administrator's perspective - what would you like to know about this change? Would you like to know that you will no longer have to manually set sysctls before starting kube-proxy?

Here's an example of a PR with an informational changelog entry:
#104997

khenidak · 2021-09-16T20:54:49Z

/retest
/lgtm
/approve

The user facing section has been filled. I think we are good to go. Thanks @Napsty for this.

k8s-ci-robot · 2021-09-16T20:55:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, Napsty

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-proxy/OWNERS~~ [khenidak]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

In a containerised environment (docker/LXC/LXD...) with a non net-init namespace it is not possible to adjust conntrack settings. kube-proxy attempts to do this and fails to start unless it is configured not to do so. The previous logic which detected the use of a docker type ansible connection plugin is converted to an overridable variable to allow a deployment tool to specify that conntrack should not be adjusted. See kubernetes/kubernetes#103174

aojea · 2023-09-05T15:28:28Z

cmd/kube-proxy/app/conntrack.go

@@ -96,7 +96,7 @@ func (realConntracker) setIntSysCtl(name string, value int) error {
 	entry := "net/netfilter/" + name

 	sys := sysctl.New()
-	if val, _ := sys.GetSysctl(entry); val != value {
+	if val, _ := sys.GetSysctl(entry); val != value && val < value {


I have doubts this is correct

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't bother kube-proxy; it should just carry on with the higher value.

aojea · 2023-09-05T15:30:07Z

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't bother kube-proxy; it should just carry on with the higher value.

I do not agree with this PR. kube-proxy tries to set a sysctl, the system is read-only, and so the write fails. This change is a workaround that relies on the current value being higher than the configured one, and that reasoning does not apply to all sysctl variables: some are boolean, and others, like rp_filter, take a set of distinct values.
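The objection can be illustrated with a small sketch. The types and names here (`kind`, `lowerBound`, `exactValue`, `needsWrite`) are hypothetical and not taken from kube-proxy; the only outside fact used is that `rp_filter` accepts 0 (no source validation), 1 (strict) and 2 (loose), so a numerically larger value is a different mode rather than "more" of the same thing.

```go
package main

import "fmt"

// kind is a hypothetical classification of how a sysctl should be compared.
type kind int

const (
	lowerBound kind = iota // numeric limits such as nf_conntrack_max
	exactValue             // flags and mode selectors such as rp_filter
)

// needsWrite is a hypothetical helper: limits are satisfied by any value at or
// above the desired one, while flags and modes must match exactly.
func needsWrite(k kind, current, desired int) bool {
	if k == lowerBound {
		return current < desired
	}
	return current != desired
}

func main() {
	// A limit that the host already raised: nothing to do.
	fmt.Println("nf_conntrack_max:", needsWrite(lowerBound, 1048576, 524288))

	// rp_filter: desired strict mode (1), host has loose mode (2).
	// A plain "current < desired" check would skip this write.
	fmt.Println("rp_filter, exact compare:", needsWrite(exactValue, 2, 1))
	fmt.Println("rp_filter, less-than only:", 2 < 1)
}
```

With a less-than check alone, the `rp_filter` case is silently skipped even though the system is not in the requested mode, which is the situation the comment warns about.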

aroradaman · 2023-09-05T19:10:38Z

@khenidak as @brandond said, or, to be more precise, within non "net init" namespaces, which is the case for all started containers (LXC, Docker, ...). The PR does not solve this 100%, but it allows a workaround: the server admin can set certain sysctl values high enough that kube-proxy accepts them (which is actually already the case on e.g. Ubuntu 20.04). In the current situation kube-proxy tries to set sysctl values to a certain pre-defined value, even if it is smaller than the current sysctl value.

#103174 (comment)

I guess simply configuring KubeProxyConntrackConfiguration to exactly match the values that the server admin has configured for the host would have solved the issue without any code change.
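As a rough sketch of why matching the configuration would avoid the write: the function below assumes kube-proxy derives its target as the larger of `maxPerCore * CPUs` and `min`, with `maxPerCore` of 0 meaning "leave the limit as-is". That derivation, the default values 32768 and 131072, and the CPU count and host value are all assumptions made for illustration; `conntrackMax` is a hypothetical name.

```go
package main

import "fmt"

// conntrackMax is a hypothetical, simplified model of how the target value for
// nf_conntrack_max is assumed to be derived from the conntrack configuration.
// A result of 0 means "do not touch the sysctl".
func conntrackMax(maxPerCore, floor, numCPU int) int {
	if maxPerCore <= 0 {
		return 0
	}
	scaled := maxPerCore * numCPU
	if scaled > floor {
		return scaled
	}
	return floor
}

func main() {
	const (
		numCPU    = 16      // hypothetical node size
		hostValue = 1048576 // hypothetical value set by the host admin
	)

	// Assumed defaults: 32768 per core, minimum 131072.
	fmt.Println("defaults:", conntrackMax(32768, 131072, numCPU))

	// Raising the minimum to the host value makes the target equal to it,
	// so the original "val != value" check finds nothing to change.
	fmt.Println("min = host value:", conntrackMax(32768, hostValue, numCPU))

	// maxPerCore = 0 skips the adjustment entirely.
	fmt.Println("maxPerCore = 0:", conntrackMax(0, 131072, numCPU))
}
```

Under these assumptions, 16 cores with the defaults yield 524288, the value seen in the log in the PR description.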

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 25, 2021

k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 25, 2021

k8s-ci-robot requested review from andrewsykim and dcbw June 25, 2021 05:29

k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2021

Napsty mentioned this pull request Jun 25, 2021

Allow higher system sysctl values k3s-io/k3s#3505

Closed

Napsty mentioned this pull request Aug 4, 2021

Rancher 2.5 (Single Install) not starting after nf_conntrack_max value adjustment rancher/rancher#33360

Closed

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 6, 2021

Napsty force-pushed the rancher-33360 branch from 69bc17d to f3708fa Compare August 25, 2021 12:17

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2021

k8s-ci-robot assigned khenidak Sep 3, 2021

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 16, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 16, 2021

k8s-ci-robot merged commit 16823fc into kubernetes:master Sep 16, 2021

k8s-ci-robot added this to the v1.23 milestone Sep 16, 2021

Napsty deleted the rancher-33360 branch September 17, 2021 05:09

aroradaman mentioned this pull request Sep 5, 2023

Add support for nf_conntrack_tcp_be_liberal sysctl to kube-proxy #120354

Merged

aojea reviewed Sep 5, 2023

View reviewed changes

aroradaman mentioned this pull request Sep 6, 2023

Revert "Do not attempt to overwrite higher system (sysctl) values" #120448

Merged
