Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kubelet]: Reconcile OS and arch labels periodically #104613

Merged

Conversation

ravisantoshgudimetla
Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Aug 26, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Kubelet should reconcile OS and Arch
labels periodically.

Which issue(s) this PR fixes:

Fixes #

Partially fixes kubernetes/enhancements#2803

Special notes for your reviewer:

This ensures that the kubelet would reject pods that have mismatching OS label selector. I'll expand this to the OS field in the pod spec once I introduce that field.

Does this PR introduce a user-facing change?

- Kubelet should reconcile `kubernetes.io/os` and `kubernetes.io/arch` labels on the node object. The side-effect of this is kubelet would deny admission to pod which has nodeSelector with label `kubernetes.io/os` or `kubernetes.io/arch` which doesn't match the underlying OS or arch on the host OS. 
- The label reconciliation happens as part of periodic status update which can be configured via flag `--node-status-update-frequency` 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/pull/2803

cc @liggitt @derekwaynecarr @marosset

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 26, 2021
@ravisantoshgudimetla ravisantoshgudimetla changed the title Reconcile labels [WIP] Reconcile labels Aug 26, 2021
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/kubelet labels Aug 26, 2021
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 26, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 30, 2021
@ravisantoshgudimetla ravisantoshgudimetla changed the title [WIP] Reconcile labels [kubelet]: Reconcile OS and arch labels periodically Aug 30, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

/retest

// Set the OS label if there is mismatch
osName, ok := newNode.Labels[v1.LabelOSStable]
if !ok || osName != sysruntime.GOOS {
if len(newNode.Labels) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log a warning here if the labels are not in sync?
I think people might be interested if something is repeatedly changing node labels to incorrect values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion one way or the other. The logs could cause unncessary spamming sometimes. However I can't think of an entity which changes the values apart from kubelet unless it is sort of rogue entity and which should be caught by other means like OPA or validating admission hooks.


func (kl *Kubelet) syncOSAndArchLabels() {
for {
time.Sleep(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configurable or less frequent than 100ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am keeping it in sync with other methods. I don't see them configurable there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A wait.Forever here makes sense... 100ms seems too quick.

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the reconcile-labels branch 2 times, most recently from 0e9e352 to ada53f2 Compare August 31, 2021 02:20
@ravisantoshgudimetla
Copy link
Contributor Author

cc @aravindhp

@marosset
Copy link
Contributor

/sig windows

@k8s-ci-robot k8s-ci-robot added the sig/windows Categorizes an issue or PR as relevant to SIG Windows. label Aug 31, 2021
@pacoxu pacoxu added this to Triage in SIG Node PR Triage Sep 2, 2021
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Nov 6, 2021
@ravisantoshgudimetla ravisantoshgudimetla force-pushed the reconcile-labels branch 2 times, most recently from c2c872a to 726ca3a Compare November 6, 2021 16:47
@ravisantoshgudimetla
Copy link
Contributor Author

Tests passed when I ran them locally on fedora:

[sig-node] OSArchLabelReconciliation Kubelet 
  should reconcile the OS and Arch labels when running
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/os_label_rename_test.go:44
[BeforeEach] [sig-node] OSArchLabelReconciliation [NodeFeature:Ravig]
  /home/ravig/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:185
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename node-label-reconciliation
Nov  6 12:37:45.658: INFO: Skipping waiting for service account
[It] should reconcile the OS and Arch labels when running
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/os_label_rename_test.go:44
STEP: verifying the node has the label kubernetes.io/os linux
STEP: verifying the node has the label kubernetes.io/arch amd64
STEP: Waiting for node localhost.localdomain to have appropriate labels
Nov  6 12:37:55.688: INFO: Node arch linux, Node label amd64
[AfterEach] [sig-node] OSArchLabelReconciliation [NodeFeature:Ravig]
  /home/ravig/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:186
Nov  6 12:37:55.688: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "node-label-reconciliation-3046" for this suite.

• [SLOW TEST:10.055 seconds]
[sig-node] OSArchLabelReconciliation 
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/framework.go:23
  Kubelet
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/os_label_rename_test.go:22
    should reconcile the OS and Arch labels when running
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/os_label_rename_test.go:44
------------------------------
SSSSSSSI1106 12:37:55.704942  252088 e2e_node_suite_test.go:238] Stopping node services...
I1106 12:37:55.704973  252088 server.go:257] Kill server "services"
I1106 12:37:55.705013  252088 server.go:294] Killing process 252320 (services) with -TERM
I1106 12:37:55.763826  252088 server.go:257] Kill server "kubelet"
I1106 12:37:55.770040  252088 services.go:156] Fetching log files...
I1106 12:37:55.770149  252088 services.go:165] Get log file "kern.log" with journalctl command [-k].
I1106 12:37:55.788520  252088 services.go:165] Get log file "cloud-init.log" with journalctl command [-u cloud*].
E1106 12:37:58.640505  252088 services.go:168] failed to get "cloud-init.log" from journald: Failed to add filter for units: No data available
, exit status 1
I1106 12:37:58.640524  252088 services.go:165] Get log file "docker.log" with journalctl command [-u docker].
I1106 12:37:58.682806  252088 services.go:165] Get log file "containerd.log" with journalctl command [-u containerd].
I1106 12:37:58.711827  252088 services.go:165] Get log file "containerd-installation.log" with journalctl command [-u containerd-installation].
I1106 12:37:58.723970  252088 e2e_node_suite_test.go:243] Tests Finished

JUnit report was created: /tmp/_artifacts/211106T123657/junit__01.xml

Ran 2 of 361 Specs in 32.509 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 359 Skipped
PASS

Ginkgo ran 1 suite in 32.639657986s
Test Suite Passed

@ravisantoshgudimetla
Copy link
Contributor Author

/retest

return false, err
}
osLabel, ok := node.Labels[v1.LabelOSStable]
if !ok || osLabel != runtime.GOOS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect these to be parameters to this function in case there are other labels we'd test like this (are there other labels we should be testing?)

@smarterclayton
Copy link
Contributor

/approve

The test is fine, i had comments but that doesn't block approval for me

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2021
@derekwaynecarr
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, ravisantoshgudimetla, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ehashman
Copy link
Member

ehashman commented Nov 8, 2021

/test pull-kubernetes-node-kubelet-serial

@k8s-ci-robot k8s-ci-robot merged commit cda360c into kubernetes:master Nov 8, 2021
SIG-Windows automation moved this from In Review (v1.23) to Done (v1.23) Nov 8, 2021
SIG Node PR Triage automation moved this from Waiting on Author to Done Nov 8, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 8, 2021
@marosset
Copy link
Contributor

marosset commented Nov 8, 2021

xref kubernetes/enhancements#2802
(the PR description links to a PR not the enhancement issue)

@ravisantoshgudimetla ravisantoshgudimetla deleted the reconcile-labels branch November 8, 2021 23:01
ravisantoshgudimetla added a commit to ravisantoshgudimetla/kubernetes that referenced this pull request Nov 9, 2021
Once kubernetes#104613 and kubernetes#104693
merge, we'll have OS field in pod spec. Kubelet should start rejecting pods
where pod.Spec.OS and node's OS(using runtime.GOOS) won't match
t.Logf("No action yet")
return false, nil
}
action := actions[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no gorutine free and can panic

Comment on lines +1 to +2
//go:build cgo && linux
// +build cgo,linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why this test requires cgo ... cgo compilation of e2e_node tests is causing issues on go1.20 (xref #115613 and golang/go#58425)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was a copy-paste error wrt cgo. I'll remove it and open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG-Windows
  
Done (v1.23)
Development

Successfully merging this pull request may close these issues.

None yet

10 participants