kubelet: add graceful shutdown events #101081

Merged

Conversation

rphillips
Member

What type of PR is this?

/bug
/sig node
/triage accepted
/priority important-soon

What this PR does / why we need it:

#96129 added Graceful Node Shutdown to the Kubelet. #99735 moved the feature to Beta. This PR adds events to the shutdown logic so a user will know the status of the shutdown.
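As a rough illustration of what "adding events to the shutdown logic" can look like, here is a minimal, self-contained sketch. It is not the code from this PR: `EventRecorder`, `memoryRecorder`, and `handleShutdownSignal` are hypothetical names introduced for the example, and the interface only mimics the general shape of an `Eventf`-style recorder. The message strings are copied from the event log posted later in this conversation and may differ from the final wording.

```go
package main

import "fmt"

// EventRecorder is a hypothetical, trimmed-down stand-in for an event
// recorder. It is an assumption for this sketch, not the kubelet's interface.
type EventRecorder interface {
	Eventf(object, eventType, reason, messageFmt string, args ...interface{})
}

// memoryRecorder keeps formatted events in memory so they can be inspected.
type memoryRecorder struct {
	events []string
}

func (m *memoryRecorder) Eventf(object, eventType, reason, messageFmt string, args ...interface{}) {
	msg := fmt.Sprintf(messageFmt, args...)
	m.events = append(m.events, fmt.Sprintf("%s %s %s %s", eventType, reason, object, msg))
}

// handleShutdownSignal is a hypothetical handler. active mirrors the boolean
// carried by the PrepareForShutdown signal: true when a shutdown is starting,
// false when it has been cancelled.
func handleShutdownSignal(rec EventRecorder, node string, active bool, podCount int) {
	// Every signal is reported, whether it starts or cancels a shutdown.
	rec.Eventf(node, "Normal", "Shutdown", "Shutdown manager detected new shutdown event")
	if !active {
		return
	}
	// Only an active shutdown goes on to terminate pods.
	rec.Eventf(node, "Normal", "Shutdown", "Shutting down %d pods", podCount)
}

func main() {
	rec := &memoryRecorder{}
	handleShutdownSignal(rec, "node/zsm-test-1", true, 2)
	handleShutdownSignal(rec, "node/zsm-test-1", false, 0)
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

Recording through an interface keeps the shutdown logic testable without a running API server, which is the main reason the sketch is shaped this way.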

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

cc @mrunalp @ehashman @harche @bobbypage

Does this PR introduce a user-facing change?

Adds Kubernetes Events to the Kubelet Graceful Shutdown feature

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Apr 13, 2021
@rphillips
Member Author

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/kubelet and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Apr 13, 2021
@rphillips rphillips force-pushed the add_graceful_shutdown_event branch 2 times, most recently from 92fa3b7 to 89f7a02 Compare April 13, 2021 20:19
@ehashman
Member

Generally LGTM!

/cc @wzshiming

who's also contributed a lot to this area.

@rphillips
Member Author

/retest

Member

@wzshiming wzshiming left a comment

dbus-send --system /org/freedesktop/login1 org.freedesktop.login1.Manager.PrepareForShutdown boolean:true
<invalid>   Normal    Shutdown                  node/zsm-test-1              Shutdown manager detected new shutdown event
<invalid>   Normal    Shutdown                  node/zsm-test-1              Shutting down 2 pods
<invalid>   Normal    Killing                   pod/test-84cfc7bdf7-kn95k    Stopping container test
<invalid>   Normal    Killing                   pod/test-84cfc7bdf7-rj8gl    Stopping container test
0s          Normal    SuccessfulCreate          replicaset/test-84cfc7bdf7   Created pod: test-84cfc7bdf7-xfb7f
<invalid>   Normal    NodeNotReady              node/zsm-test-1              Node zsm-test-1 status is now: NodeNotReady
0s          Normal    Scheduled                 pod/test-84cfc7bdf7-xfb7f    Successfully assigned default/test-84cfc7bdf7-xfb7f to zsm-test-1
<invalid>   Warning   Shutdown                  pod/test-84cfc7bdf7-xfb7f    Node is in progress of shutting down, not admitting any new pods
0s          Normal    SuccessfulCreate          replicaset/test-84cfc7bdf7   Created pod: test-84cfc7bdf7-wpzqv
0s          Normal    Scheduled                 pod/test-84cfc7bdf7-wpzqv    Successfully assigned default/test-84cfc7bdf7-wpzqv to zsm-test-1
<invalid>   Warning   Shutdown                  pod/test-84cfc7bdf7-wpzqv    Node is in progress of shutting down, not admitting any new pods
0s          Normal    SuccessfulCreate          replicaset/test-84cfc7bdf7   Created pod: test-84cfc7bdf7-5vdbf
dbus-send --system /org/freedesktop/login1 org.freedesktop.login1.Manager.PrepareForShutdown boolean:false
<invalid>   Normal    Shutdown                  node/zsm-test-1              Shutdown manager detected new shutdown event
<invalid>   Normal    NodeReady                 node/zsm-test-1              Node zsm-test-1 status is now: NodeReady
0s          Normal    TaintManagerEviction      pod/test-84cfc7bdf7-wpzqv    Cancelling deletion of Pod default/test-84cfc7bdf7-wpzqv
0s          Normal    TaintManagerEviction      pod/test-84cfc7bdf7-kn95k    Cancelling deletion of Pod default/test-84cfc7bdf7-kn95k
0s          Normal    TaintManagerEviction      pod/test-84cfc7bdf7-rj8gl    Cancelling deletion of Pod default/test-84cfc7bdf7-rj8gl
0s          Normal    TaintManagerEviction      pod/test-84cfc7bdf7-xfb7f    Cancelling deletion of Pod default/test-84cfc7bdf7-xfb7f

The time information shown for these events is `<invalid>`. Is it possible to include valid time information? It is very useful when troubleshooting the cause of a shutdown.

pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go (review thread: outdated, resolved)
@wzshiming
Copy link
Member

@rphillips
Member Author

@wzshiming updated to incorporate your comments... I should have got them all. Thanks!

@ehashman ehashman added this to Triage in SIG Node PR Triage Apr 15, 2021
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Apr 15, 2021
@rphillips
Member Author

/test pull-kubernetes-node-e2e

@ehashman
Member

/assign @bobbypage

@rphillips
Member Author

@bobbypage Thank you for the review. I fixed the messages.

@ehashman
Member

/test pull-kubernetes-integration

Member

@ehashman ehashman left a comment

/lgtm

looked at previous review feedback and I think this is good to go assuming tests all pass

@ehashman ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Apr 28, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2021
@bobbypage
Member

/lgtm

Thanks @rphillips for all your work on adding these events to graceful node shutdown!

@bobbypage
Member

bobbypage commented Apr 28, 2021

/assign @mrunalp
for approvals

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2021
@ehashman ehashman moved this from Needs Approver to Waiting on Author in SIG Node PR Triage Jun 21, 2021
@rphillips rphillips force-pushed the add_graceful_shutdown_event branch from 4a97632 to d9be5ab Compare June 23, 2021 21:44
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@ehashman
Member

re-
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@ehashman ehashman moved this from Waiting on Author to Needs Approver in SIG Node PR Triage Jun 23, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2021
@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Aug 17, 2021
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest
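The retest rules listed in this bot comment can be read as a single predicate over a PR's labels and test status. The sketch below is only an illustration of that reading, not the bot's implementation: `pullRequest`, `hasLabel`, and `shouldRetest` are hypothetical names, and the first rule is taken to mean that the PR carries no `do-not-merge/*` label.

```go
package main

import (
	"fmt"
	"strings"
)

// pullRequest is a hypothetical, minimal view of a PR for this sketch.
type pullRequest struct {
	Labels               []string
	FailingRequiredTests bool
}

// hasLabel reports whether want appears in labels.
func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// shouldRetest applies the five rules from the bot comment in order.
func shouldRetest(pr pullRequest) bool {
	// Rule 1: no do-not-merge/* labels.
	for _, l := range pr.Labels {
		if strings.HasPrefix(l, "do-not-merge/") {
			return false
		}
	}
	// Rules 2 and 3: no needs-ok-to-test and no needs-rebase label.
	if hasLabel(pr.Labels, "needs-ok-to-test") || hasLabel(pr.Labels, "needs-rebase") {
		return false
	}
	// Rule 4: all approval labels are present.
	for _, required := range []string{"cncf-cla: yes", "lgtm", "approved"} {
		if !hasLabel(pr.Labels, required) {
			return false
		}
	}
	// Rule 5: there is something to retest.
	return pr.FailingRequiredTests
}

func main() {
	pr := pullRequest{
		Labels:               []string{"cncf-cla: yes", "lgtm", "approved", "kind/bug"},
		FailingRequiredTests: true,
	}
	fmt.Println(shouldRetest(pr))
}
```

Under this reading, a PR that still carries `needs-rebase` or lacks `approved` would not be retested automatically.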

@k8s-ci-robot k8s-ci-robot merged commit 8dbc33d into kubernetes:master Aug 18, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 18, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
7 participants