Remove StartedPodsErrorsTotal metric message #105213

Conversation

yxxhero
Member

@yxxhero yxxhero commented Sep 23, 2021

Signed-off-by: yxxhero aiopsclub@163.com

What type of PR is this?

/kind bug

What this PR does / why we need it:

Remove the error message label from the StartedPodsErrorsTotal metric.

Which issue(s) this PR fixes:

Fixes #105163

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

release-note Removed error message label from kubelet_started_pods_errors_total metric

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

Signed-off-by: yxxhero <aiopsclub@163.com>
@k8s-ci-robot k8s-ci-robot added the kind/bug, size/XS, release-note-none, do-not-merge/needs-sig, needs-triage, needs-ok-to-test, and needs-priority labels Sep 23, 2021
@k8s-ci-robot
Contributor

Hi @yxxhero. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label Sep 23, 2021
@k8s-ci-robot k8s-ci-robot added the area/kubelet, sig/instrumentation, and sig/node labels and removed the do-not-merge/needs-sig label Sep 23, 2021
@ehashman
Member

/ok-to-test
/assign @dgrisonnet

@k8s-ci-robot k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Sep 23, 2021
@ehashman
Member

/retitle Remove StartedPodsErrorsTotal metric message
/triage accepted

@k8s-ci-robot k8s-ci-robot changed the title from "Remove StartedPodsErrorsTotal metrice message" to "Remove StartedPodsErrorsTotal metric message" Sep 23, 2021
@k8s-ci-robot k8s-ci-robot added the triage/accepted label and removed the needs-triage label Sep 23, 2021
@ehashman ehashman added this to Triage in SIG Node PR Triage Sep 23, 2021
@yxxhero
Member Author

yxxhero commented Sep 24, 2021

/test pull-kubernetes-node-e2e-containerd

@slashpai

/lgtm

@k8s-ci-robot
Contributor

@slashpai: changing LGTM is restricted to collaborators

In response to this:

/lgtm

@249043822
Member

The root cause may be that m.createPodSandbox(pod, podContainerChanges.Attempt) does not return well-defined error codes with reasonable cardinality for metrics, as the code at line 891 below does:

metrics.StartedContainersErrorsTotal.WithLabelValues(metricLabel, err.Error()).Inc()

var (
	// ErrCreateContainerConfig - failed to create container config
	ErrCreateContainerConfig = errors.New("CreateContainerConfigError")
	// ErrPreCreateHook - failed to execute PreCreateHook
	ErrPreCreateHook = errors.New("PreCreateHookError")
	// ErrCreateContainer - failed to create container
	ErrCreateContainer = errors.New("CreateContainerError")
	// ErrPreStartHook - failed to execute PreStartHook
	ErrPreStartHook = errors.New("PreStartHookError")
	// ErrPostStartHook - failed to execute PostStartHook
	ErrPostStartHook = errors.New("PostStartHookError")
)
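
To illustrate the cardinality concern, here is a minimal, self-contained sketch in plain Go. It does not use the Prometheus client, and `labelCounter`, `errCreateSandbox`, and `seriesCounts` are hypothetical names rather than kubelet code. It compares using the full error text as a label value with using a fixed sentinel code:

```go
package main

import (
	"errors"
	"fmt"
)

// errCreateSandbox is a hypothetical sentinel error with a fixed message.
var errCreateSandbox = errors.New("CreatePodSandboxError")

// labelCounter is a toy stand-in for a labeled counter:
// each distinct label value becomes its own entry (one time series).
type labelCounter map[string]int

func (c labelCounter) inc(label string) { c[label]++ }

// seriesCounts simulates n failures whose messages embed per-pod detail and
// returns how many distinct label values each labeling strategy produces.
func seriesCounts(n int) (byMessage, byCode int) {
	messages := labelCounter{}
	codes := labelCounter{}
	for i := 0; i < n; i++ {
		err := fmt.Errorf("%w: pod-%d: rpc error", errCreateSandbox, i)
		messages.inc(err.Error()) // free-form text: a new series per message
		if errors.Is(err, errCreateSandbox) {
			codes.inc(errCreateSandbox.Error()) // fixed code: a single series
		}
	}
	return len(messages), len(codes)
}

func main() {
	byMessage, byCode := seriesCounts(100)
	fmt.Println(byMessage, byCode) // prints "100 1"
}
```

With free-form messages the number of series grows with the number of distinct errors, while the sentinel code keeps it constant.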

@yxxhero
Member Author

yxxhero commented Sep 24, 2021

@249043822 I will look into it. Thanks very much.

@dgrisonnet
Member

I agree with @249043822. Instead of removing the error message completely, we should log it, if that isn't already the case, and create error codes based on the messages. It would be great to group them by error reason (in a format similar to the Event reasons) and expose them in the metric via a reason label.

@yxxhero
Member Author

yxxhero commented Sep 24, 2021

@249043822 @dgrisonnet OK, I will do as you suggest. Thanks very much.

@ehashman
Member

@dgrisonnet We already log the error on L819:

metrics.StartedPodsErrorsTotal.WithLabelValues(err.Error()).Inc()
createSandboxResult.Fail(kubecontainer.ErrCreatePodSandbox, msg)
klog.ErrorS(err, "CreatePodSandbox for pod failed", "pod", klog.KObj(pod))

I think anything else beyond that isn't targeted enough for a bugfix.

Member

@ehashman ehashman left a comment

/priority important-soon
/lgtm

@k8s-ci-robot k8s-ci-robot added the priority/important-soon label and removed the needs-priority label Sep 24, 2021
@ehashman ehashman moved this from Triage to Needs Approver in SIG Node PR Triage Sep 24, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 24, 2021
@ehashman
Member

Can we possibly get a release note on here? Something like "Removed error message label from kubelet_started_pods_errors_total metric"

@dgrisonnet
Member

Makes sense 👍

/lgtm

@dgrisonnet
Member

/release-note-edit release-note Removed error message label from kubelet_started_pods_errors_total metric

@ehashman
Member

I don't think the release note editing bot is live yet :)

@k8s-ci-robot k8s-ci-robot added the release-note label and removed the release-note-none label Sep 24, 2021
@yxxhero
Member Author

yxxhero commented Sep 24, 2021

@ehashman @dgrisonnet I have already added the release note. Thanks very much.

@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Sep 28, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, yxxhero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Sep 28, 2021
@k8s-ci-robot k8s-ci-robot merged commit e138afc into kubernetes:master Sep 28, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 28, 2021
@dgrisonnet
Member

Could we have this PR backported to 1.22? This would allow cleaning up the metric on all the releases that include it.

Labels
approved, area/kubelet, cncf-cla: yes, kind/bug, lgtm, ok-to-test, priority/important-soon, release-note, sig/instrumentation, sig/node, size/XS, triage/accepted
Projects
Development

Successfully merging this pull request may close these issues.

Large label value for the metric kubelet_started_pods_errors_total with label message when there is a error
7 participants