Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix:claim cached in pvcontroller is not the newest may cause unexpected issue #105211

Conversation

xiaopingrubyist
Copy link

@xiaopingrubyist xiaopingrubyist commented Sep 23, 2021

Which issue(s) this PR fixes:

What this PR does / why we need it:
fix issue: #105131

Which issue(s) this PR fixes:
Fixes #<105131>

Fixed a bug that prevents PersistentVolume that has a Claim UID which doesn't exist in local cache but exists in ETCD from being updated to Released phase. 

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 23, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @xiaopingrubyist. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2021
@xiaopingrubyist
Copy link
Author

/assign

@xiaopingrubyist
Copy link
Author

/assign @saad-ali @thockin

@xiaopingrubyist
Copy link
Author

/assign @msau42

Copy link
Member

@msau42 msau42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

/assign @jsafrane
for more eyes

claim = nil
// in some cases, the cached claim is not the newest, and the volume.Spec.ClaimRef.UID is newer than cached.
// in this case ,we should double check by calling apiserver to get the newest claim.
klog.V(3).Infof("Maybe cached claim: %s is not the newest one, we should fetch it from apiserver", claimrefToClaimKey(volume.Spec.ClaimRef))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably keep this level at 4 with the other messages here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will update it and back to level 4.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 30, 2021
@xiaopingrubyist xiaopingrubyist force-pushed the fix-pv-controller-claim-cache-issue branch from be81036 to df95536 Compare September 30, 2021 08:08
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 30, 2021
@xiaopingrubyist xiaopingrubyist force-pushed the fix-pv-controller-claim-cache-issue branch from 59fd7c5 to 9d0652c Compare September 30, 2021 10:03
@xiaopingrubyist
Copy link
Author

/retest

@brianpursley
Copy link
Member

@xiaopingrubyist your go fmt is failing.

You can run hack/update-gofmt.sh to fix the issues and then it should pass.

@brianpursley
Copy link
Member

@xiaopingrubyist also be sure to follow the PR template.

I'm just trying to help you get this PR reviewed, because it seems you have put some effort into tracking down the cause of this problem.

@xiaopingrubyist
Copy link
Author

@xiaopingrubyist also be sure to follow the PR template.

I'm just trying to help you get this PR reviewed, because it seems you have put some effort into tracking down the cause of this problem.

thank you so much , i will update it.

@xiaopingrubyist
Copy link
Author

/retest

@xiaopingrubyist
Copy link
Author

/retest-required

@xiaopingrubyist xiaopingrubyist changed the title fix:cached claim is not the newest will cause unexpected issue fix:claim cached in pvcontroller is not the newest may cause unexpected issue Oct 6, 2021
@xiaopingrubyist
Copy link
Author

/test pull-kubernetes-e2e-gce-storage-slow

pkg/controller/volume/persistentvolume/pv_controller.go Outdated Show resolved Hide resolved
if claim.UID != volume.Spec.ClaimRef.UID {
klog.V(4).Infof("synchronizing PersistentVolume[%s]: claim %s has different UID, the old one must have been deleted", volume.Name, claimrefToClaimKey(volume.Spec.ClaimRef))
claim = nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a log that the newly retrieved claim is newer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, now the controller logs

Maybe cached claim: %s is not the newest one, we should fetch it from apiserver

If the retrieved PVC is wrong, it then logs:

synchronizing PersistentVolume[%s]: claim %s has a newer UID than pv.ClaimRef, the old one must have been deleted

This is fine.

But if the retrieved PVC is OK, there is no log, which leaves potential log reader with confusing "we should fetch it from apiserver", but no result of the fetch. Can you add else and some log that the fetched PVC is OK?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated

pkg/controller/volume/persistentvolume/pv_controller.go Outdated Show resolved Hide resolved
@xiaopingrubyist xiaopingrubyist force-pushed the fix-pv-controller-claim-cache-issue branch from 90459a7 to 2f5b0da Compare October 13, 2021 04:01
@jsafrane
Copy link
Member

The code looks good now. Can you please squash the commits and provide a release note in the initial comment of this PR? This is a bug that might be interesting to some users and they should find it's fixed in the release notes.

Follow https://www.kubernetes.dev/docs/guide/release-notes/#applying-a-release-note

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 13, 2021
@xiaopingrubyist xiaopingrubyist force-pushed the fix-pv-controller-claim-cache-issue branch from 2f5b0da to f28a8d7 Compare October 13, 2021 12:03
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 13, 2021
@xiaopingrubyist
Copy link
Author

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@xiaopingrubyist
Copy link
Author

released notes added and commits squashed. @jsafrane

@jsafrane
Copy link
Member

Thanks a lot for your PR!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, xiaopingrubyist

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2021
@k8s-ci-robot k8s-ci-robot merged commit baaa53d into kubernetes:master Oct 14, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants