New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generic ephemeral volume checks #100482
generic ephemeral volume checks #100482
Conversation
I can also split up this PR into three different PRs, one per modified directory. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments on general code patterns, I don't know the full context of this featuregate so I will have to look more into it but this looks like the right approach overall
28be27d
to
debff10
Compare
/triage accepted |
/cc @msau42 /assign @SergeyKanzhelev |
/assign |
27937bc
to
83b991d
Compare
I had to rebase because the feature gate package imports in |
@msau42 can the hold be removed? |
// The PVC is required to proceed with | ||
// scheduling of a new pod because it cannot | ||
// run without it. Bail out immediately. | ||
return fmt.Errorf("look up PVC %s/%s: %v", pod.Namespace, pvcName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we provide a little more context to the error message, like "failed to ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to write error messages as <operation>: <error>
. Then if it gets wrapped higher up, we don't accumulate those "failed to " comments. I've seen errors which were something like:
ERROR: failed to <do abc>: failed to <do xyz>: could not <do fgh>: original error here
That always seemed a bit redundant to me. But let me add it here.... done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tend to say "looking up PVC ...." because it is clear this is an error from the context, so adding "failed to" is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "looking up PVC" best, so I've switched to that.
@msau42 okay? I still need LGTM for this PR.
return fmt.Errorf("look up PVC %s/%s: %v", pod.Namespace, pvcName, err) | ||
} | ||
// If the PVC is invalid, we don't count the volume because | ||
// there's no guarantee that it belongs to the running predicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #105328 to followup on if we should ignore or count them.
/hold cancel |
f60ed36
to
3e7947e
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, but otherwise looks good from a sig-node / kubelet perspective.
pkg/kubelet/kubelet_pods.go
Outdated
} | ||
pvc, err := kl.kubeClient.CoreV1().PersistentVolumeClaims(pod.Namespace).Get(context.TODO(), pvcName, metav1.GetOptions{}) | ||
if err != nil { | ||
klog.InfoS("Unable to retrieve pvc", "pvc", klog.KRef(pod.Namespace, volume.PersistentVolumeClaim.ClaimName), "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to use pcvName
instead of volume.PersistentVolumeClaim.ClaimName
in this log message?
When adding the ephemeral volume feature, the special case for PersistentVolumeClaim volume sources in kubelet's host path and node limits checks was overlooked. An ephemeral volume source is another way of referencing a claim and has to be treated the same way.
Previously, the situation was ignored, which might have had the effect that Pod scheduling continued (?) even though the Pod+PVC weren't known to be in an acceptable state.
3e7947e
to
1d181ad
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, klueska, msau42, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources was overlooked in three places:
Of these, volume zone is deprecated an doesn't need to be changed to support ephemeral volumes.
Special notes for your reviewer:
@msau42 pointed out the gap in nodevolumelimits on chat. The other two places where found with:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: