Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow updating scheduling directives of suspended jobs that never started #105479

Merged
merged 1 commit into from Oct 14, 2021

Conversation

ahg-g
Copy link
Member

@ahg-g ahg-g commented Oct 5, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implements kubernetes/enhancements#2926

Which issue(s) this PR fixes:

Fixes #104714

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Node affinity, node selector and tolerations are now mutable for jobs that are suspended and have never been started

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/2926-job-mutable-scheduling-directives

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 5, 2021
@k8s-ci-robot
Copy link
Contributor

@ahg-g: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 5, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 5, 2021
@ahg-g
Copy link
Member Author

ahg-g commented Oct 5, 2021

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Oct 5, 2021
@ahg-g
Copy link
Member Author

ahg-g commented Oct 5, 2021

/assign @liggitt

@liggitt liggitt added this to Assigned in API Reviews Oct 5, 2021
@liggitt liggitt added this to the v1.23 milestone Oct 5, 2021
@ahg-g ahg-g changed the title Allow updating scheduling directives suspended jobs that never started Allow updating scheduling directives for suspended jobs that never started Oct 5, 2021
@ahg-g ahg-g changed the title Allow updating scheduling directives for suspended jobs that never started Allow updating scheduling directives of suspended jobs that never started Oct 5, 2021
@ahg-g
Copy link
Member Author

ahg-g commented Oct 6, 2021

/sig scheduling
/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 6, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2021
Comment on lines 253 to 256
oldTemplate.Spec.NodeSelector = template.Spec.NodeSelector
oldTemplate.Spec.Tolerations = template.Spec.Tolerations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k I thought we had a verification check that would complain about these if there wasn't a +k8s:verify-mutation comment (in this case, it's actually ok because we're working on a copy made above, but I still expected the explicit check / comment to be required)

Copy link
Member

@liggitt liggitt Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahg-g this PR will need rebasing once #105578 merges, and will need to add +k8s:verify-mutation:reason=clone directives on the lines making cloned assignments to/from the old object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests are supposed to fail if this line isn't added? It seems that the tests are still passing

Copy link
Member

@liggitt liggitt Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify CI is failing the verify: non-mutating-validation test (you can run yourself with hack/verify-non-mutating-validation.sh)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done.

pkg/apis/batch/validation/validation.go Show resolved Hide resolved
oldTemplate := &oldSpec.Template
if opts.AllowMutableSchedulingDirectives {
oldTemplate = oldSpec.Template.DeepCopy()
if template.Spec.Affinity != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it misses the case where template.Spec.Affinity is nil and oldTemplate.Spec.Affinity has node affinity directives. if so, add a unit test that would have caught that

Copy link
Member Author

@ahg-g ahg-g Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! one thing here to note here, to succeed, the current validation will require having an empty Affinity if NodeAffinity was removed and other affinity types don't exist, removing the whole Affinity spec will fail.

Original

Template {
  Affinity: {
    NodeAffinity: {
       ....
    }
}

Update succeeds

Template: {
  Affinity: {}
}

Update fails

Template: {
}

supporting the case above will probably require cloning the new template or do some magic to check if Affinity is empty without actually individually checking its members (because the logic will break if a new member is added and we forget to update it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt is the above semantics acceptable?

Copy link
Member

@liggitt liggitt Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, something like this, perhaps:

switch {
  case template.Spec.Affinity == nil && oldTemplate.Spec.Affinity != nil:
    // allow the Affinity field to be cleared if the old template had no affinity directives other than NodeAffinity
    oldTemplate.Spec.Affinity.NodeAffinity = nil
    if (*oldTemplate.Spec.Affinity) == (api.Affinity{}) {
      oldTemplate.Spec.Affinity = nil
    }

  case template.Spec.Affinity != nil && oldTemplate.Spec.Affinity == nil:
    // allow the NodeAffinity field to skip immutability checking
    oldTemplate.Spec.Affinity.NodeAffinity = &api.Affinity{NodeAffinity: template.Spec.Affinity.NodeAffinity}

  case template.Spec.Affinity != nil && oldTemplate.Spec.Affinity != nil:
    // allow the NodeAffinity field to skip immutability checking
    oldTemplate.Spec.Affinity.NodeAffinity = template.Spec.Affinity.NodeAffinity
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg, updated.

pkg/features/kube_features.go Outdated Show resolved Hide resolved
@liggitt liggitt moved this from In progress to Changes requested in API Reviews Oct 8, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 8, 2021
@k8s-ci-robot k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 8, 2021
@liggitt liggitt moved this from Changes requested to In progress in API Reviews Oct 11, 2021
@liggitt liggitt moved this from In progress to Changes requested in API Reviews Oct 14, 2021
@liggitt
Copy link
Member

liggitt commented Oct 14, 2021

/approve

API bits lgtm, go ahead and squash

@liggitt
Copy link
Member

liggitt commented Oct 14, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2021
@liggitt liggitt moved this from Changes requested to API review completed, 1.23 in API Reviews Oct 14, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2021
Field: "spec.template",
},
},
"mutable node affinity": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are covering the first and second branch. We could have these tests:

  • add node affinity
  • change node affinity
  • remove node affinity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another test to cover the third branch.

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2021
@k8s-ci-robot k8s-ci-robot merged commit dea052c into kubernetes:master Oct 14, 2021
@@ -629,6 +630,81 @@ func TestSuspendJobControllerRestart(t *testing.T) {
}, false)
}

func TestNodeSelectorUpdate(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for flagging, we need to wrap the update calls with RetryOnConflict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: API review completed, 1.23
Development

Successfully merging this pull request may close these issues.

Make scheduling directives mutable for Jobs
4 participants