Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --concurrent-ephemeralvolume-syncs flag for kube-controller-manager #102981

Merged

Conversation

SataQiu
Copy link
Member

@SataQiu SataQiu commented Jun 18, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

While many other controllers like deployment have flag --concurrent-xxx-syncs to set the number of workers, the ephemeral controller doesn't and just hardcoded with 1 worker. It is very slow to deal with a large number of ephemeral volumes in the cluster. This PR exports a flag --concurrent-ephemeralvolume-syncs with the default value of 1 for the ephemeral controller.

Which issue(s) this PR fixes:

xref #79169

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-controller-manager supports '--concurrent-ephemeralvolume-syncs' flag to set the number of ephemeral volume controller workers.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 18, 2021
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 18, 2021
@SataQiu SataQiu force-pushed the add-ephemeral-config-v1alpha1 branch from 25d704a to f0ae3d1 Compare June 18, 2021 08:43
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@SataQiu
Copy link
Member Author

SataQiu commented Jun 18, 2021

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Jun 18, 2021
@SataQiu
Copy link
Member Author

SataQiu commented Jun 18, 2021

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@liggitt liggitt added this to Assigned in API Reviews Jun 18, 2021
@liggitt liggitt added this to the v1.22 milestone Jun 18, 2021
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 22, 2021
@msau42
Copy link
Member

msau42 commented Jul 8, 2021

/assign @pohly @jsafrane

@lavalamp
Copy link
Member

lavalamp commented Jul 8, 2021

can someone from sig apps or storage review first?

@JamesLaverack
Copy link
Member

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.22 milestone Jul 9, 2021
@@ -401,8 +401,7 @@ func startEphemeralVolumeController(ctx ControllerContext) (http.Handler, bool,
if err != nil {
return nil, true, fmt.Errorf("failed to start ephemeral volume controller: %v", err)
}
// TODO (before beta at the latest): make this configurable similar to the EndpointController
go ephemeralController.Run(1 /* int(ctx.ComponentConfig.EphemeralController.ConcurrentEphemeralVolumeSyncs) */, ctx.Stop)
go ephemeralController.Run(int(ctx.ComponentConfig.EphemeralController.ConcurrentEphemeralVolumeSyncs), ctx.Stop)
return nil, true, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this TODO. It never made it into tracking list of opens for the feature and then of course was promptly forgotten - my bad.

Making it configurable makes sense. The rest seems to be mostly cut-and-paste from how it is done elsewhere and looks good to me.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 9, 2021
@jsafrane
Copy link
Member

/approve
from sig-storage perspective

@SataQiu
Copy link
Member Author

SataQiu commented Jul 15, 2021

/assign @liggitt

@@ -84,6 +85,9 @@ type KubeControllerManagerConfiguration struct {
// EndpointSliceMirroringControllerConfiguration holds configuration for
// EndpointSliceMirroringController related features.
EndpointSliceMirroringController endpointslicemirroringconfig.EndpointSliceMirroringControllerConfiguration
// EphemeralControllerConfiguration holds configuration for EphemeralController
// related features.
EphemeralController ephemeralconfig.EphemeralControllerConfiguration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest naming this EphemeralVolumeController for clarity (same for the type)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also applies to the v1alpha1/types.go file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// run it in your wrapper struct of this type in its `SetDefaults_` method.
func RecommendedDefaultEphemeralControllerConfiguration(obj *kubectrlmgrconfigv1alpha1.EphemeralControllerConfiguration) {
if obj.ConcurrentEphemeralVolumeSyncs == 0 {
obj.ConcurrentEphemeralVolumeSyncs = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 1 really the default we want? is there a downside to making this > 1 (most sync loops default to 5)

@liggitt liggitt moved this from Assigned to Changes requested in API Reviews Jul 22, 2021
@SataQiu SataQiu force-pushed the add-ephemeral-config-v1alpha1 branch from f0ae3d1 to 7fa0b9b Compare July 25, 2021 13:37
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2021
@SataQiu
Copy link
Member Author

SataQiu commented Jul 26, 2021

/test pull-kubernetes-integration

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/milestone v1.23

/approve
config API type mechanics lgtm

would like an ack on the thread safety of the controller's sync loop from @pohly

// run it in your wrapper struct of this type in its `SetDefaults_` method.
func RecommendedDefaultEphemeralVolumeControllerConfiguration(obj *kubectrlmgrconfigv1alpha1.EphemeralVolumeControllerConfiguration) {
if obj.ConcurrentEphemeralVolumeSyncs == 0 {
obj.ConcurrentEphemeralVolumeSyncs = 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, would like an ack from @pohly that this controller's sync loop is threadsafe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly was meant to be thread safe, so although that hasn't been tested at scale because it couldn't be enabled earlier, it should be fine.

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Jul 27, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, liggitt, SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2021
@liggitt liggitt moved this from Changes requested to API review completed, 1.23 in API Reviews Aug 5, 2021
@liggitt
Copy link
Member

liggitt commented Aug 5, 2021

/lgtm
/hold
API review bits done, pending lgtm by controller author (@pohly). Can unhold once that is done.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Aug 5, 2021
@pohly
Copy link
Contributor

pohly commented Aug 5, 2021

I has my LGTM (#102981 (comment)).

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2021
@pohly
Copy link
Contributor

pohly commented Aug 5, 2021

/hold

Sorry, just noticed the additional questions... let me check.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2021
@pohly
Copy link
Contributor

pohly commented Aug 5, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2021
@k8s-triage-robot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@k8s-triage-robot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 7ab3e3c into kubernetes:master Aug 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.23
Development

Successfully merging this pull request may close these issues.

None yet