
fix goroutine leak in the DeleteCollection #105606

Merged
merged 1 commit into from Oct 21, 2021

Conversation

sxllwx
Member

@sxllwx sxllwx commented Oct 11, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

In our k8s cluster, three kube-apiservers are running and consuming a lot of memory. Analyzing the heap and goroutine profiles shows that a lot of memory is allocated by DeleteCollection and never released. After further analysis, I believe there is a goroutine leak in DeleteCollection.

Sorry, company regulations prevent me from sharing the complete profiles, but here are some screenshots.

[Screenshots: heap and goroutine profiles showing memory retained by DeleteCollection]

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

If all workers exit early, the task producer blocks permanently on the send in the following code block:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L1131-L1139
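The failure mode described above can be reproduced in a stand-alone sketch (toy names, not the actual store.go code): every worker exits on its first error, and once all workers are gone nobody receives from toProcess, so the producer's send blocks forever.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// leakyDispatch mimics the old DeleteCollection fan-out: each worker
// exits on its first (simulated) delete error, leaving nobody to
// receive from toProcess, so the producer's send blocks forever.
func leakyDispatch(items, workersNumber int) bool {
	toProcess := make(chan int, 2*workersNumber)
	errs := make(chan error, workersNumber+1)
	wg := sync.WaitGroup{}

	for w := 0; w < workersNumber; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range toProcess {
				// Simulate etcd failing the delete (e.g. a leader switch):
				// the worker records the error and exits immediately.
				errs <- errors.New("etcdserver: leader changed")
				return
			}
		}()
	}

	finished := make(chan struct{})
	go func() { // the task producer
		defer close(finished)
		for i := 0; i < items; i++ {
			toProcess <- i // blocks permanently once all workers returned
		}
		close(toProcess)
	}()

	select {
	case <-finished:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // the producer goroutine is stuck: it leaks
	}
}

func main() {
	fmt.Println("producer finished:", leakyDispatch(100, 2))
}
```

With 100 items and 2 workers, at most 2 receives plus the channel buffer of 4 can ever complete, so the producer can never finish and the program prints `producer finished: false`.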

I think this bug is triggered when, during a DeleteCollection request to kube-apiserver, etcd goes through a leader switch or returns some other error.

Does this PR introduce a user-facing change?

kube-apiserver: fix a memory leak when deleting multiple objects with a deletecollection.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2021
@neolit123
Member

/release-note-edit

kube-apiserver: fix a memory leak when deleting multiple objects with a deletecollection.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 11, 2021
@liggitt
Member

liggitt commented Oct 11, 2021

from what I can see, this should only be possible in unexpected panic cases... do you see panics logged?

@sxllwx
Member Author

sxllwx commented Oct 11, 2021

from what I can see, this should only be possible in unexpected panic cases... do you see panics logged?

I only found panic logs related to the timeout filter. I don't think the goroutine leak here triggers a panic. Please correct me if I am wrong.

@fedebongio
Contributor

/assign @CatherineF-dev @caesarxuchao
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Oct 12, 2021
@k8s-ci-robot
Contributor

@fedebongio: GitHub didn't allow me to assign the following users: CatherineF-dev.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @CatherineF-dev @caesarxuchao
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 12, 2021
@sxllwx
Member Author

sxllwx commented Oct 14, 2021

/ping @liggitt

@liggitt
Member

liggitt commented Oct 14, 2021

yeah, it looks like passing deleteoptions with a precondition (like a uid precondition) that causes the delete to return an error will exit workers early

this looks reasonable, but needs to be exercised via a test

@k8s-ci-robot k8s-ci-robot removed the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 15, 2021
case toProcess <- i:
case <-workersExited:
klog.V(4).InfoS("workers already exited, and there are some items waiting to be processed", "finished", i, "total", len(items))
return
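The select above is the heart of the fix. A self-contained sketch of the same pattern (toy names, not the real store.go) shows how closing a workersExited channel after wg.Wait() lets the producer abandon a blocked send:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fixedDispatch mirrors the patched pattern: a workersExited channel,
// closed once all workers return, gives the producer a second select
// case so it can stop instead of blocking forever.
func fixedDispatch(items, workersNumber int) bool {
	toProcess := make(chan int, 2*workersNumber)
	wg := sync.WaitGroup{}

	for w := 0; w < workersNumber; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range toProcess {
				return // simulate every worker failing on its first item
			}
		}()
	}

	workersExited := make(chan struct{})
	go func() {
		wg.Wait()
		close(workersExited)
	}()

	finished := make(chan struct{})
	go func() { // the task producer
		defer close(finished)
		for i := 0; i < items; i++ {
			select {
			case toProcess <- i:
			case <-workersExited:
				return // all workers are gone; no need to close toProcess
			}
		}
		close(toProcess)
	}()

	select {
	case <-finished:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("producer finished:", fixedDispatch(100, 2))
}
```

Unlike the leaky version, this prints `producer finished: true`: the producer bails out as soon as all workers have returned.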
Member

or just break instead of return?

Member Author

I think the results of these two operations are the same here. What do you think?

Member

Maybe I was overthinking it: with this return instead of break you don't close(toProcess)...
... but if you return before distributing all items, that means all workers have already finished, so there is no need to close toProcess.

@wojtek-t
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sxllwx, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2021
@@ -1126,14 +1126,22 @@ func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.Vali
 	}
 	wg := sync.WaitGroup{}
 	toProcess := make(chan int, 2*workersNumber)
-	errs := make(chan error, workersNumber+1)
+	errs := make(chan error, workersNumber)
Member

I think I understand this +1 now: it is the number of worker goroutines plus the distributor goroutine, all of which can send errors to this channel... interestingly, we only report the first one... should we drain the channel, aggregate, and report all errors?

Member Author

@sxllwx sxllwx Oct 22, 2021

Very interesting... 😄 ...

I don't think it is necessary to aggregate all errors here. If an operation fails, it should be reported directly, to ensure the fastest response. But I think draining the err chan is necessary here; logging the drained errors can help debugging.

Member

we need to add the +1 back... there are workersNumber + 1 things that can send errors to the errs channel... if they all send an error, the channel will block, wg.Wait() won't return, and we'll hang
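The sizing rule in this comment is easy to demonstrate in isolation: with k goroutines that each send one error into the channel before calling wg.Done(), a buffer smaller than k lets a send block, and wg.Wait() never returns (an illustrative toy, not the PR's code):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// allSendersFinish reports whether `senders` goroutines, each sending one
// error into a channel with the given buffer, all reach wg.Done() before
// anyone receives. If buffer < senders, one send blocks forever, its
// goroutine leaks, and wg.Wait() hangs.
func allSendersFinish(senders, buffer int) bool {
	errs := make(chan error, buffer)
	var wg sync.WaitGroup
	for i := 0; i < senders; i++ {
		wg.Add(1)
		go func() {
			errs <- errors.New("boom") // blocks if the buffer is already full
			wg.Done()
		}()
	}
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // wg.Wait() is hung on the blocked sender
	}
}

func main() {
	fmt.Println("buffer == senders:", allSendersFinish(3, 3))
	fmt.Println("buffer  < senders:", allSendersFinish(3, 2))
}
```

This is exactly why the buffer has to be workersNumber+1 when workersNumber workers plus the distributor may each send an error.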

Member

We were discussing it here: #105606 (comment)

The only way the dispatcher function can emit an error is when it crashes.
But I can't imagine any reason why it would crash.

Thoughts?

Member

I haven't dug in to figure out the detailed reasons it might crash. If we have panic-handler code and workersNumber+1 things that could send errors, just size the channel buffer to match.

Member

I opened #105872 for this.
Though - I think that the HandleCrash here is more for "being on the safe side": I don't know how I could make it crash even with any conspiracy theory (which is also why I don't see how we could even test that).

Comment on lines 1181 to 1182
case err := <-errs:
return nil, err

@wojtek-t
Member

/lgtm cancel

@k8s-ci-robot k8s-ci-robot merged commit 2dede1d into kubernetes:master Oct 21, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 21, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2021
@wojtek-t
Member

Heh - I didn't cancel in time.

@aojea - #105606 (comment) is a valid comment
We should send a follow up to it.

@wojtek-t
Member

@aojea - #105606 (comment) is a valid comment
We should send a follow up to it.

Though - I don't see any reason why that goroutine may crash - so maybe not super critical...

@aojea
Member

aojea commented Oct 21, 2021

yeah, me neither, I'm fine with it,

@aojea
Member

aojea commented Oct 21, 2021

hmm @sxllwx this test passes with and without this patch 🤔

@wojtek-t
Member

hmm @sxllwx this test passes with and without this patch

Yeah - it's somewhat expected, because the change actually does two things:

  • change the blocking mechanism
  • fix the issue

This test would fail only if we had a PR with the first change but not the second.

@sxllwx
Member Author

sxllwx commented Oct 22, 2021

hmm @sxllwx this test passes with and without this patch 🤔

Without this patch, the new UT TestStoreDeleteCollectionWorkDistributorExited cannot pass.

@aojea
Member

aojea commented Oct 22, 2021

hmm @sxllwx this test passes with and without this patch 🤔

Without this patch, the new UT TestStoreDeleteCollectionWorkDistributorExited cannot pass.

it does; I ran only the new test, without the patch, and it passes

@sxllwx
Member Author

sxllwx commented Oct 22, 2021

hmm @sxllwx this test passes with and without this patch 🤔

Without this patch, the new UT TestStoreDeleteCollectionWorkDistributorExited cannot pass.

it does; I ran only the new test, without the patch, and it passes

Hmm... ignore my answer. Although the unit test can pass without this patch, there is still a goroutine leak (caused by the concurrency mechanism of the previous version).

@microyahoo

microyahoo commented Oct 22, 2021

I think the code can be simplified as follows.

tokenBucket := make(chan struct{}, workersNumber)
errs := make(chan error, workersNumber)
wg := &sync.WaitGroup{}

for _, item := range items {
	wg.Add(1)
	tokenBucket <- struct{}{}

	go func(i runtime.Object) {
		// panics don't cross goroutine boundaries
		defer utilruntime.HandleCrash(func(panicReason interface{}) {
			errs <- fmt.Errorf("DeleteCollection goroutine panicked: %v", panicReason)
		})
		defer wg.Done()
		defer func() {
			<-tokenBucket
		}()
		accessor, err := meta.Accessor(i)
		if err != nil {
			errs <- err
			return
		}

		// DeepCopy the deletion options because individual graceful deleters communicate changes via a mutating
		// function in the delete strategy called in the delete method.  While that is always ugly, it works
		// when making a single call.  When making multiple calls via delete collection, the mutation applied to
		// pod/A can change the option ultimately used for pod/B.
		if _, _, err := e.Delete(ctx, accessor.GetName(), deleteValidation, options.DeepCopy()); err != nil && !apierrors.IsNotFound(err) {
			klog.V(4).InfoS("Delete object in DeleteCollection failed", "object", klog.KObj(accessor), "err", err)
			errs <- err
			return
		}
	}(item)
}
wg.Wait()

select {
case err := <-errs:
	return nil, err
default:
	return listObj, nil
}

@aojea
Member

aojea commented Oct 22, 2021

Whatever wojtek or Jordan say :)

@sxllwx
Member Author

sxllwx commented Oct 25, 2021

I think the code can be simplified as follows.

[quoted code omitted - identical to the previous comment]

I think this code is simpler and faster, but the buffer size of the errs channel needs to be adjusted to len(items).

@wojtek-t
Member

I'm against the above - it could potentially start deletion of all items at once, which may overload etcd.
We need control over the number of in-flight requests to etcd - which workersNumber achieves.

@microyahoo

@wojtek-t The above is essentially the same as the existing code. A goroutine can run only after obtaining a token, which ensures that at most workersNumber requests are in flight at any time.
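That claim is easy to verify with a small toy (illustrative code, not the PR's): the buffered channel acts as a counting semaphore, so the peak number of tasks between acquiring and releasing a token can never exceed workersNumber.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// maxConcurrency runs `items` tasks gated by a token bucket of size
// workersNumber and returns the peak number of tasks observed running
// simultaneously. The token bucket bounds the peak to workersNumber.
func maxConcurrency(items, workersNumber int) int {
	tokenBucket := make(chan struct{}, workersNumber)
	var wg sync.WaitGroup
	var mu sync.Mutex
	cur, peak := 0, 0

	for i := 0; i < items; i++ {
		wg.Add(1)
		tokenBucket <- struct{}{} // blocks while workersNumber tasks hold tokens
		go func() {
			defer wg.Done()
			defer func() { <-tokenBucket }() // release the token

			mu.Lock()
			cur++
			if cur > peak {
				peak = cur
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // stand-in for the e.Delete call

			mu.Lock()
			cur--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency with 4 tokens:", maxConcurrency(50, 4))
}
```

The printed peak is always between 1 and 4, confirming that the token-bucket variant enforces the same in-flight limit as the worker-pool version.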

@wojtek-t
Member

OK sorry - I misread it. Yes - that looks fine

chenchun pushed a commit to chenchun/kubernetes that referenced this pull request Mar 20, 2024
…!1009)

1.20: backport kubernetes#105606 and kubernetes#105872 to fix the goroutine leak caused by DeleteCollection
kubernetes#105606
kubernetes#105872