Add verbose logs for node/plugin scores even ranged in low levels #103515
Conversation
@muma378: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @muma378. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/retest
@muma378 is this ready for review? If so please move it from "draft" to ready so we know it's not still a WIP :)
@damemi Thanks very much for the reminder, but I'm still struggling to run the e2e tests on my laptop to check whether the logs are output correctly.
Sorry for the delay on this @muma378. This looks pretty good to me. Do I understand the performance results correctly, that this improved performance at all log levels? That's really interesting; do you think it's due to the use of the heap?
I think this is good to go, but will let @Huang-Wei do final lgtm since he caught the issue last time
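The heap mentioned above refers to selecting only the top N node scores instead of sorting the whole list. The sketch below shows that technique under assumed names: `nodeScore` and `topN` are hypothetical, not the scheduler's actual API.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// nodeScore is a hypothetical (name, score) pair; the real scheduler
// uses its own framework score types.
type nodeScore struct {
	name  string
	score int64
}

// minHeap keeps the N highest scores seen so far; the root is the
// smallest of them, so it can be evicted cheaply.
type minHeap []nodeScore

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].score < h[j].score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(nodeScore)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topN returns the n highest-scoring nodes in O(len(scores) * log n),
// which is cheaper than sorting everything when n << len(scores).
func topN(scores []nodeScore, n int) []nodeScore {
	h := &minHeap{}
	for _, s := range scores {
		if h.Len() < n {
			heap.Push(h, s)
		} else if s.score > (*h)[0].score {
			// Replace the smallest retained score and restore the heap.
			(*h)[0] = s
			heap.Fix(h, 0)
		}
	}
	out := append([]nodeScore(nil), *h...)
	sort.Slice(out, func(i, j int) bool { return out[i].score > out[j].score })
	return out
}

func main() {
	scores := []nodeScore{{"node-a", 40}, {"node-b", 90}, {"node-c", 75}, {"node-d", 10}}
	fmt.Println(topN(scores, 2)) // prints [{node-b 90} {node-c 75}]
}
```

This would explain why performance can improve even at high verbosity: the heap only pays log-time per node for the few entries that make the cut.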
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: damemi, muma378. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/lgtm
Well, I think this got the PR merged without a thorough review. @wzshiming please hold off on the directive. After taking a glance, I can spot several issues:
@Huang-Wei
@pacoxu: GitHub didn't allow me to request PR reviews from the following users: muma378. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I didn't realize the performance tests didn't cover all the cases, sorry. We should try to at least get some measurements for that first. If there is still a big hit to performance, then we should revert.
+1 to reverting. The logic seems a bit ad hoc to me and not necessarily placed in the right pkg. If we are serious about dumping internal scheduler state or decisions, we need to design a library for that, which includes the logic we currently have for dumping the cache: |
@damemi I created #104849 to revert this PR. @muma378 Apologies for the revert. It doesn't mean we don't accept the changes; we just want to review them thoroughly. Feel free to open a new PR, taking the comments #103515 (comment) and #103515 (comment) into consideration.
@Huang-Wei Never mind, it's absolutely OK with me. I'm sorry about the mistake in the performance testing and will keep following the issue. In terms of your and @ahg-g's suggestion:
What type of PR is this?
/kind feature
What this PR does / why we need it:
No scheduling score info is captured if the logging level is less than 10. It would help in finding out what happened if the top N highest node/plugin scores were dumped at a lower level.
@damemi made some improvements in #99411; however, a ~50% performance drop was found in large clusters.
As @Huang-Wei suggested in #101820, we use a dynamic logging strategy that decides how much to dump depending on the logging level:
The output looks like this with -v=4:
The output looks like this with -v=6:
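The verbosity-dependent strategy described above can be sketched as follows. The `scoresToDump` helper, its thresholds, and the counts per level are illustrative assumptions, not the values merged in the PR; the real code gates the dump on klog verbosity in a similar spirit.

```go
package main

import "fmt"

// minInt avoids relying on the Go 1.21 built-in min.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// scoresToDump maps the active log verbosity to how many of the
// top node scores get logged. Thresholds and counts here are
// hypothetical examples of a dynamic, level-dependent policy.
func scoresToDump(verbosity, totalNodes int) int {
	switch {
	case verbosity >= 10:
		return totalNodes // dump everything, the old behavior
	case verbosity >= 6:
		return minInt(10, totalNodes)
	case verbosity >= 4:
		return minInt(3, totalNodes)
	default:
		return 0 // no score dump at low verbosity
	}
}

func main() {
	fmt.Println(scoresToDump(4, 100))  // prints 3
	fmt.Println(scoresToDump(6, 100))  // prints 10
	fmt.Println(scoresToDump(10, 100)) // prints 100
}
```

With a policy like this, `-v=4` and `-v=6` produce progressively larger dumps, while `-v=10` keeps the exhaustive output.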
Which issue(s) this PR fixes:
Fixes #101820
Special notes for your reviewer:
Performance was evaluated and compared with the tool k8s-sched-perf-stat:
I did the performance test with the following 4 scenarios (2 branches x 2 logging levels):
with the command:
and the diff was evaluated:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: