
initialize logging after flag parsing + refactor commands #105076

Merged
merged 4 commits into from Oct 1, 2021

Conversation

@pohly (Contributor) commented Sep 16, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

It wasn't documented that InitLogs already uses the log flush frequency, so
some commands have called it before parsing (for example, kubectl in the
original code for logs.go). The --log-flush-frequency flag never had an effect
in such commands.

Other fixes:

  • Print usage text before error to keep the error visible in the console when the
    usage text is long, but only print the usage text if the error actually was for
    flag parsing.
  • More commands use hyphens instead of underscores in their command line.
    Parameters with underscores are still accepted.
  • The validation error for kube-apiserver --logging-format=json --add-dir-header now references add-dir-header instead of add_dir_header.
  • --log-flush-frequency is no longer listed in the --logging-format flag's
    "non-default formats don't honor these flags" usage text, because it will
    also work for non-default formats once it is needed.
  • cmd/kubelet: the description of --logging-format uses hyphens instead of
    underscores for the flags, which now matches what the command is using.
  • staging/src/k8s.io/component-base/logs/example/cmd: added logging flags.
  • apiextensions-apiserver, kube-aggregator, sample-apiserver: they no longer
    print useless stack traces for main and all goroutines when command line parsing
    fails; however, they also no longer do so for other errors.

Which issue(s) this PR fixes:

There's no bug open for this as far as I know. I found the problem while refactoring the code.

It fixes some things discovered while working on the code:

Fixes: #105244

Special notes for your reviewer:

This turned into a major refactoring. Future changes for deprecating klog flags should become simpler.

Does this PR introduce a user-facing change?

`--log-flush-frequency` had no effect in several commands or was missing. Help and warning texts were not always using the right format for a command (`add_dir_header` instead of `add-dir-header`). Fixing this included cleaning up flag handling in component-base/logs: that package no longer adds flags to the global flag sets. Commands which want the klog and --log-flush-frequency flags must explicitly call logs.AddFlags; the new cli.Run does that for commands. That helper function also covers flag normalization and printing of usage and errors in a consistent way (print usage text first if parsing failed, then the error).

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 16, 2021
@pohly commented Sep 16, 2021

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 16, 2021
@k8s-ci-robot k8s-ci-robot added area/kubectl sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. labels Sep 16, 2021
@pohly pohly mentioned this pull request Sep 16, 2021
@caesarxuchao (Member):

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 16, 2021
go flushForever()
}

func flushForever() {
Reviewer (Contributor):

how does this differ from wait.Forever?

@pohly (author):

wait.Forever is passed the period once, so later changes to the value have no effect. This new code reads the current period each time it needs to decide how long to sleep, so changes to that value do take effect.

Reviewer (Contributor):

Oh, now I get what you are doing. Can you add something to that effect to the description of flushForever?

For my knowledge, why/how would someone re-configure a flag? Aren't flags only set once?

@pohly (author):

Usually yes, although that isn't required. The same flag set could also be used multiple times. The main reason for this change is that the logging is initialized in main() (which makes sense - the sooner the better) and flag parsing happens at a later time when the goroutine is already running.

I've added a comment explaining this.

Reviewer (Contributor):

Should we just require InitLogs to be called after flag parsing? We can recommend calling defer logs.FlushLogs() in main() ASAP to capture logs for cases where logging isn't setup yet.

@pohly (author):

I had considered that, but came to the conclusion that enhancing the implementation is going to be simpler than checking and modifying all places where InitLogs is called.

Reviewer (Contributor):

I mean... I lean towards the flush frequency not being configurable after startup. How many places would we have to change?

@pohly (author):

I just counted 19, in various parts of the code base. Simply from a practical point of view, I don't look forward to modifying all of those.

But it's not just that. I think the current API with InitLogs being called as early as possible makes sense. It's easy to use and review in commands ("InitLogs and FlushLogs called in main()? Okay, done."). In the package, it gives us an opportunity to make some changes before log messages are emitted.

return g.duration.String()
}

func (g *guardedDuration) Set(value string) error {
Reviewer (Contributor):

Does this implement a particular interface? Or why are these functions needed?

@pohly (author):

They implement the flag.Value and pflag.Value methods. There's a type assertion further down (var _ pflag.Value = &guardedDuration{}). Shall I call that out in the method comments ("Set implements flag.Value.Set", etc.)?

Reviewer (Contributor):

Ah. I just missed it.

@pohly left a review comment:

Thanks for looking at this. 👍

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 20, 2021
@pohly commented Sep 21, 2021

/retest

@pohly commented Sep 21, 2021

Or another alternative: considering that the --log-flush-frequency flag hasn't worked (ever?) without anyone noticing, perhaps we can deprecate it together with the klog flags? I consider it useful, but not useful enough to justify changing how InitLogs works.

@serathius @thockin: any comments?

@thockin (Member) commented Sep 21, 2021

I think that flag was added for completeness but never exercised (obv.).

Logging before flag parsing is problematic for a million reasons.

My feeling is that this is now bordering on overly-encapsulated. This flag doesn't mean anything for many log impls so "less is more". We should push the configuration of logging toward main() and if we want to offer a default impl, do that in a distinct pkg.

@serathius (Contributor):

We have historically merged such changes, but we need to make sure that we have enough time in the release cycle to detect and fix any issues that arise. This means that we need to merge this change early or wait for tests.

So you think it is too late for 1.23?

Don't know, we should ask root level approvers for that (assuming that you prefer to go through one approver vs approver per binary). @dims what's your opinion on this?

@serathius (Contributor):

Thanks for running manual tests; my main worry was about human error, but the included script should really help with that. It should allow reviewers/approvers to verify results on their own.

@pohly commented Sep 30, 2021

/test pull-kubernetes-node-e2e-containerd

@thockin (Member) commented Sep 30, 2021

Can we do a followup issue/PR(s) to add automated tests? I don't feel like this is THAT dangerous (famous last words?) to warrant abandoning it, given that we've NEVER had tests on this...

@pohly commented Oct 1, 2021

Can we do a followup issue/PR(s) to add automated tests?

I can't commit to working on that. For now I've filed #105392.

@pohly commented Oct 1, 2021

/test pull-kubernetes-node-e2e-containerd

@thockin (Member) commented Oct 1, 2021

I'll go ahead and approve this but leave a hold - @serathius, if you want to discuss more, we can, but @pohly, I think we can call for lazy consensus. If there are no major objections before EOD Tuesday (Europe time, I think), you (@pohly) can drop the hold.

Fair?

Thanks!

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 1, 2021
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 1, 2021
@serathius (Contributor):

I think we should be OK with validating this using the script; hope that no one breaks us before this is merged. I think we are good to go.
This is an awesome improvement! @pohly
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2021
@k8s-ci-robot commented Oct 1, 2021

@pohly: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-node-kubelet-serial
Commit: e6940ea312999cc231512bbed9e94d7aea8f9331
Required: false
Rerun command: /test pull-kubernetes-node-kubelet-serial

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pohly commented Oct 1, 2021

/retest

Successfully merging this pull request may close these issues.

Kubectl flag normalization setup looks like a merge error