New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClusterIP Allocator metrics #104119
ClusterIP Allocator metrics #104119
Conversation
f255f51
to
7d81b00
Compare
94d47dc
to
09ea3bf
Compare
Add 4 new metrics to the ClusterIP allocators: - current number of available IPs per Service CIDR - current number of used IPs per Service CIDR - total number of allocation per Service CIDR - total number of allocation errors per ServiceCIDR
09ea3bf
to
ee7562a
Compare
@dcbw @danwinship you may be interested too |
ok, offset := r.contains(ip) | ||
if !ok { | ||
// update metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we tracking user error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the metrics is allocation_errors_total
, I didn't want to discriminate on the error type.
I don't expect users to use this metric to narrow down problems, they should go to the apiserver audit.log for that, instead, I expect this to be a measure of the allocation process health, as in "hey, you are having error allocating ips, you should take a look on what is going on here"
@@ -165,31 +167,58 @@ func (r *Range) CIDR() net.IPNet { | |||
// or has already been reserved. ErrFull will be returned if there | |||
// are no addresses left. | |||
func (r *Range) Allocate(ip net.IP) error { | |||
label := r.CIDR() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not doing a simple:
// elevate allocated declaration
var allocated bool
var err error
defer reportMetric(r.cidr, err)
// create a new reportMetric func and use it from both Allocate(), and AllocateNext()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started with that, but then I've added errors, and reportMetric is not covering all metrics (errors), so I went back to add each metrics manually
|
||
var ( | ||
// clusterIPAllocated indicates the amount of cluster IP allocated by Service CIDR. | ||
clusterIPAllocated = metrics.NewGaugeVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these not part of NewAllocatorCIDRRange()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments included. Thank you for this.
/lgtm |
/approve |
/assign @logicalhan |
We should add our own ownership in the sub-dir |
Thanks! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, khenidak, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/kind cleanup
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #104093