Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterIP Allocator metrics #104119

Merged
merged 1 commit into from Aug 14, 2021
Merged

Conversation

aojea
Copy link
Member

@aojea aojea commented Aug 4, 2021

/kind cleanup

What this PR does / why we need it:

add clusterIP allocator metrics

Add 4 new metrics to the ClusterIP allocators:
- current number of available IPs per Service CIDR
- current number of used IPs per Service CIDR
- total number of allocation per Service CIDR
- total number of allocation errors per ServiceCIDR
- 

Which issue(s) this PR fixes:

Fixes #104093

The apiserver exposes 4 new metrics that allow to track the status of the Service CIDRs allocations:
    - current number of available IPs per Service CIDR
    - current number of used IPs per Service CIDR
    - total number of allocation per Service CIDR
    - total number of allocation errors per ServiceCIDR

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 4, 2021
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 4, 2021
@aojea aojea force-pushed the clusterip_metrics branch 3 times, most recently from f255f51 to 7d81b00 Compare August 4, 2021 10:08
@aojea
Copy link
Member Author

aojea commented Aug 4, 2021

/sig network
/assign @thockin @khenidak

@aojea aojea force-pushed the clusterip_metrics branch 3 times, most recently from 94d47dc to 09ea3bf Compare August 4, 2021 11:14
Add 4 new metrics to the ClusterIP allocators:
- current number of available IPs per Service CIDR
- current number of used IPs per Service CIDR
- total number of allocation per Service CIDR
- total number of allocation errors per ServiceCIDR
@aojea
Copy link
Member Author

aojea commented Aug 4, 2021

@dcbw @danwinship you may be interested too

ok, offset := r.contains(ip)
if !ok {
// update metrics
Copy link
Contributor

@khenidak khenidak Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we tracking user error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the metrics is allocation_errors_total , I didn't want to discriminate on the error type.
I don't expect users to use this metric to narrow down problems, they should go to the apiserver audit.log for that, instead, I expect this to be a measure of the allocation process health, as in "hey, you are having error allocating ips, you should take a look on what is going on here"

@@ -165,31 +167,58 @@ func (r *Range) CIDR() net.IPNet {
// or has already been reserved. ErrFull will be returned if there
// are no addresses left.
func (r *Range) Allocate(ip net.IP) error {
label := r.CIDR()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not doing a simple:

// elevate allocated declaration 
var allocated bool
var err error
defer reportMetric(r.cidr, err)

// create a new reportMetric func and use it from both Allocate(), and AllocateNext() 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started with that, but then I've added errors, and reportMetric is not covering all metrics (errors), so I went back to add each metrics manually


var (
// clusterIPAllocated indicates the amount of cluster IP allocated by Service CIDR.
clusterIPAllocated = metrics.NewGaugeVec(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these not part of NewAllocatorCIDRRange()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@khenidak khenidak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments included. Thank you for this.

@khenidak
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2021
@khenidak
Copy link
Contributor

/approve

@aojea
Copy link
Member Author

aojea commented Aug 10, 2021

/approve

we need someone with more superpowers 😄
Wojtek and David are in the reliability-wg, they may find this useful and they are owners here

/assign @wojtek-t @deads2k

@dashpole
Copy link
Contributor

/assign @logicalhan
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 12, 2021
@thockin
Copy link
Member

thockin commented Aug 13, 2021

We should add our own ownership in the sub-dir

@thockin
Copy link
Member

thockin commented Aug 14, 2021

Thanks!

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, khenidak, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2021
@k8s-ci-robot k8s-ci-robot merged commit e95983b into kubernetes:master Aug 14, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics to apiserver tracking the service IP allocator
8 participants