Only metrics #2557

Merged
merged 2 commits into from
Sep 13, 2023

@lnhanks lnhanks commented Sep 11, 2023

What type of PR is this?

Which issue does this PR fix:
n/a

What does this PR do / Why do we need it:
This PR adds two additional metrics to better visualize IP allocation. The first is a counter for "no available addresses" errors: this condition is currently only logged, and turning it into a metric will help visualize how often an IP allocation fails because no addresses are available. The second is an ENI utilization gauge, which expands on the existing ENIs allocated metric by partitioning the data by ENI ID and counting how many IP addresses are in use on each ENI. The PR also adds two log statements, emitted when an IP address is allocated or deallocated.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:

Automation added to e2e:

No

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Yes

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

Adds two additional metrics to better visualize IP allocation. The first is a counter for "no available addresses" errors: this condition is currently only logged, and turning it into a metric will help visualize how often an IP allocation fails because no addresses are available. The second is an ENI utilization gauge, which expands on the existing ENIs allocated metric by partitioning the data by ENI ID and counting how many IP addresses are in use on each ENI.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@lnhanks lnhanks force-pushed the only-metrics branch 2 times, most recently from cb8b7d8 to 1362702 Compare September 12, 2023 19:09
jdn5126 previously approved these changes Sep 12, 2023
@jdn5126 jdn5126 left a comment

Changes LGTM, thanks @lnhanks !

@jdn5126 jdn5126 merged commit 2965ddf into aws:master Sep 13, 2023
4 checks passed
	)
	eniIPsInUse = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "awscni_eni_util",
I think this metric should be named awscni_assigned_ip_per_eni, since it serves a similar purpose to the awscni_assigned_ip_per_cidr metric (except that the label value is the ENI instead of the CIDR).
Also, util is not a good abbreviation for "utilization" (I assume you mean utilization here :D).

@@ -122,6 +122,19 @@ var (
		},
		[]string{"cidr"},
	)
	noAvailableIPAddrs = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "awscni_err_no_avail_addrs",
Maybe change to awscni_no_available_ip_addresses, which:

  1. aligns with the other metrics' use of "ip_addresses";
  2. drops "err" from the metric name, since this is expected behavior rather than an error;
  3. removes a nonstandard abbreviation (avail is not a standard abbreviation for available, IIRC).

jdn5126 added a commit that referenced this pull request Oct 20, 2023
* restore node update permission to master until image tag can be updated (#2513)

* Merge branch 'release-1.14' (#2517)

* network policies update to readme (#2478)

* init draft of network policy desc

* add security note

* fixup

* fixup

* fix placeholder link

* Update manifest for cni 1.14 (#2526)

* Mimic VPC-RC limit struture (#2516)

* limits api pkg (#2528)

* Update kops tests for 1.28 and fix generate-cni-yaml script (#2536)

* skip IPAMD events test (#2537)

* chore: remove refs to deprecated io/ioutil (#2541)

* Change default Node Agent ports for health and metrics (#2545)

* remove self-managed node group from pod-eni test suite (#2547)

* bump controller runtime to 0.16.1 (#2548)

Co-authored-by: Joseph Chen <[email protected]>

* update agent image (#2554)

* fix(chart): Switch base64 encoded cniConfig.fileContents to the binaryData (#2552)

* Update the use of privileged flag in aws-vpc-cni manifest (#2555)

* increment default Calico version for helm compatibility (#2560)

* update nginx image (#2561)

* Only metrics (#2557)

Prometheus metrics for capturing ENI IP usage and no available IP address errors

Co-authored-by: Lindsay Hanks <[email protected]>

* CHANGELOG, chart, and manifest updates for VPC CNI v1.15.0 release (#2563)

* remove calico test suite from weekly integration tests (#2559)

* remove addon-tests integration suite as it is no longer needed (#2564)

* Only metrics (#2569)

* rename warm pool metrics

---------

Co-authored-by: Lindsay Hanks <[email protected]>

* Fix unused version variable (#2566)

* Update example table 'Pod per Prefixes' value (#2573)

* Bandwidth plugin with NP is currently unsupported (#2572)

* Bandwidth plugin with NP

* Messaging review

* pass CNINode scheme to client only (#2570)

* reduce api calls (#2575)

* Add region flag to describe-addon command (#2576)

* add ENABLE_V4_EGRESS (#2577)

* Add test registry parameter for ipv6 and CNI full tests (#2585)

* update golang image (#2586)

* increase time for service readiness (#2587)

* do not patch CNINode for custom networking unless podENI is enabled (#2591)

* Remove self-managed node group from custom-networking suite (#2590)

* remove self-managed node group from custom-networking suite

* Select CNI manifest based on regions (#2593)

* Update metrics helper image url based on region (#2604)

* dependabot updates (#2605)

* Graceful termination for service connectivity tests (#2611)

* update CHANGELOG, charts, and manifests in master following v1.15.1 release (#2614)

* go module updates and golang builder image update (#2615)

* update Golang to 1.21.3 (#2616)

* Stricter dependency/security review (#2617)

* Stricter dependency/security review

Signed-off-by: Davanum Srinivas <[email protected]>

* move common things to a separate file

Signed-off-by: Davanum Srinivas <[email protected]>

---------

Signed-off-by: Davanum Srinivas <[email protected]>

* update actions for go 1.21 and fix deps action warnings (#2618)

---------

Signed-off-by: Davanum Srinivas <[email protected]>
Co-authored-by: Jay Deokar <[email protected]>
Co-authored-by: Geoffrey Cline <[email protected]>
Co-authored-by: Joseph Chen <[email protected]>
Co-authored-by: guangwu <[email protected]>
Co-authored-by: Joseph Chen <[email protected]>
Co-authored-by: Valentin Zayash <[email protected]>
Co-authored-by: lnhanks <[email protected]>
Co-authored-by: Lindsay Hanks <[email protected]>
Co-authored-by: 김은빈 <[email protected]>
Co-authored-by: Jayanth Varavani <[email protected]>
Co-authored-by: Davanum Srinivas <[email protected]>