Replace server_port_subscribers metric #11206

Merged: alpeb merged 1 commit into main from alpeb/stolen-from-alex on Aug 8, 2023

Conversation

@alpeb (Member) commented Aug 4, 2023

Fixes #10764

Removed the `server_port_subscribers` gauge, as it wasn't distinguishing between different pods, and the subscriber counts for each pod were conflicting with one another when the metric was updated (see the discussion in #10764 for more details).

Besides carrying an invalid value, this was generating the warning `unable to delete server_port_subscribers metric with labels`.

The metric was replaced with the `server_port_subscribes` and `server_port_unsubscribes` counters, which track the overall number of subscribes and unsubscribes to a particular pod port.
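
To illustrate the conflict described above, here is a toy model in Python. It is not the Destination controller's code: plain dictionaries stand in for Prometheus metric vectors, and the class names (`SharedGauge`, `Counter`, `PodPublisher`), the `simulate` helper, and the label values are all hypothetical. The sketch assumes two pods end up writing to the same label set, which is the situation this PR describes.

```python
from collections import defaultdict


class SharedGauge:
    """Stand-in for the old gauge: one value per label set."""

    def __init__(self):
        self.values = {}

    def set(self, labels, value):
        # Last writer wins, so the value reflects only one pod.
        self.values[labels] = value


class Counter:
    """Stand-in for the new counters: values only ever increase."""

    def __init__(self):
        self.values = defaultdict(int)

    def inc(self, labels):
        self.values[labels] += 1


class PodPublisher:
    """Hypothetical per-pod publisher that tracks its own listeners."""

    def __init__(self, labels, gauge, subscribes, unsubscribes):
        self.labels = labels
        self.gauge = gauge
        self.subscribes = subscribes
        self.unsubscribes = unsubscribes
        self.listeners = 0

    def subscribe(self):
        self.listeners += 1
        self.gauge.set(self.labels, self.listeners)  # overwrites other pods
        self.subscribes.inc(self.labels)  # accumulates across pods

    def unsubscribe(self):
        self.listeners -= 1
        self.gauge.set(self.labels, self.listeners)
        self.unsubscribes.inc(self.labels)


def simulate():
    # Assumed label set that does not identify the pod.
    labels = ("example-ns", "8080")
    gauge, subs, unsubs = SharedGauge(), Counter(), Counter()
    pod_a = PodPublisher(labels, gauge, subs, unsubs)
    pod_b = PodPublisher(labels, gauge, subs, unsubs)

    for _ in range(3):
        pod_a.subscribe()
    pod_b.subscribe()
    pod_a.unsubscribe()

    actual = pod_a.listeners + pod_b.listeners
    return gauge.values[labels], subs.values[labels], unsubs.values[labels], actual


if __name__ == "__main__":
    gauge_value, subscribes, unsubscribes, actual = simulate()
    print("gauge:", gauge_value)
    print("subscribes - unsubscribes:", subscribes - unsubscribes)
    print("actual subscribers:", actual)
```

In this model the gauge ends up holding whatever the last pod wrote, while the difference between the two counters matches the real number of subscribers, because increments from different pods add up instead of overwriting each other.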

🌮 to @adleong for the diagnosis and the fix!

@alpeb requested a review from a team as a code owner August 4, 2023 20:59
@alpeb merged commit 69ecb7f into main Aug 8, 2023
@alpeb deleted the alpeb/stolen-from-alex branch August 8, 2023 16:28
@hawkw added this to the stable-2.13.6 milestone Aug 8, 2023
hawkw pushed a commit that referenced this pull request Aug 9, 2023
hawkw added a commit that referenced this pull request Aug 9, 2023
This stable release fixes a regression introduced in stable-2.13.0 which
resulted in proxies shedding load too aggressively while under moderate
request load to a single service ([#11055]). In addition, it updates the
base image for the `linkerd-cni` init container to resolve a CVE in
`libdb` ([#11196]), fixes a race condition in the Destination controller
that could cause it to crash ([#11163]), and fixes a number of other
issues.

* Control Plane
  * Fixed a race condition in the destination controller that could
    cause it to panic ([#11169]; fixes [#11193])
  * Improved the granularity of logging levels in the control plane
    ([#11147])
  * Replaced incorrect `server_port_subscribers` gauge in the
    Destination controller's metrics with `server_port_subscribes` and
    `server_port_unsubscribes` counters ([#11206]; fixes [#10764])

* Proxy
  * Changed the default HTTP request queue capacities for the inbound
    and outbound proxies back to 10,000 requests ([#11198]; fixes
    [#11055])

* CLI
  * Updated extension CLI commands to prefer the `--registry` flag over
    the `LINKERD_DOCKER_REGISTRY` environment variable, making the
    precedence more consistent (thanks @harsh020!) (see [#11144])

* CNI
  * Updated `linkerd-cni` base image to resolve [CVE-2019-8457] in
    `libdb` ([#11196])
  * Changed the CNI plugin installer to always run in 'chained' mode;
    the plugin will now wait until another CNI plugin is installed
    before appending its configuration ([#10849])
  * Removed `hostNetwork: true` from linkerd-cni Helm chart templates
    ([#11158]; fixes [#11141]) (thanks @abhijeetgauravm!)

* Multicluster
  * Fixed the `linkerd multicluster check` command failing in the
    presence of lots of mirrored services ([#10764])

[#10764]: #10764
[#10849]: #10849
[#11055]: #11055
[#11141]: #11141
[#11144]: #11144
[#11147]: #11147
[#11158]: #11158
[#11163]: #11163
[#11169]: #11169
[#11196]: #11196
[#11198]: #11198
[#11206]: #11206
[CVE-2019-8457]: https://avd.aquasec.com/nvd/2019/cve-2019-8457/
@hawkw mentioned this pull request Aug 9, 2023
hawkw added a commit that referenced this pull request Aug 11, 2023
## edge-23.8.2

This edge release adds improvements to Linkerd's multi-cluster features
as part of the [flat network support] planned for Linkerd stable-2.14.0.
In addition, it fixes an issue ([#10764]) where warnings about an
invalid metric were logged frequently by the Destination controller.

* Added a new `remoteDiscoverySelector` field to the multicluster `Link`
  CRD, which enables a service mirroring mode where the control plane
  performs discovery for the mirrored service from the remote cluster,
  rather than creating Endpoints for the mirrored service in the source
  cluster ([#11190], [#11201], [#11220], and [#11224])
* Fixed missing "Services" menu item in the Spanish localization for the
  `linkerd-viz` web dashboard ([#11229]) (thanks @mclavel!)
* Replaced `server_port_subscribers` Destination controller gauge metric
  with `server_port_subscribes` and `server_port_unsubscribes` counter
  metrics ([#11206]; fixes [#10764])
* Replaced deprecated `failure-domain.beta.kubernetes.io` labels in Helm
  charts with `topology.kubernetes.io` labels ([#11148]; fixes [#11114])
  (thanks @piyushsingariya!)

[#10764]: #10764
[#11114]: #11114
[#11148]: #11148
[#11190]: #11190
[#11201]: #11201
[#11206]: #11206
[#11220]: #11220
[#11224]: #11224
[#11229]: #11229
[flat network support]:
    https://linkerd.io/2023/07/20/enterprise-multi-cluster-at-scale-supporting-flat-networks-in-linkerd/
@hawkw mentioned this pull request Aug 11, 2023