kvclient: fix rangefeed retry reason counters #133991

Merged

stevendanna
Collaborator

Previously, retriable errors would often be recorded in the wrong
retry counter because the code that initialised the metrics assumed a
stable ordering when iterating a map.
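
A minimal sketch of the kind of pattern described above, not the actual kvclient code. The `Reason` enum, `reasonNames` map, and `Counter` type are hypothetical stand-ins. Counters are appended to a slice while ranging over a map and then looked up by enum value, which only works if the map happens to iterate in enum order:

```go
package main

import "fmt"

// Reason is a hypothetical stand-in for a retry-reason enum.
type Reason int32

const (
	ReasonReplicaRemoved Reason = iota
	ReasonRangeSplit
	ReasonRangeMerged
)

// reasonNames mimics a generated value-to-name map for the enum.
var reasonNames = map[Reason]string{
	ReasonReplicaRemoved: "replica_removed",
	ReasonRangeSplit:     "range_split",
	ReasonRangeMerged:    "range_merged",
}

// Counter is a hypothetical minimal counter, not a real metrics type.
type Counter struct {
	Name string
	n    int64
}

func (c *Counter) Inc() { c.n++ }

// buildBySliceOrder appends one counter per map entry. The slice ends up in
// map iteration order, which Go leaves unspecified and which can differ
// between runs, so index i does not necessarily hold the counter for Reason(i).
func buildBySliceOrder() []*Counter {
	counters := make([]*Counter, 0, len(reasonNames))
	for _, name := range reasonNames {
		counters = append(counters, &Counter{Name: name})
	}
	return counters
}

// mismatches counts reasons whose slice slot holds a differently named counter.
func mismatches(counters []*Counter) int {
	bad := 0
	for r, name := range reasonNames {
		if counters[r].Name != name {
			bad++
		}
	}
	return bad
}

func main() {
	counters := buildBySliceOrder()
	// The result varies between runs; any non-zero value means increments
	// for some reason would land on another reason's counter.
	fmt.Printf("misattributed reasons: %d of %d\n", mismatches(counters), len(reasonNames))
}
```

Running this several times typically prints different counts, which is why the mistake can go unnoticed in a single test run.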

Given the amount of work that is about to happen when restarting a
rangefeed, I doubt the optimization of a slice lookup vs a map lookup
is that important here.

Further, we almost surely do not want to panic just because we
couldn't find a metric for the reason a node might have sent to us, so
now, rather than panicking, we have a fallback counter for unknown
error types.
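
As a rough illustration of the approach described here, assuming hypothetical `Reason`, `Counter`, and `RetryMetrics` types rather than the real kvclient and metric packages: counters are keyed by reason in a map, and a lookup that misses returns a fallback counter instead of panicking.

```go
package main

import "fmt"

// Reason is a hypothetical stand-in for a retry-reason enum.
type Reason int32

const (
	ReasonReplicaRemoved Reason = iota
	ReasonRangeSplit
)

// Counter is a hypothetical minimal counter, not a real metrics type.
type Counter struct {
	Name string
	n    int64
}

func (c *Counter) Inc()         { c.n++ }
func (c *Counter) Count() int64 { return c.n }

// RetryMetrics keys counters by reason, so correctness does not depend on
// the order in which the name map was iterated during construction.
type RetryMetrics struct {
	byReason map[Reason]*Counter
	unknown  *Counter // fallback for reasons this binary does not know about
}

// NewRetryMetrics builds one counter per known reason plus the fallback.
func NewRetryMetrics(names map[Reason]string) *RetryMetrics {
	m := &RetryMetrics{
		byReason: make(map[Reason]*Counter, len(names)),
		unknown:  &Counter{Name: "retry.unknown"},
	}
	for r, name := range names {
		m.byReason[r] = &Counter{Name: "retry." + name}
	}
	return m
}

// CounterFor returns the counter for r, or the fallback counter when r is
// not in the map (for example, a reason sent by a newer node).
func (m *RetryMetrics) CounterFor(r Reason) *Counter {
	if c, ok := m.byReason[r]; ok {
		return c
	}
	return m.unknown
}

func main() {
	m := NewRetryMetrics(map[Reason]string{
		ReasonReplicaRemoved: "replica_removed",
		ReasonRangeSplit:     "range_split",
	})
	m.CounterFor(ReasonRangeSplit).Inc()
	m.CounterFor(Reason(42)).Inc() // unknown reason: counted, no panic

	fmt.Println(m.CounterFor(ReasonRangeSplit).Name, m.CounterFor(ReasonRangeSplit).Count())
	fmt.Println(m.CounterFor(Reason(42)).Name, m.CounterFor(Reason(42)).Count())
}
```

The map lookup costs a little more than a slice index, but as noted above that is unlikely to matter next to the work of restarting a rangefeed.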

Epic: none

Release note: Fixed a bug that could result in incorrect metrics for
retriable rangefeed errors.

This allows us to have a map of metrics in a struct.

Epic: none
Release note: None
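
For readers unfamiliar with the idea, here is a hedged sketch of how a registry might accept a struct that contains a map of metrics. It assumes a reflection-based walk over struct fields; the `Metric` interface, `Registry`, `AddStruct`, and `FeedMetrics` names are hypothetical and are not the API of CockroachDB's metric package.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Metric is a hypothetical minimal metric interface.
type Metric interface{ GetName() string }

// Counter is a hypothetical metric implementation.
type Counter struct{ Name string }

func (c *Counter) GetName() string { return c.Name }

// Registry is a hypothetical registry that only records metric names.
type Registry struct{ names []string }

// Names returns the registered names in sorted order.
func (r *Registry) Names() []string {
	out := append([]string(nil), r.names...)
	sort.Strings(out)
	return out
}

// addValue registers v if it is a non-nil value implementing Metric.
func (r *Registry) addValue(v reflect.Value) {
	if (v.Kind() == reflect.Ptr || v.Kind() == reflect.Interface) && v.IsNil() {
		return
	}
	if m, ok := v.Interface().(Metric); ok {
		r.names = append(r.names, m.GetName())
	}
}

// AddStruct registers every exported field that is a Metric and, for
// exported map fields, every map value that is a Metric.
func (r *Registry) AddStruct(s interface{}) {
	v := reflect.ValueOf(s)
	if v.Kind() == reflect.Ptr {
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return
	}
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		if !t.Field(i).IsExported() {
			continue // reflection cannot read unexported fields via Interface()
		}
		f := v.Field(i)
		if f.Kind() == reflect.Map {
			iter := f.MapRange()
			for iter.Next() {
				r.addValue(iter.Value())
			}
			continue
		}
		r.addValue(f)
	}
}

// FeedMetrics is a hypothetical metrics struct with a map-valued field.
type FeedMetrics struct {
	Restarts *Counter
	Retries  map[string]*Counter
}

func main() {
	reg := &Registry{}
	reg.AddStruct(&FeedMetrics{
		Restarts: &Counter{Name: "restarts"},
		Retries: map[string]*Counter{
			"range_split": {Name: "retry.range_split"},
			"unknown":     {Name: "retry.unknown"},
		},
	})
	fmt.Println(reg.Names())
}
```

Registration order for the map values follows map iteration order, so the sketch sorts names before reporting them.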
@stevendanna requested review from a team as code owners October 31, 2024 18:09
@stevendanna requested review from dhartunian, arjunmahishi and aa-joshi and removed request for a team October 31, 2024 18:09
@cockroach-teamcity
Member

This change is Reviewable

@stevendanna requested review from msbutler and wenyihu6 and removed request for a team, dhartunian, arjunmahishi and aa-joshi October 31, 2024 18:23
Contributor

@wenyihu6 left a comment

Rangefeed fix LGTM. I'm not too familiar with the registry metrics package, but I gave it a few reads and it looks good overall. Just in case, I ran a roachtest and confirmed that all retry metrics are populated.

Screenshot 2024-10-31 at 6 41 20 PM

@wenyihu6 added the backport-23.2.x, backport-24.1.x, backport-24.2.x and backport-24.3.x labels Oct 31, 2024
@stevendanna
Collaborator Author

bors r+

craig bot merged commit a39ae7d into cockroachdb:master Nov 1, 2024
22 of 23 checks passed
blathers-crl bot commented Nov 1, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from d772ac5 to blathers/backport-release-23.2-133991: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


error creating merge commit from d772ac5 to blathers/backport-release-24.1-133991: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
