release-23.1: kv: Add stats for delegate snapshots #101837

Merged

andrewbaptist
Collaborator

Backport:

1/1 commits from "kv: unflake TestDelegateSnapshot" (#99169)
1/1 commits from "kv: Add stats for delegate snapshots" (#100762)
Please see individual PRs for details.

/cc https://github.com/orgs/cockroachdb/teams/release

Fixes: #98243
This PR adds two new stats for delegate snapshots to track failures when
sending snapshots. A snapshot can fail either before any data is
transferred or after it has been fully received.

Epic: none

Release note:
This commit adds two new stats that are useful for tracking the
efficiency of snapshot transfers. Some snapshots will always fail due to
system-level "races", but the goal is to keep the number of failures as
low as possible.
range.snapshots.recv-failed - The number of snapshot send attempts that
are initiated but not accepted by the recipient.
range.snapshots.recv-unusable - The number of snapshots that were fully
transmitted but not used.
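
To illustrate how the two stats differ, here is a minimal sketch of counters keyed on how a snapshot attempt ended. All names here (snapshotOutcome, snapshotStats, record) are hypothetical and are not taken from the CockroachDB code; the only details from this PR are the two stat names and their descriptions. The sketch is not safe for concurrent use.

```go
package main

// snapshotOutcome is a hypothetical classification of how a snapshot
// attempt ended. The names are illustrative only.
type snapshotOutcome int

const (
	outcomeApplied  snapshotOutcome = iota // fully received and used
	outcomeRejected                        // initiated, but not accepted by the recipient
	outcomeUnusable                        // fully transmitted, but not used
)

// snapshotStats holds illustrative counters that mirror the two new stats.
type snapshotStats struct {
	RecvFailed   int64 // stands in for range.snapshots.recv-failed
	RecvUnusable int64 // stands in for range.snapshots.recv-unusable
}

// record increments the counter that matches the outcome. A snapshot that
// was applied successfully does not touch either counter.
func (s *snapshotStats) record(o snapshotOutcome) {
	switch o {
	case outcomeRejected:
		s.RecvFailed++
	case outcomeUnusable:
		s.RecvUnusable++
	}
}
```

In this sketch a rejected attempt is the cheaper failure, since no data has been transferred yet, while an unusable snapshot has already paid the full transfer cost.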

Release justification: Adds stats as discussed as part of the premortem meeting.

Fixes: cockroachdb#96841
Fixes: cockroachdb#96525

Previously this test assumed that all snapshots came from the
AdminChangeReplicasRequest, whose snapshots end up as type OTHER.
However, occasionally we get a spurious raft snapshot, which made this
test flaky. This change ignores any raft snapshots that are sent.
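
The idea of the fix can be sketched as a filter that skips raft snapshots before the test counts what it observed. This is only an illustration: the snapshotKind type, its constants, and countNonRaft are hypothetical names, not the identifiers used in TestDelegateSnapshot.

```go
package main

// snapshotKind is a hypothetical label for where a snapshot came from.
type snapshotKind string

const (
	kindOther snapshotKind = "OTHER" // e.g. sent via AdminChangeReplicasRequest
	kindRaft  snapshotKind = "RAFT"  // spurious raft snapshot
)

// countNonRaft returns how many observed snapshots are not raft snapshots,
// so that a stray raft snapshot cannot change the count the test asserts on.
func countNonRaft(observed []snapshotKind) int {
	n := 0
	for _, k := range observed {
		if k == kindRaft {
			continue // ignore raft snapshots entirely
		}
		n++
	}
	return n
}
```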

Epic: none
Release note: None
@andrewbaptist andrewbaptist requested review from a team April 19, 2023 14:32
@andrewbaptist andrewbaptist requested a review from a team as a code owner April 19, 2023 14:32
@blathers-crl
blathers-crl bot commented Apr 19, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@andrewbaptist andrewbaptist changed the title from "release-23.1: TODO" to "release-23.1: kv: Add stats for delegate snapshots" Apr 19, 2023
@cockroach-teamcity
Member

This change is Reviewable

@andrewbaptist andrewbaptist requested a review from kvoli April 19, 2023 14:32
@dhartunian dhartunian removed the request for review from a team April 24, 2023 14:32
@andrewbaptist andrewbaptist merged commit 515851a into cockroachdb:release-23.1 Apr 25, 2023
@andrewbaptist andrewbaptist deleted the backport23.1-99169-100762 branch April 25, 2023 13:12
3 participants