Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics and type-saftey assertions about the raft indexes in delegate snapshot requests #98243

Closed
andrewbaptist opened this issue Mar 8, 2023 · 2 comments
Assignees
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team

Comments

@andrewbaptist
Copy link
Collaborator

andrewbaptist commented Mar 8, 2023

The delegated snapshot requests has uint64 fields for the term and first_index. While these have been manually audited to check for correctness, it would be better to type these to prevent misusing a field.

Add a metric on "snapshots that were received but unusable" as part of this effort. These are snapshots that are transferred and then the leader is unable to update the follower based on the raft log.

These changes will validate the correctness and prevent any changes in the future from using an incorrect index.

Jira issue: CRDB-25147

@andrewbaptist andrewbaptist added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker labels Mar 8, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 8, 2023

Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@andrewbaptist andrewbaptist added the branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 label Mar 8, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 9, 2023
@andrewbaptist andrewbaptist self-assigned this Mar 23, 2023
@andrewbaptist andrewbaptist changed the title Add type-saftey assertions about the raft indexes in delegate snapshot requests Add metrics and type-saftey assertions about the raft indexes in delegate snapshot requests Apr 7, 2023
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 7, 2023
Fixes: cockroachdb#98243
This PR adds two new stats for delegate snapshots to track failure of
sending snapshots. There are failures either before data is transferred
or after the snapshot is received. Both stats are useful.

Epic: none

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 17, 2023
Fixes: cockroachdb#98243
This PR adds two new stats for delegate snapshots to track failure of
sending snapshots. There are failures either before data is transferred
or after the snapshot is received.

Epic: none

Release note:
This commit adds two new stats which are useful for tracking the
efficiency of snapshot transfers. Some snapshots will always fail due to
system level "races", but the goal is to keep it as low as possible.
range.snapshots.recv-failed - The number of snapshot send attempts that
are initiated but not accepted by the recipient.
range.snapshots.recv-unusable - The number of snapshots that were fully
transmitted but not used.
@andrewbaptist
Copy link
Collaborator Author

Added stats as part of #100762. Will merge the larger refactor of strict types in #100181.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 19, 2023
Fixes: cockroachdb#98243
This PR adds two new stats for delegate snapshots to track failure of
sending snapshots. There are failures either before data is transferred
or after the snapshot is received.

Epic: none

Release note:
This commit adds two new stats which are useful for tracking the
efficiency of snapshot transfers. Some snapshots will always fail due to
system level "races", but the goal is to keep it as low as possible.
range.snapshots.recv-failed - The number of snapshot send attempts that
are initiated but not accepted by the recipient.
range.snapshots.recv-unusable - The number of snapshots that were fully
transmitted but not used.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 19, 2023
Fixes: cockroachdb#98243
This PR adds two new stats for delegate snapshots to track failure of
sending snapshots. There are failures either before data is transferred
or after the snapshot is received.

Epic: none

Release note:
This commit adds two new stats which are useful for tracking the
efficiency of snapshot transfers. Some snapshots will always fail due to
system level "races", but the goal is to keep it as low as possible.
range.snapshots.recv-failed - The number of snapshot send attempts that
are initiated but not accepted by the recipient.
range.snapshots.recv-unusable - The number of snapshots that were fully
transmitted but not used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team
Projects
None yet
Development

No branches or pull requests

1 participant