-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: track snapshot bytes due to rebalancing #81047
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter
Might be suitable for a starter project for new employees or team members.
T-kv
KV Team
Comments
kvoli
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv
Anything in KV that doesn't belong in a more specific category.
labels
May 5, 2022
AlexTalks
added
the
E-starter
Might be suitable for a starter project for new employees or team members.
label
May 13, 2022
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
May 25, 2022
Resolves cockroachdb#81047 This commit implements new metrics to track the bytes sent/received by each type of snapshot (rebalance, recovery, unknown). Additionally, it changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
May 26, 2022
Resolves cockroachdb#81047 This commit implements new metrics to track the bytes sent/received by each type of snapshot (rebalance, recovery, unknown). Additionally, it changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
May 31, 2022
This commit adds the following new metrics - `range.snapshots.unknown.rcvd-bytes` - `range.snapshots.unknown.sent-bytes` - `range.snapshots.rebalancing.rcvd-bytes` - `range.snapshots.rebalancing.sent-bytes` - `range.snapshots.recovery.rcvd-bytes` - `range.snapshots.recovery.sent-bytes` These metrics tracks the bytes send/received for each type of snapshot (rebalance, recovery, and unknown snapshots). The sum of these three new metrics should equal the existing `range.snapshots.(sent|rcvd)-bytes` that tracks the total number of snapshot bytes sent and received. Additionally, this commit changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter` Resolves cockroachdb#81047 Release Note: None
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
May 31, 2022
This commit adds the following new metrics - `range.snapshots.unknown.rcvd-bytes` - `range.snapshots.unknown.sent-bytes` - `range.snapshots.rebalancing.rcvd-bytes` - `range.snapshots.rebalancing.sent-bytes` - `range.snapshots.recovery.rcvd-bytes` - `range.snapshots.recovery.sent-bytes` These metrics tracks the bytes send/received for each type of snapshot (rebalance, recovery, and unknown snapshots). The sum of these three new metrics should equal the existing `range.snapshots.(sent|rcvd)-bytes` that tracks the total number of snapshot bytes sent and received. Additionally, this commit changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter` Finally, this commit adds a new SendSnapshot TestingKnob that hooks into the send snapshot flow after a DelegateSnapshotRequest is handled but before any throttling or sending logic takes place. Resolves cockroachdb#81047 Release Note: None
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
May 31, 2022
This commit adds the following new metrics - `range.snapshots.unknown.rcvd-bytes` - `range.snapshots.unknown.sent-bytes` - `range.snapshots.rebalancing.rcvd-bytes` - `range.snapshots.rebalancing.sent-bytes` - `range.snapshots.recovery.rcvd-bytes` - `range.snapshots.recovery.sent-bytes` These metrics tracks the bytes send/received for each type of snapshot (rebalance, recovery, and unknown snapshots). The sum of these three new metrics should equal the existing `range.snapshots.(sent|rcvd)-bytes` that tracks the total number of snapshot bytes sent and received. Additionally, this commit changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter` Finally, this commit adds a new SendSnapshot TestingKnob that hooks into the send snapshot flow after a DelegateSnapshotRequest is handled but before any throttling or sending logic takes place. Resolves cockroachdb#81047 Release note (ui change): Added new 6 metrics (range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes and range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes) to the metrics dashboard. This will allow users to track the number of bytes sent/received for each type of metric in addition to the total bytes sent/received.
KnightAsterial
added a commit
to KnightAsterial/cockroach
that referenced
this issue
Jun 1, 2022
This commit adds the following new metrics - `range.snapshots.unknown.rcvd-bytes` - `range.snapshots.unknown.sent-bytes` - `range.snapshots.rebalancing.rcvd-bytes` - `range.snapshots.rebalancing.sent-bytes` - `range.snapshots.recovery.rcvd-bytes` - `range.snapshots.recovery.sent-bytes` These metrics tracks the bytes send/received for each type of snapshot (rebalance, recovery, and unknown snapshots). The sum of these three new metrics should equal the existing `range.snapshots.(sent|rcvd)-bytes` that tracks the total number of snapshot bytes sent and received. Additionally, this commit changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`. Finally, this commit adds a new SendSnapshot TestingKnob that hooks into the send snapshot flow after a DelegateSnapshotRequest is handled but before any throttling or sending logic takes place. Resolves cockroachdb#81047 Release note (ops change): Added new 6 metrics (range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes and range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes) to the metrics dashboard. This will allow users to track the number of bytes sent/received for each type of metric in addition to the total bytes sent/received.
craig bot
pushed a commit
that referenced
this issue
Jun 1, 2022
79090: kv: alter error for SET TRANSACTION AS OF SYSTEM TIME r=e-mbrown a=e-mbrown if reads or writes are already performed Resolves #77265 When a txn performed a read or write before setting up a historical timestamp a crash report was generated. This commit handles the error case. Release note: None 81860: kvserver: implement granular metrics for snapshot recovery and rebalance r=KnightAsterial a=KnightAsterial This commit adds the following new metrics - `range.snapshots.unknown.rcvd-bytes` - `range.snapshots.unknown.sent-bytes` - `range.snapshots.rebalancing.rcvd-bytes` - `range.snapshots.rebalancing.sent-bytes` - `range.snapshots.recovery.rcvd-bytes` - `range.snapshots.recovery.sent-bytes` These metrics tracks the bytes send/received for each type of snapshot (rebalance, recovery, and unknown snapshots). The sum of these three new metrics should equal the existing `range.snapshots.(sent|rcvd)-bytes` that tracks the total number of snapshot bytes sent and received. Additionally, this commit changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`. Finally, this commit adds a new SendSnapshot TestingKnob that hooks into the send snapshot flow after a DelegateSnapshotRequest is handled but before any throttling or sending logic takes place. Resolves #81047 Release note (ops change): Added new 6 metrics (`range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes` and `range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes`) to the metrics dashboard. This will allow users to track the number of bytes sent/received for each type of metric in addition to the total bytes sent/received. 82277: authors: add surajr10 to authors r=surajr10 a=surajr10 Release note: None 82296: tests: update version check for tenant scoping r=dhartunian a=rimadeodhar This PR fixes the version gate check to add tenant scoping while creating client certs for tenant roachtests. It is erroneously gated on v22.1 when it should be v22.2. Release note: None Co-authored-by: e-mbrown <[email protected]> Co-authored-by: Ryan Zhao <[email protected]> Co-authored-by: Suraj <[email protected]> Co-authored-by: rimadeodhar <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter
Might be suitable for a starter project for new employees or team members.
T-kv
KV Team
We currently track the number of bytes received and bytes sent for snapshots, per store.
Snapshots occur for multiple reasons, we would like to gain insight into the portion attributable to rebalancing decisions, up-replication and "catch up snapshots" when a range is behind on it's log.
The solution is to track the cause of the snapshot and export the sent/recv bytes and count for each type of snapshot cause.
We update the bytes sent here:
cockroach/pkg/kv/kvserver/store_snapshot.go
Line 402 in 1c41988
We update the bytes received here:
cockroach/pkg/kv/kvserver/store_snapshot.go
Line 291 in 1c41988
We track the snapshot type that is being sent, differentiating between up-replication/general raft and rebalancing:
cockroach/pkg/kv/kvserver/kvserverpb/raft.proto
Lines 125 to 133 in 1c41988
We should create two new counters:
range.snapshots.rebalancing.rcvd-bytes
range.snapshots.rebalancing.sent-bytes
Then increment these counters in a similar fashion to the currently tracked snapshot bytes:
cockroach/pkg/kv/kvserver/metrics.go
Lines 525 to 529 in 1c41988
The currently exported metrics for range snapshots are show in the screenshot below:
Additional Information:
Separate to this issue is to begin tracking the up-front range size when considering rebalancing. The snapshot bytes sent/recv due to rebalancing actions would be known after rebalancing has occurred. As such, it is separate from this issue and may be estimated using the range size up-front, with consideration for this metric as feedback.
Jira issue: CRDB-15374
The text was updated successfully, but these errors were encountered: