Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: track snapshot bytes due to rebalancing #81047

Closed
kvoli opened this issue May 5, 2022 · 0 comments · Fixed by #81860
Closed

kvserver: track snapshot bytes due to rebalancing #81047

kvoli opened this issue May 5, 2022 · 0 comments · Fixed by #81860
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented May 5, 2022

We currently track the number of bytes received and bytes sent for snapshots, per store.

Snapshots occur for multiple reasons, we would like to gain insight into the portion attributable to rebalancing decisions, up-replication and "catch up snapshots" when a range is behind on it's log.

The solution is to track the cause of the snapshot and export the sent/recv bytes and count for each type of snapshot cause.

We update the bytes sent here:

bytesSentMetric.Inc(int64(b.Len()))

We update the bytes received here:

bytesRcvdCounter.Inc(int64(len(req.KVBatch)))

We track the snapshot type that is being sent, differentiating between up-replication/general raft and rebalancing:

enum Priority {
UNKNOWN = 0;
// RECOVERY is used for a Raft-initiated snapshots and for
// up-replication snapshots (i.e. when a dead node has been
// removed and the range needs to be up-replicated).
RECOVERY = 1;
// REBALANCE is used for snapshots involved in rebalancing.
REBALANCE = 2;
}

We should create two new counters:
range.snapshots.rebalancing.rcvd-bytes
range.snapshots.rebalancing.sent-bytes

Then increment these counters in a similar fashion to the currently tracked snapshot bytes:

Name: "range.snapshots.rcvd-bytes",
Help: "Number of snapshot bytes received",
Measurement: "Bytes",
Unit: metric.Unit_COUNT,
}

The currently exported metrics for range snapshots are show in the screenshot below:

image

Additional Information:

Separate to this issue is to begin tracking the up-front range size when considering rebalancing. The snapshot bytes sent/recv due to rebalancing actions would be known after rebalancing has occurred. As such, it is separate from this issue and may be estimated using the range size up-front, with consideration for this metric as feedback.

Jira issue: CRDB-15374

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. labels May 5, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label May 5, 2022
@lidorcarmel lidorcarmel added the A-kv-distribution Relating to rebalancing and leasing. label May 6, 2022
@kvoli kvoli self-assigned this May 9, 2022
@AlexTalks AlexTalks added the E-starter Might be suitable for a starter project for new employees or team members. label May 13, 2022
@mwang1026 mwang1026 assigned KnightAsterial and unassigned kvoli May 17, 2022
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue May 25, 2022
Resolves cockroachdb#81047
This commit implements new metrics to track the bytes sent/received by each type of snapshot (rebalance, recovery, unknown). Additionally, it changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue May 26, 2022
Resolves cockroachdb#81047
This commit implements new metrics to track the bytes sent/received by each type of snapshot (rebalance, recovery, unknown). Additionally, it changes the `snapshotStrategy` interface such that the `Receive` and `Send` methods take `snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue May 31, 2022
This commit adds the following new metrics
- `range.snapshots.unknown.rcvd-bytes`
- `range.snapshots.unknown.sent-bytes`
- `range.snapshots.rebalancing.rcvd-bytes`
- `range.snapshots.rebalancing.sent-bytes`
- `range.snapshots.recovery.rcvd-bytes`
- `range.snapshots.recovery.sent-bytes`
These metrics tracks the bytes send/received for each type of
snapshot (rebalance, recovery, and unknown snapshots).
The sum of these three new metrics should equal the existing
`range.snapshots.(sent|rcvd)-bytes` that tracks the total number
of snapshot bytes sent and received.

Additionally, this commit changes the `snapshotStrategy` interface
such that the `Receive` and `Send` methods take
`snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`

Resolves cockroachdb#81047
Release Note: None
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue May 31, 2022
This commit adds the following new metrics
- `range.snapshots.unknown.rcvd-bytes`
- `range.snapshots.unknown.sent-bytes`
- `range.snapshots.rebalancing.rcvd-bytes`
- `range.snapshots.rebalancing.sent-bytes`
- `range.snapshots.recovery.rcvd-bytes`
- `range.snapshots.recovery.sent-bytes`
These metrics tracks the bytes send/received for each type of
snapshot (rebalance, recovery, and unknown snapshots).
The sum of these three new metrics should equal the existing
`range.snapshots.(sent|rcvd)-bytes` that tracks the total number
of snapshot bytes sent and received.

Additionally, this commit changes the `snapshotStrategy` interface
such that the `Receive` and `Send` methods take
`snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`

Finally, this commit adds a new SendSnapshot TestingKnob that hooks
into the send snapshot flow after a DelegateSnapshotRequest is
handled but before any throttling or sending logic takes place.

Resolves cockroachdb#81047
Release Note: None
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue May 31, 2022
This commit adds the following new metrics
- `range.snapshots.unknown.rcvd-bytes`
- `range.snapshots.unknown.sent-bytes`
- `range.snapshots.rebalancing.rcvd-bytes`
- `range.snapshots.rebalancing.sent-bytes`
- `range.snapshots.recovery.rcvd-bytes`
- `range.snapshots.recovery.sent-bytes`
These metrics tracks the bytes send/received for each type of
snapshot (rebalance, recovery, and unknown snapshots).
The sum of these three new metrics should equal the existing
`range.snapshots.(sent|rcvd)-bytes` that tracks the total number
of snapshot bytes sent and received.

Additionally, this commit changes the `snapshotStrategy` interface
such that the `Receive` and `Send` methods take
`snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`

Finally, this commit adds a new SendSnapshot TestingKnob that hooks
into the send snapshot flow after a DelegateSnapshotRequest is
handled but before any throttling or sending logic takes place.

Resolves cockroachdb#81047
Release note (ui change): Added new 6 metrics
(range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes
and range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes)
to the metrics dashboard. This will allow users to track the number
of bytes sent/received for each type of metric in addition to the
total bytes sent/received.
KnightAsterial added a commit to KnightAsterial/cockroach that referenced this issue Jun 1, 2022
This commit adds the following new metrics
- `range.snapshots.unknown.rcvd-bytes`
- `range.snapshots.unknown.sent-bytes`
- `range.snapshots.rebalancing.rcvd-bytes`
- `range.snapshots.rebalancing.sent-bytes`
- `range.snapshots.recovery.rcvd-bytes`
- `range.snapshots.recovery.sent-bytes`
These metrics tracks the bytes send/received for each type of
snapshot (rebalance, recovery, and unknown snapshots).
The sum of these three new metrics should equal the existing
`range.snapshots.(sent|rcvd)-bytes` that tracks the total number
of snapshot bytes sent and received.

Additionally, this commit changes the `snapshotStrategy` interface
such that the `Receive` and `Send` methods take
`snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.

Finally, this commit adds a new SendSnapshot TestingKnob that hooks
into the send snapshot flow after a DelegateSnapshotRequest is
handled but before any throttling or sending logic takes place.

Resolves cockroachdb#81047
Release note (ops change): Added new 6 metrics
(range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes
and range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes)
to the metrics dashboard. This will allow users to track the number
of bytes sent/received for each type of metric in addition to the
total bytes sent/received.
craig bot pushed a commit that referenced this issue Jun 1, 2022
79090: kv: alter error for SET TRANSACTION AS OF SYSTEM TIME r=e-mbrown a=e-mbrown

if reads or writes  are already performed

Resolves #77265 

When a txn performed a read or write before setting up a historical timestamp a crash report was generated. This commit handles the error case.

Release note: None

81860: kvserver: implement granular metrics for snapshot recovery and rebalance r=KnightAsterial a=KnightAsterial

This commit adds the following new metrics
- `range.snapshots.unknown.rcvd-bytes`
- `range.snapshots.unknown.sent-bytes`
- `range.snapshots.rebalancing.rcvd-bytes`
- `range.snapshots.rebalancing.sent-bytes`
- `range.snapshots.recovery.rcvd-bytes`
- `range.snapshots.recovery.sent-bytes`
These metrics tracks the bytes send/received for each type of
snapshot (rebalance, recovery, and unknown snapshots).
The sum of these three new metrics should equal the existing
`range.snapshots.(sent|rcvd)-bytes` that tracks the total number
of snapshot bytes sent and received.

Additionally, this commit changes the `snapshotStrategy` interface
such that the `Receive` and `Send` methods take
`snapshotBytesMetricFn` as a parameter rather than `*metric.Counter`.

Finally, this commit adds a new SendSnapshot TestingKnob that hooks
into the send snapshot flow after a DelegateSnapshotRequest is
handled but before any throttling or sending logic takes place.

Resolves #81047

Release note (ops change): Added new 6 metrics
(`range.snapshots.shapshots.(unknown|recovery|rebalancing).sent-bytes`
and `range.snapshots.shapshots.(unknown|recovery|rebalancing).rcvd-bytes`)
to the metrics dashboard. This will allow users to track the number
of bytes sent/received for each type of metric in addition to the
total bytes sent/received.

82277: authors: add surajr10 to authors r=surajr10 a=surajr10

Release note: None

82296: tests: update version check for tenant scoping r=dhartunian a=rimadeodhar

This PR fixes the version gate check to add tenant
scoping while creating client certs for tenant
roachtests. It is erroneously gated on v22.1 when it
should be v22.2.

Release note: None

Co-authored-by: e-mbrown <[email protected]>
Co-authored-by: Ryan Zhao <[email protected]>
Co-authored-by: Suraj <[email protected]>
Co-authored-by: rimadeodhar <[email protected]>
@craig craig bot closed this as completed in 3a3009c Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-starter Might be suitable for a starter project for new employees or team members. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants