kvserver: export remaining snapshot bytes #85528

Closed
kvoli opened this issue Aug 3, 2022 · 1 comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented Aug 3, 2022

Summary
Each snapshot may be a different size, so it would be beneficial to track the total remaining snapshot bytes that are queued and in progress on a store's receiver snapshot semaphore, and likewise the remaining bytes that are queued on a store's sender snapshot semaphore.

Note that we currently track reservations in bytes via capacity.reserved, which is the total size of the snapshot(s) currently being processed on a store.

Solution

The solution is to add four additional exported metrics; the last two are optional, pending how useful they turn out to be:

  1. range.snapshots.queued-rcvd: a gauge tracking the sum of all snapshot bytes that are currently queued on a store's receive queue but have not yet acquired a reservation (begun processing).
  2. range.snapshots.queued-send: a gauge tracking the sum of all snapshot bytes that are currently queued on a store's send queue but have not yet acquired a reservation (begun processing).
  3. range.snapshots.pending-rcvd: a gauge tracking the sum of all snapshot bytes that remain on a store's receiving side, for snapshots that have acquired a reservation. This could be updated more frequently, to track the "remaining bytes" i.e. reservation - processed.
  4. range.snapshots.pending-send: a gauge tracking the sum of all snapshot bytes that remain on a store's sending side, for snapshots that have acquired a reservation. Similar to above, this is tracking the remaining bytes to be sent.
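
The bookkeeping behind the four gauges above can be sketched as follows. This is a minimal, single-threaded model under stated assumptions, not the kvserver implementation: the `snapshotBytes` type and its `enqueue`, `reserve`, and `process` methods are hypothetical names introduced only for illustration, and one value of the type models one direction (send or receive) on one store.

```go
package main

import "fmt"

// snapshotBytes is a hypothetical model of the proposed gauges for one
// direction (send or receive) on a single store. It is not kvserver code.
type snapshotBytes struct {
	queued  int64 // bytes waiting for a reservation: metrics (1) and (2)
	pending int64 // bytes reserved but not yet processed: metrics (3) and (4)
}

// enqueue records a snapshot of the given size joining the queue.
func (s *snapshotBytes) enqueue(size int64) { s.queued += size }

// reserve moves a snapshot from queued to in progress once it acquires
// a reservation.
func (s *snapshotBytes) reserve(size int64) {
	s.queued -= size
	s.pending += size
}

// process records n bytes of an in-progress snapshot being sent or applied.
func (s *snapshotBytes) process(n int64) { s.pending -= n }

func main() {
	const mib = int64(1) << 20
	var recv snapshotBytes
	recv.enqueue(512 * mib) // first snapshot joins the queue
	recv.enqueue(64 * mib)  // second snapshot joins the queue
	recv.reserve(512 * mib) // first snapshot acquires a reservation
	recv.process(128 * mib) // a quarter of it has been applied
	fmt.Printf("queued=%d MiB pending=%d MiB\n", recv.queued/mib, recv.pending/mib)
}
```

In this model the queued gauge only changes when a snapshot joins the queue or acquires a reservation, while the pending gauge would need an update for every chunk processed, which is why (3) and (4) are the more expensive pair to keep current.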

Context

(3) and (4) may not provide much material benefit, as snapshots should in most cases be processed in under 16 seconds (512mb / 32mb/s), while the default metric update interval is 10 seconds. In cases where the snapshot rate is set lower, they may provide utility; however, the existing capacity.reserved metric, which tracks the total (unprocessed + processed) in-progress snapshot bytes, may be more appropriate. This issue leaves them as optional.

Related PR, for count rather than bytes: #84947

cc @AlexTalks

Jira issue: CRDB-18293

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Aug 3, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Aug 3, 2022
miraradeva added a commit to miraradeva/cockroach that referenced this issue Apr 7, 2023
Previously, we had metrics to track the number of snapshots waiting in
the snapshot queue; however, snapshots may be of different sizes, so it
is also helpful to track the size of all snapshots in the queue. This
change adds the following metrics to track the total size of all
snapshots waiting in the queue:

    range.snapshots.send-queue-bytes
    range.snapshots.recv-queue-bytes

Informs: cockroachdb#85528
Release note (ops change): Added two new metrics,
range.snapshots.(send|recv)-queue-bytes, to track the total size of all
snapshots waiting in the snapshot queue.
miraradeva added a commit to miraradeva/cockroach that referenced this issue Apr 11, 2023
craig bot pushed a commit that referenced this issue Apr 11, 2023
99275: sql: enabling forward indexes and ORDERBY on JSONB columns r=celiala a=Shivs11

Currently, #97928 outlines the scheme for JSONB encoding
and decoding for forward indexes. However, the PR doesn't
enable this feature to our users. This current PR aims
to allow forward indexes on JSONB columns. The presence
of a lexicographical ordering, as described in #97928,
shall now allow primary and secondary indexes on JSONB
columns along with the ability to use `ORDER BY` filter
in their queries.

Additionally, JSON values consist of decimal numbers
and containers, such as Arrays and Objects, which can
contain these decimal numbers. In order to preserve
the values after the decimal, JSONB columns are now
required to be composite in nature. This shall enable
such values to be stored in both the key and the value
side of a K/V pair in hopes of receiving the exact value.

Fixes: #35706

Release note (sql change): This PR adds support for enabling
forward indexes and ordering on JSON values.

Epic: [CRDB-24501](https://cockroachlabs.atlassian.net/browse/CRDB-24501)

100942: kvserver: add metrics to track snapshot queue size r=kvoli a=miraradeva

Previously, we had metrics to track the number of snapshots waiting in
the snapshot queue; however, snapshots may be of different sizes, so it
is also helpful to track the size of all snapshots in the queue. This
change adds the following metrics to track the total size of all
snapshots waiting in the queue:

    range.snapshots.send-queue-bytes
    range.snapshots.recv-queue-bytes

Informs: #85528
Release note (ops change): Added two new metrics,
range.snapshots.(send|recv)-queue-bytes, to track the total size of all
snapshots waiting in the snapshot queue.

101220: roachtest: prevent shared mutable state across c2c roachtest runs r=benbardin a=msbutler

Previously, all `c2c/*` roachtests run with `--count` would provide incomprehensible results because multiple roachtest runs of the same test would override each other's state. Specifically, the latest call of `test_spec.Run()`, would override the `test.Test` harness, and `syncedCluster.Cluster` used by all other tests with the same registration.

This patch fixes this problem by moving all fields in `replicationSpec` that are set during test execution (i.e. a `test_spec.Run` call), to a new `replicationDriver` struct. Now, `replicationSpec` gets defined during test registration and is shared across test runs, while `replicationDriver` gets set within a test run.

Epic: None
Release note: None

Co-authored-by: Shivam Saraf <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
@miraradeva
Contributor

Added metrics (1) and (2) above as part of #100942. Austen helped me test the change using this roachprod script:

export cluster=$USER-snapqueue
# create the cluster and stage the binary (you will need to build the linux binary)
# use n1-4 as the initial cluster nodes. n5 to join later once there's data written and n6 as the workload runner
roachprod create $cluster -n 6 --gce-machine-type=n1-standard-8 
roachprod put $cluster cockroach cockroach # (assumes linux binary is named cockroach)
roachprod start $cluster:1-4  # start only the first 4 nodes
# load initial data size of 15gb (unreplicated), then run a 50% read workload at 300 target ops/s writing 16kb each op for 10 minutes
roachprod run $cluster:6 -- './cockroach workload run kv --drop --read-percent=50 --min-block-bytes=16000 --max-block-bytes=16000 --insert-count=1000000 --max-rate=300 --concurrency=128 --duration=10m {pgurl:1-4}'
roachprod start $cluster:5
# now watch the snapshot bytes queued metric, can also start the workload runner again and see what happens.

Screenshot 2023-04-10 at 11 55 59 AM

For metrics (3) and (4), I added another metric to keep track of the total reserved bytes sent/received for snapshots with reservations (range.snapshots.recv-reserved-bytes). We also have an existing metric for total bytes sent/received (range.snapshots.rcvd-bytes). The metrics we want for (3) and (4) are essentially the difference between these two. I ran the same workload and watched node 5 ramp up in terms of the two metrics above (I didn't compute the actual difference because I haven't figured out Grafana yet).

Screenshot 2023-04-11 at 4 14 39 PM

The lines seem very close, so the difference will likely be too small to tell us much.
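
The difference described above (reserved bytes minus received bytes) could be computed from paired samples roughly as follows. This is a sketch under assumptions: `remainingBytes` is a hypothetical helper, the sample values are invented for illustration and are not read from the screenshots, and clamping at zero is an assumption made here to tolerate the two metrics being sampled at slightly different moments.

```go
package main

import "fmt"

// remainingBytes approximates metrics (3) and (4) as reserved minus
// processed bytes. Clamping at zero is an assumption of this sketch.
func remainingBytes(reserved, processed int64) int64 {
	if processed >= reserved {
		return 0
	}
	return reserved - processed
}

func main() {
	// Hypothetical paired samples in bytes; not real cluster data.
	reserved := []int64{512 << 20, 1024 << 20, 1536 << 20}
	received := []int64{480 << 20, 1000 << 20, 1536 << 20}
	for i := range reserved {
		rem := remainingBytes(reserved[i], received[i])
		fmt.Printf("sample %d: %d MiB remaining\n", i, rem>>20)
	}
}
```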
