
storage: scatter right after split leads to poor balance #35907

Closed

tbg opened this issue Mar 18, 2019 · 7 comments
Labels
A-kv-distribution (Relating to rebalancing and leasing.) · C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) · no-issue-activity · T-kv (KV Team) · X-stale

Comments

tbg (Member) commented Mar 18, 2019

Run this script:

```bash
#!/bin/bash

set -euxo pipefail

# Start from a clean slate.
pkill -9 roach || true
rm -rf cockroach-data* || true

# Bring up a local three-node cluster.
./cockroach start --insecure --listen-addr 127.0.0.1 --background
./cockroach start --insecure --http-port 8081 --port 26258 --store cockroach-data2 --join 127.0.0.1:26257 --background
./cockroach start --insecure --http-port 8082 --port 26259 --store cockroach-data3 --join 127.0.0.1:26257 --logtostderr --background

sleep 10
# Create a table, keep the merge queue from undoing the splits, and split the
# table into ~1000 ranges.
./cockroach sql --insecure -e "create table foo(id int primary key, v string);"
./cockroach sql --insecure -e "SET CLUSTER SETTING kv.range_merge.queue_enabled = false;"
./cockroach sql --insecure -e "ALTER TABLE foo SPLIT AT (SELECT i*10 FROM generate_series(1, 999) AS g(i));"

# sleep 70
./cockroach sql --insecure -e "ALTER TABLE foo SCATTER;"
```

See this kind of graph:

[graph omitted: leaseholder counts per node]

The expectation is that the SCATTER leaves the leaseholders roughly balanced. The graph shows a >10x difference.
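
The imbalance can also be read straight out of SQL rather than the admin UI. A minimal sketch in Go, assuming the github.com/lib/pq driver and that this version's SHOW RANGES output exposes a lease_holder column (the exact columns vary across CockroachDB releases):

```go
// leases.go: group the ranges of table foo by their current leaseholder.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed Postgres-wire driver for CockroachDB
)

func main() {
	// Connection string matches the first node in the repro script above.
	db, err := sql.Open("postgres",
		"postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// lease_holder is assumed to be part of SHOW RANGES output here; on some
	// releases this needs a different statement or additional options.
	rows, err := db.Query(
		`SELECT lease_holder, count(*) FROM [SHOW RANGES FROM TABLE foo] GROUP BY 1 ORDER BY 1`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var node, leases int
		if err := rows.Scan(&node, &leases); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("n%d: %d leases\n", node, leases)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```

On a well-balanced scatter this should print three roughly equal counts.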

We think that much of the variability in the duration of bulk I/O restore/import is due to this phenomenon.

When I last looked at this, I thought it was caused by the allocator receiving updated replica counts only at some interval, but I just inserted a 20s sleep before the scatter and it's just as bad. Ditto with a 70s sleep:

[graph omitted: the same imbalance with the added sleep]

Scatter just doesn't seem to be doing the right thing. It seems to drain the local node, giving equal shares of the leases to the other followers.

cc @danhhz this is much worse than I thought 😆

Jira issue: CRDB-4542

@tbg tbg added the A-kv-replication Relating to Raft, consensus, and coordination. label Mar 18, 2019
@tbg tbg self-assigned this Mar 18, 2019
tbg (Member, Author) commented Mar 18, 2019

@darinpp this could be a baptism-by-fire debugging allocator/replication issue for you -- let's chat about it in 1:1

@awoods187 awoods187 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 1, 2019
@tbg tbg assigned danhhz and unassigned tbg Apr 17, 2019
github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

knz (Contributor) commented Jun 5, 2021

The issue still exists in 21.1.

[graph omitted: leaseholder counts per node on 21.1]

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@erikgrinaker erikgrinaker added A-kv-distribution Relating to rebalancing and leasing. and removed A-kv-replication Relating to Raft, consensus, and coordination. labels Oct 7, 2021
nvanbenschoten (Member) commented

I thought this would have been fixed by #75894, but it appears that the issue remains. cc. @aayushshah15.

[screenshot from 2022-03-10: the imbalance is still visible]

aayushshah15 (Contributor) commented Mar 15, 2022

I looked into this, and what's happening is that we're seeing correlated decisions being made across the different ranges that AdminScatter is called over. This is why we only see the problem when AdminScatter is called while there is already an imbalance in lease counts among the nodes: all of those ranges individually try to reconcile the same load imbalance, which makes the node with the most leases shed all of its leases away to the other nodes.

In DistSender.Send(), we freely split batches containing these AdminScatterRequests across range boundaries and send the resulting requests asynchronously. That, combined with the fact that AdminScatter calls processOneChange directly (i.e. without any semaphore limiting these scatters), produces the symptoms we're seeing.
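
To make the correlated-decisions point concrete, here is a toy model (not CockroachDB code; the mean-based shedding rule and random tie-breaking are simplifying assumptions): when every range's decision is evaluated against the same stale snapshot of per-node lease counts, the node that starts with all the leases sheds every one of them instead of stopping once it reaches its fair share.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	numRanges = 999
	numNodes  = 3
)

// rebalance makes one lease-shedding decision per range. With stale == true,
// every decision sees the original pre-scatter counts; otherwise each decision
// sees the counts as updated by the decisions made before it.
func rebalance(stale bool) []int {
	counts := make([]int, numNodes)
	counts[0] = numRanges // right after the splits, node 0 holds every lease
	snapshot := append([]int(nil), counts...)
	mean := numRanges / numNodes

	for r := 0; r < numRanges; r++ {
		view := counts
		if stale {
			view = snapshot
		}
		const src = 0 // every lease starts on node 0
		if view[src] <= mean {
			continue // source doesn't look overfull; keep the lease
		}
		// Transfer to the least-loaded other node, breaking ties randomly.
		best, candidates := numRanges+1, []int(nil)
		for n := 1; n < numNodes; n++ {
			switch {
			case view[n] < best:
				best, candidates = view[n], []int{n}
			case view[n] == best:
				candidates = append(candidates, n)
			}
		}
		dst := candidates[rand.Intn(len(candidates))]
		counts[src]--
		counts[dst]++
	}
	return counts
}

func main() {
	fmt.Println("fresh counts:", rebalance(false)) // roughly [333 333 333]
	fmt.Println("stale counts:", rebalance(true))  // roughly [0 500 500]
}
```

The stale case reproduces the "drain the local node, split its leases between the followers" shape reported above.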

aayushshah15 (Contributor) commented

It seems bad that these scatters aren't rate limited somehow; doesn't that mean calling scatter on a large enough table can lead to as many as kv.dist_sender.concurrency_limit snapshots in flight at once?

During the evaluation of AdminScatter, should we not be calling into replicateQueue.processOneChange() directly? Or should we make the DistSender not send these scatter requests asynchronously? Both of these options seem like relatively significant behavioural changes at this point in the release.
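
For illustration only, the kind of limit being discussed could be as simple as a counting semaphore around the per-range work. This is a hypothetical sketch: processOneChange below is a stand-in for, not the real, replicateQueue.processOneChange, and it is not a proposal for how the actual code should be structured.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// processOneChange is a stand-in for one scatter-triggered replication or
// lease change; the real replicateQueue.processOneChange does far more work.
func processOneChange(ctx context.Context, rangeID int) error {
	time.Sleep(10 * time.Millisecond) // pretend to move a replica or a lease
	fmt.Printf("processed r%d\n", rangeID)
	return nil
}

// scatterAll processes every range, but never more than maxInFlight at a time,
// instead of letting each asynchronously split AdminScatter request run its
// change unconstrained.
func scatterAll(ctx context.Context, rangeIDs []int, maxInFlight int) {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for _, id := range rangeIDs {
		id := id
		sem <- struct{}{} // acquire a slot (blocks once maxInFlight are running)
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			_ = processOneChange(ctx, id)
		}()
	}
	wg.Wait()
}

func main() {
	ids := make([]int, 100)
	for i := range ids {
		ids[i] = i + 1
	}
	// At most 8 concurrent changes, rather than up to the DistSender's
	// concurrency limit.
	scatterAll(context.Background(), ids, 8)
}
```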

github-actions bot commented

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 2, 2023