
storage: scatter right after split leads to poor balance #35907

Closed

tbg opened this issue Mar 18, 2019 · 7 comments
Labels
A-kv-distribution (Relating to rebalancing and leasing.) · C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) · no-issue-activity · T-kv (KV Team) · X-stale

Comments

tbg (Member) commented Mar 18, 2019

Run this script:

```bash
#!/bin/bash

set -euxo pipefail

# Start from a clean slate.
pkill -9 roach || true
rm -rf cockroach-data* || true

# Bring up a local three-node cluster.
./cockroach start --insecure --listen-addr 127.0.0.1 --background
./cockroach start --insecure --http-port 8081 --port 26258 --store cockroach-data2 --join 127.0.0.1:26257 --background
./cockroach start --insecure --http-port 8082 --port 26259 --store cockroach-data3 --join 127.0.0.1:26257 --logtostderr --background

sleep 10
# Create a table, keep the merge queue from undoing the splits, and split the
# table into ~1000 ranges.
./cockroach sql --insecure -e "create table foo(id int primary key, v string);"
./cockroach sql --insecure -e "SET CLUSTER SETTING kv.range_merge.queue_enabled = false;"
./cockroach sql --insecure -e "ALTER TABLE foo SPLIT AT (SELECT i*10 FROM generate_series(1, 999) AS g(i));"

# sleep 70
./cockroach sql --insecure -e "ALTER TABLE foo SCATTER;"
```

See this kind of graph:

[graph omitted: leaseholder counts per node]

The expectation is that the SCATTER leaves the leaseholders roughly balanced. The graph shows a >10x difference.
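
The imbalance can also be read straight out of SQL rather than the admin UI. A minimal sketch in Go, assuming the github.com/lib/pq driver and that this version's SHOW RANGES output exposes a lease_holder column (the exact columns vary across CockroachDB releases):

```go
// leases.go: group the ranges of table foo by their current leaseholder.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed Postgres-wire driver for CockroachDB
)

func main() {
	// Connection string matches the first node in the repro script above.
	db, err := sql.Open("postgres",
		"postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// lease_holder is assumed to be part of SHOW RANGES output here; on some
	// releases this needs a different statement or additional options.
	rows, err := db.Query(
		`SELECT lease_holder, count(*) FROM [SHOW RANGES FROM TABLE foo] GROUP BY 1 ORDER BY 1`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var node, leases int
		if err := rows.Scan(&node, &leases); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("n%d: %d leases\n", node, leases)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```

On a well-balanced scatter this should print three roughly equal counts.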

We think that much of the variability in the duration of bulk I/O restore/import is due to this phenomenon.

When I last looked at this, I thought it was caused by the allocator receiving updated replica counts only at some interval, but I just inserted a 20s sleep before the scatter and it's just as bad. Ditto with a 70s sleep:

[graph omitted: the same imbalance with the added sleep]

Scatter just doesn't seem to be doing the right thing. It seems to drain the local node, giving equal shares of the leases to the other followers.

cc @danhhz this is much worse than I thought 😆

Jira issue: CRDB-4542

@tbg tbg added the A-kv-replication Relating to Raft, consensus, and coordination. label Mar 18, 2019
@tbg tbg self-assigned this Mar 18, 2019
tbg (Member, Author) commented Mar 18, 2019

@darinpp this could be a baptism-by-fire debugging allocator/replication issue for you -- let's chat about it in 1:1

@awoods187 awoods187 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 1, 2019
@tbg tbg assigned danhhz and unassigned tbg Apr 17, 2019
github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

knz (Contributor) commented Jun 5, 2021

The issue still exists in 21.1.

[graph omitted: leaseholder counts per node on 21.1]

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@erikgrinaker erikgrinaker added A-kv-distribution Relating to rebalancing and leasing. and removed A-kv-replication Relating to Raft, consensus, and coordination. labels Oct 7, 2021
nvanbenschoten (Member) commented

I thought this would have been fixed by #75894, but it appears that the issue remains. cc. @aayushshah15.

[screenshot from 2022-03-10: the imbalance is still visible]

aayushshah15 (Contributor) commented Mar 15, 2022

I looked into this, and what's happening is that we're seeing correlated decisions being made across the different ranges that AdminScatter is called over. This is why we only see the problem when AdminScatter is called while there is already an imbalance in lease counts among the nodes: all of those ranges individually try to reconcile the same load imbalance, which makes the node with the most leases shed all of its leases away to the other nodes.

In DistSender.Send(), we freely split batches containing these AdminScatterRequests across range boundaries and send the resulting requests asynchronously. That, combined with the fact that AdminScatter calls processOneChange directly (i.e. without any semaphore limiting these scatters), produces the symptoms we're seeing.
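
To make the correlated-decisions point concrete, here is a toy model (not CockroachDB code; the mean-based shedding rule and random tie-breaking are simplifying assumptions): when every range's decision is evaluated against the same stale snapshot of per-node lease counts, the node that starts with all the leases sheds every one of them instead of stopping once it reaches its fair share.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	numRanges = 999
	numNodes  = 3
)

// rebalance makes one lease-shedding decision per range. With stale == true,
// every decision sees the original pre-scatter counts; otherwise each decision
// sees the counts as updated by the decisions made before it.
func rebalance(stale bool) []int {
	counts := make([]int, numNodes)
	counts[0] = numRanges // right after the splits, node 0 holds every lease
	snapshot := append([]int(nil), counts...)
	mean := numRanges / numNodes

	for r := 0; r < numRanges; r++ {
		view := counts
		if stale {
			view = snapshot
		}
		const src = 0 // every lease starts on node 0
		if view[src] <= mean {
			continue // source doesn't look overfull; keep the lease
		}
		// Transfer to the least-loaded other node, breaking ties randomly.
		best, candidates := numRanges+1, []int(nil)
		for n := 1; n < numNodes; n++ {
			switch {
			case view[n] < best:
				best, candidates = view[n], []int{n}
			case view[n] == best:
				candidates = append(candidates, n)
			}
		}
		dst := candidates[rand.Intn(len(candidates))]
		counts[src]--
		counts[dst]++
	}
	return counts
}

func main() {
	fmt.Println("fresh counts:", rebalance(false)) // roughly [333 333 333]
	fmt.Println("stale counts:", rebalance(true))  // roughly [0 500 500]
}
```

The stale case reproduces the "drain the local node, split its leases between the followers" shape reported above.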

aayushshah15 (Contributor) commented

It seems bad that these scatters aren't rate limited somehow; doesn't that mean calling scatter on a large enough table can lead to as many as kv.dist_sender.concurrency_limit snapshots in flight at once?

During the evaluation of AdminScatter, should we not be calling into replicateQueue.processOneChange() directly? Or should we make the DistSender not send these scatter requests asynchronously? Both of these options seem like relatively significant behavioural changes at this point in the release.
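
For illustration only, the kind of limit being discussed could be as simple as a counting semaphore around the per-range work. This is a hypothetical sketch: processOneChange below is a stand-in for, not the real, replicateQueue.processOneChange, and it is not a proposal for how the actual code should be structured.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// processOneChange is a stand-in for one scatter-triggered replication or
// lease change; the real replicateQueue.processOneChange does far more work.
func processOneChange(ctx context.Context, rangeID int) error {
	time.Sleep(10 * time.Millisecond) // pretend to move a replica or a lease
	fmt.Printf("processed r%d\n", rangeID)
	return nil
}

// scatterAll processes every range, but never more than maxInFlight at a time,
// instead of letting each asynchronously split AdminScatter request run its
// change unconstrained.
func scatterAll(ctx context.Context, rangeIDs []int, maxInFlight int) {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for _, id := range rangeIDs {
		id := id
		sem <- struct{}{} // acquire a slot (blocks once maxInFlight are running)
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			_ = processOneChange(ctx, id)
		}()
	}
	wg.Wait()
}

func main() {
	ids := make([]int, 100)
	for i := range ids {
		ids[i] = i + 1
	}
	// At most 8 concurrent changes, rather than up to the DistSender's
	// concurrency limit.
	scatterAll(context.Background(), ids, 8)
}
```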

github-actions bot commented

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 2, 2023