sqlccl: speed up partitioning tests #21392
Conversation
Force-pushed from b0d9232 to 04ab32b.
Force-pushed from 04ab32b to a74aff8.
```go
sqlDB := sqlutils.MakeSQLRunner(tc.Conns[0])
sqlDB.Exec(t, `CREATE DATABASE data`)
sqlDB.Exec(t, `USE data`)

// Disabling store throttling vastly speeds up rebalancing.
sqlDB.Exec(t, `SET CLUSTER SETTING server.declined_reservation_timeout = '0s'`)
```
Hm.. is it possible to just override directly via the *Settings in the testserverargs? IIRC these gossiped updates sometimes flake since you can't wait for them to propagate.
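For concreteness, here is a rough sketch of that suggestion, assuming the names base.TestServerArgs.Settings and cluster.MakeTestingClusterSettings (which may differ on this branch); the actual Override call is left commented out, since the setting variable isn't necessarily exported:

```go
package sqlccl_test

import (
	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
)

// sketchArgs builds server args with a pre-overridden *cluster.Settings,
// avoiding a post-startup SET CLUSTER SETTING round trip via gossip.
func sketchArgs() base.TestServerArgs {
	st := cluster.MakeTestingClusterSettings()
	// Hypothetical: assumes the setting variable is exported and that the
	// Override helper has this shape; both have varied across versions.
	// storage.DeclinedReservationsTimeout.Override(&st.SV, 0)
	return base.TestServerArgs{Settings: st}
}
```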
Overriding directly isn't a good idea as you can get that update wiped out by Gossip updates. I think this is fine (or rather, I'll take my chances). Especially for this one, it shouldn't cause a problem if it flakes: the test will take a little longer for the initial snapshots, but not for all of them.
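If the propagation flakiness ever did bite, one way to make the gossiped update deterministic (a sketch, not something this PR does) would be to poll every node until the new value is visible, using the usual crdb test helpers; the rendered value "0s" is an assumption about how a zero duration displays:

```go
package sqlccl_test

import (
	"fmt"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
	"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
	"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
)

// waitForDeclinedReservationTimeout polls every node until the gossiped
// cluster setting change is visible, so later assertions can rely on it.
func waitForDeclinedReservationTimeout(t *testing.T, tc *testcluster.TestCluster) {
	testutils.SucceedsSoon(t, func() error {
		for i := 0; i < tc.NumServers(); i++ {
			var v string
			sqlutils.MakeSQLRunner(tc.Conns[i]).QueryRow(
				t, `SHOW CLUSTER SETTING server.declined_reservation_timeout`,
			).Scan(&v)
			// "0s" is an assumption about how a zero duration renders.
			if v != "0s" {
				return fmt.Errorf("node %d still sees %q", i, v)
			}
		}
		return nil
	})
}
```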
👍
Partitioning tests are bottlenecked on test cluster rebalancing. Make rebalancing much faster by turning down the scan interval and disabling store throttling. TestInitialPartitioning and TestRepartitioning now take under 30s combined, where previously they could take several minutes.

Release note: None
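For reference, a minimal sketch of how a lowered scan interval gets plumbed through the test server args (the ScanInterval field is as I recall it from base.TestServerArgs; the exact startup calls may differ on this branch):

```go
package sqlccl_test

import (
	"testing"
	"time"

	"github.com/cockroachdb/cockroach/pkg/base"
	"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
)

// startFastCluster starts a test cluster whose queues rescan replicas far
// more often than the default, so rebalancing converges quickly.
func startFastCluster(t *testing.T) *testcluster.TestCluster {
	tc := testcluster.StartTestCluster(t, 3, base.TestClusterArgs{
		ServerArgs: base.TestServerArgs{
			// 1ms proved too aggressive under stress (see below);
			// 100ms is the middle ground this PR settled on.
			ScanInterval: 100 * time.Millisecond,
		},
	})
	// Callers should defer tc.Stopper().Stop(context.Background()).
	return tc
}
```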
Force-pushed from a74aff8 to 5c3d4ce.
Looks like setting the ScanInterval to 1ms can starve out other stress instances. Sigh. Hopefully 100ms is a good middle ground.
I looked into your acceptance flake. It looks like one node just isn't becoming live again after the restart, and in fact the cluster looks completely hosed (liveness problems, etc.). Perhaps there's a race in which we restart nodes faster than their ports are propagated (so that eventually there's a disconnect)? I didn't think so, but something has to be causing this.
(Either way, nothing that should block this PR.)
Oh, heh, I didn't even notice that acceptance test failure. A 100ms scan interval seems to have fixed the stress flakes.
Thanks for the reviews, folks!
Turning the ScanInterval way down shaves a minute or two off the partitioning tests' runtime.
Closes #21223.
Closes #21235.
Closes #21340.
Closes #21329.
Closes #21409.
Closes #21423.
Closes #21471.
Closes #21474.
Release note: None