-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: enable merge queue by default #28961
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 58 of 58 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/split.go, line 44 at r4 (raw file):
if storage.MergeQueueEnabled.Get(&p.ExecCfg().Settings.SV) { return nil, errors.New("splits would be immediately discarded by merge queue; " + "disable the merge queue first by running 'SET CLUSTER SETTING kv.range_merge.queue_enabled = false'")
Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.
Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a false
to true
, so that we can backport everything except that last change.
067910c
to
3281dc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/split.go, line 44 at r4 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.
Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a
false
totrue
, so that we can backport everything except that last change.
I've taken your suggestion of moving the actual bit flipping to the separate commit so we can backport everything else.
I also realized that we only need read-your-own writes for cluster settings within one SQL session. The node that executes SET CLUSTER SETTING kv.range_merge.queue_enabled = true
is always the same node that then tries to run the ALTER TABLE ... SPLIT AT
. It's very easy to make that happen. LMK what you think. FYI I considered a few other approaches (like querying the gossip status on every node; doing something specific to test servers) and this was by far the best IMO.
95d8606
to
520bfeb
Compare
/cc @dt for the commit titled "settings,sql: make SET CLUSTER SETTING session-consistent" |
520bfeb
to
3e7175c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but please let someone double-check the SQL stuff.
Reviewed 60 of 60 files at r5, 2 of 2 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 53 of 53 files at r10, 9 of 9 files at r11, 2 of 2 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/set_cluster_setting_test.go
Outdated
} | ||
|
||
db0 := tc.ServerConn(0) | ||
if _, err := db0.Exec(`SET CLUSTER SETTING sql.test_setting = 1`); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this put the SET
inside an explicit txn and verify on sv1 that it does not appear until the commit?
This will still make for a potentially surprising UX when your SET
doesn't show up inside your own txn though prior to commit. Ugh, maybe these should not be allowed in txns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed, these probably shouldn't be allowed in txns.
Actually, are you comfortable with this change going in without a test? This test sends up being flaky because multiple updates in quick succession will race with the gossip update and the test gets confused. But it really does completely eliminate the flakiness in every test that runs SET CLUSTER SETTING kv.range_merge.queue_enabled = false
. Really trying to get this change in in time for next week's beta. 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Been pondering this one for the last hour and I'm not crazy about it, particularly now there are two ways the in-mem value is managed instead of just being a cache of what's been gossiped. I'm trying to figure out what i'd like better though... playing with some ideas locally -- thinking some way that the SET
could wait up to some timeout hoping to see its write come back via gossip and be processed. Thinking something like a channel pub/sub thing or updates handled, revision counter on each setting, or even just polling the setting value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
played with this a bit yesterday and came up with a slight variation -- wait for the new value to be returned when accessing the setting. What do you think about #29063?
e010b40
to
a0f97fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please however add a release note on the next-to-last commit to explain the interaction between SPLIT AT (a feature we've advertised already) and the new merge queue.
Also please add some guidance to the doc team in the top PR description to indicate what we should be telling users.
Also perhaps file the doc issue, and indicate in the doc issue that the text of the error message produced for SPLIT AT should be modified to link to the relevant doc page (when it exists).
Thanks
Reviewed 67 of 67 files at r13, 2 of 2 files at r14, 2 of 2 files at r15, 1 of 1 files at r16, 2 of 2 files at r17, 53 of 53 files at r18, 9 of 9 files at r19, 2 of 2 files at r20.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
a0f97fd
to
31eb0b3
Compare
The retry loop in AdminSplit can span many seconds. In that time, the replica may lose its lease, or the range might be merged away entirely. In either of those cases, the split can never succeed, and so the retry loop needs to give up. The loop was properly exiting if it noticed it lost its lease, but a range can get merged away without losing its lease. The final lease on that range remains valid until the liveness epoch it is tied to expires. Teach the loop to notice that condition too by checking Replica.IsDestroyed on every turn of the loop. Release note: None
Now that merges do not include a snapshot of the RHS data in the merge trigger, we no longer need a setting limiting the size of the RHS of a merge. Release note: None
Merges are relatively expensive. Set the merge queue interval to one second so we avoid processing too many merges at once. Introduce a cluster setting to allow users/tests to adjust the merge queue interval if they so choose. Fix cockroachdb#27769. Release note: None
Teach TestSystemZoneConfigs to install zone configs via SQL, rather than the hacky testing override system, which interacts poorly with the forthcoming on-by-default merge queue. Release note: None
31eb0b3
to
bea6c5c
Compare
Turn off the merge queue in all tests that need it. The actual default will be changed in a separate PR so that this commit can be safely backported to release-2.1. Release note: None
Splitting while the merge queue is enabled is almost certainly a user mistake. Add a best-effort check to prevent users from splitting while the merge queue is enabled. Users can override the check and request a split anyway by twiddling a new session variable, experimental_force_split_at. We have plans to eventually make the splits created by SPLIT AT "sticky", so that the merge queue does not immediately merge them away, but not in time for 2.1. Release note (sql change): ALTER TABLE ... SPLIT AT now produces an error if executed while the merge queue is enabled, as the merge queue is likely to immediately discard any splits created by the statement.
Start smoking out bugs in range merges by enabling them by default. This commit will not be backported to v2.1 until we've gained enough confidence in their stability on master. Release note: None
bea6c5c
to
98ca1d0
Compare
bors r=tschottdorf,knz |
28961: storage: enable merge queue by default r=tschottdorf,knz a=benesch Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.) Release note (sql change): ALTER TABLE ... SPLIT AT now produces an error if executed while the merge queue is enabled, as the merge queue is likely to immediately discard any splits created by the statement. Docs team: the above has implications for the documentation of `ALTER TABLE ... SPLIT AT`. Users are now prevented from splitting a table unless they disable the merge queue with `SET CLUSTER SETTING kv.range_merge.queue_enabled = false`. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away. Co-authored-by: Nikhil Benesch <[email protected]>
Build succeeded |
29033: backport-2.1: storage: improve consistency checker diff, add debug decode-key r=a-robinson a=tschottdorf Backport 2/2 commits from #29012. /cc @cockroachdb/release --- Avoid spilling non-printable characters and generally garbage into the logs. Release note: None 29082: backport-2.1: one week of merge PRs r=knz,tschottdorf a=benesch Backport: * 1/1 commits from "sql: put rendered setting value in event log entry" (#29014) * 2/2 commits from " sql,settings: make SET CLUSTER SETTING wait for value to update, disallow in txn" (#29063) * 1/1 commits from "storage: discard a unworthwhile merge TODO" (#28885) * 3/3 commits from "storage: gate merges behind a cluster setting " (#28865) * 1/1 commits from "storage: update the tscache appropriately after a merge" (#28793) * 1/1 commits from "storage: avoid stopping twice in merge test" (#28902) * 1/1 commits from "roachpb,storage: rename GetSnapshotForMerge to Subsume" (#28910) * 1/1 commits from "build: fix generation rules for settings.html" (#28884) * 1/1 commits from "storage: deflake TestStoreRangeMergeTimestampCacheCausality" (#28928) * 1/1 commits from "storage: check for in-progress merge before installing new lease" (#28894) * 6/7 commits from " storage: enable merge queue by default" (#28961) Please see individual PRs for details. /cc @cockroachdb/release Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Nikhil Benesch <[email protected]>
This allows us to remove a lot of complicated logic introduced in cockroachdb#37506 and cockroachdb#28961 around conditionally setting split expirations and disallowing manual splits when the merge queue is enabled.
Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.)
Release note (sql change): ALTER TABLE ... SPLIT AT now produces an
error if executed while the merge queue is enabled, as the merge queue
is likely to immediately discard any splits created by the statement.
Docs team: the above has implications for the documentation of
ALTER TABLE ... SPLIT AT
. Users are now prevented from splitting a table unless they disable the merge queue withSET CLUSTER SETTING kv.range_merge.queue_enabled = false
. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away.