storage: enable merge queue by default #28961

benesch · 2018-08-22T15:38:58Z

Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.)

Release note (sql change): ALTER TABLE ... SPLIT AT now produces an
error if executed while the merge queue is enabled, as the merge queue
is likely to immediately discard any splits created by the statement.

Docs team: the above has implications for the documentation of ALTER TABLE ... SPLIT AT. Users are now prevented from splitting a table unless they disable the merge queue with SET CLUSTER SETTING kv.range_merge.queue_enabled = false. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away.

cockroach-teamcity · 2018-08-22T15:39:08Z

This change is

tbg

Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 58 of 58 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/split.go, line 44 at r4 (raw file):

	if storage.MergeQueueEnabled.Get(&p.ExecCfg().Settings.SV) {
		return nil, errors.New("splits would be immediately discarded by merge queue; " +
			"disable the merge queue first by running 'SET CLUSTER SETTING kv.range_merge.queue_enabled = false'")

Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.

Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a false to true, so that we can backport everything except that last change.

benesch

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/split.go, line 44 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.

Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a false to true, so that we can backport everything except that last change.

I've taken your suggestion of moving the actual bit flipping to the separate commit so we can backport everything else.

I also realized that we only need read-your-own writes for cluster settings within one SQL session. The node that executes SET CLUSTER SETTING kv.range_merge.queue_enabled = true is always the same node that then tries to run the ALTER TABLE ... SPLIT AT. It's very easy to make that happen. LMK what you think. FYI I considered a few other approaches (like querying the gossip status on every node; doing something specific to test servers) and this was by far the best IMO.

benesch · 2018-08-23T02:53:25Z

/cc @dt for the commit titled "settings,sql: make SET CLUSTER SETTING session-consistent"

tbg

but please let someone double-check the SQL stuff.

Reviewed 60 of 60 files at r5, 2 of 2 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 53 of 53 files at r10, 9 of 9 files at r11, 2 of 2 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

dt · 2018-08-23T16:38:04Z

pkg/sql/set_cluster_setting_test.go

+	}
+
+	db0 := tc.ServerConn(0)
+	if _, err := db0.Exec(`SET CLUSTER SETTING sql.test_setting = 1`); err != nil {


Should this put the SET inside an explicit txn and verify on sv1 that it does not appear until the commit?

This will still make for a potentially surprising UX when your SET doesn't show up inside your own txn though prior to commit. Ugh, maybe these should not be allowed in txns.

Yeah, agreed, these probably shouldn't be allowed in txns.

Actually, are you comfortable with this change going in without a test? This test sends up being flaky because multiple updates in quick succession will race with the gossip update and the test gets confused. But it really does completely eliminate the flakiness in every test that runs SET CLUSTER SETTING kv.range_merge.queue_enabled = false. Really trying to get this change in in time for next week's beta. 😬

Hm. Been pondering this one for the last hour and I'm not crazy about it, particularly now there are two ways the in-mem value is managed instead of just being a cache of what's been gossiped. I'm trying to figure out what i'd like better though... playing with some ideas locally -- thinking some way that the SET could wait up to some timeout hoping to see its write come back via gossip and be processed. Thinking something like a channel pub/sub thing or updates handled, revision counter on each setting, or even just polling the setting value.

played with this a bit yesterday and came up with a slight variation -- wait for the new value to be returned when accessing the setting. What do you think about #29063?

knz

Please however add a release note on the next-to-last commit to explain the interaction between SPLIT AT (a feature we've advertised already) and the new merge queue.

Also please add some guidance to the doc team in the top PR description to indicate what we should be telling users.

Also perhaps file the doc issue, and indicate in the doc issue that the text of the error message produced for SPLIT AT should be modified to link to the relevant doc page (when it exists).

Thanks

Reviewed 67 of 67 files at r13, 2 of 2 files at r14, 2 of 2 files at r15, 1 of 1 files at r16, 2 of 2 files at r17, 53 of 53 files at r18, 9 of 9 files at r19, 2 of 2 files at r20.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)

The retry loop in AdminSplit can span many seconds. In that time, the replica may lose its lease, or the range might be merged away entirely. In either of those cases, the split can never succeed, and so the retry loop needs to give up. The loop was properly exiting if it noticed it lost its lease, but a range can get merged away without losing its lease. The final lease on that range remains valid until the liveness epoch it is tied to expires. Teach the loop to notice that condition too by checking Replica.IsDestroyed on every turn of the loop. Release note: None

Now that merges do not include a snapshot of the RHS data in the merge trigger, we no longer need a setting limiting the size of the RHS of a merge. Release note: None

Merges are relatively expensive. Set the merge queue interval to one second so we avoid processing too many merges at once. Introduce a cluster setting to allow users/tests to adjust the merge queue interval if they so choose. Fix cockroachdb#27769. Release note: None

Teach TestSystemZoneConfigs to install zone configs via SQL, rather than the hacky testing override system, which interacts poorly with the forthcoming on-by-default merge queue. Release note: None

benesch · 2018-08-27T01:19:15Z

Rebased on top of #29063 to pick up @dt's change to make SET CLUSTER SETTING wait for the settings update to apply to the local gossip cache. Merging on green! 🎉

Turn off the merge queue in all tests that need it. The actual default will be changed in a separate PR so that this commit can be safely backported to release-2.1. Release note: None

Splitting while the merge queue is enabled is almost certainly a user mistake. Add a best-effort check to prevent users from splitting while the merge queue is enabled. Users can override the check and request a split anyway by twiddling a new session variable, experimental_force_split_at. We have plans to eventually make the splits created by SPLIT AT "sticky", so that the merge queue does not immediately merge them away, but not in time for 2.1. Release note (sql change): ALTER TABLE ... SPLIT AT now produces an error if executed while the merge queue is enabled, as the merge queue is likely to immediately discard any splits created by the statement.

Start smoking out bugs in range merges by enabling them by default. This commit will not be backported to v2.1 until we've gained enough confidence in their stability on master. Release note: None

benesch · 2018-08-27T02:44:51Z

bors r=tschottdorf,knz

28961: storage: enable merge queue by default r=tschottdorf,knz a=benesch Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.) Release note (sql change): ALTER TABLE ... SPLIT AT now produces an error if executed while the merge queue is enabled, as the merge queue is likely to immediately discard any splits created by the statement. Docs team: the above has implications for the documentation of `ALTER TABLE ... SPLIT AT`. Users are now prevented from splitting a table unless they disable the merge queue with `SET CLUSTER SETTING kv.range_merge.queue_enabled = false`. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away. Co-authored-by: Nikhil Benesch <[email protected]>

craig · 2018-08-27T03:06:21Z

Build succeeded

GitHub CI (Cockroach)

29033: backport-2.1: storage: improve consistency checker diff, add debug decode-key r=a-robinson a=tschottdorf Backport 2/2 commits from #29012. /cc @cockroachdb/release --- Avoid spilling non-printable characters and generally garbage into the logs. Release note: None 29082: backport-2.1: one week of merge PRs r=knz,tschottdorf a=benesch Backport: * 1/1 commits from "sql: put rendered setting value in event log entry" (#29014) * 2/2 commits from " sql,settings: make SET CLUSTER SETTING wait for value to update, disallow in txn" (#29063) * 1/1 commits from "storage: discard a unworthwhile merge TODO" (#28885) * 3/3 commits from "storage: gate merges behind a cluster setting " (#28865) * 1/1 commits from "storage: update the tscache appropriately after a merge" (#28793) * 1/1 commits from "storage: avoid stopping twice in merge test" (#28902) * 1/1 commits from "roachpb,storage: rename GetSnapshotForMerge to Subsume" (#28910) * 1/1 commits from "build: fix generation rules for settings.html" (#28884) * 1/1 commits from "storage: deflake TestStoreRangeMergeTimestampCacheCausality" (#28928) * 1/1 commits from "storage: check for in-progress merge before installing new lease" (#28894) * 6/7 commits from " storage: enable merge queue by default" (#28961) Please see individual PRs for details. /cc @cockroachdb/release Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Nikhil Benesch <[email protected]>

This allows us to remove a lot of complicated logic introduced in cockroachdb#37506 and cockroachdb#28961 around conditionally setting split expirations and disallowing manual splits when the merge queue is enabled.

benesch requested review from tbg and a team August 22, 2018 15:38

benesch requested a review from a team as a code owner August 22, 2018 15:38

benesch requested review from a team August 22, 2018 15:38

tbg approved these changes Aug 22, 2018

View reviewed changes

benesch force-pushed the merge-burn-boats branch from 067910c to 3281dc5 Compare August 22, 2018 23:58

benesch commented Aug 23, 2018

View reviewed changes

benesch force-pushed the merge-burn-boats branch 4 times, most recently from 95d8606 to 520bfeb Compare August 23, 2018 02:51

benesch force-pushed the merge-burn-boats branch from 520bfeb to 3e7175c Compare August 23, 2018 03:41

benesch mentioned this pull request Aug 23, 2018

backport-2.1: one week of merge PRs #28994

Closed

tbg approved these changes Aug 23, 2018

View reviewed changes

dt reviewed Aug 23, 2018

View reviewed changes

benesch force-pushed the merge-burn-boats branch 2 times, most recently from e010b40 to a0f97fd Compare August 24, 2018 15:41

knz approved these changes Aug 25, 2018

View reviewed changes

benesch force-pushed the merge-burn-boats branch from a0f97fd to 31eb0b3 Compare August 27, 2018 01:16

benesch added 4 commits August 26, 2018 21:17

storage: remove MergeMaxRHSSize setting

ccc29bb

Now that merges do not include a snapshot of the RHS data in the merge trigger, we no longer need a setting limiting the size of the RHS of a merge. Release note: None

storage: update zone config installation in TestSystemZoneConfigs

4a15720

Teach TestSystemZoneConfigs to install zone configs via SQL, rather than the hacky testing override system, which interacts poorly with the forthcoming on-by-default merge queue. Release note: None

benesch force-pushed the merge-burn-boats branch from 31eb0b3 to bea6c5c Compare August 27, 2018 01:18

benesch mentioned this pull request Aug 27, 2018

ALTER TABLE ... SPLIT AT now requires merge queue to be disabled cockroachdb/docs#3586

Closed

benesch added 3 commits August 26, 2018 21:52

storage: prepare to enable merge queue by default

2e602f5

Turn off the merge queue in all tests that need it. The actual default will be changed in a separate PR so that this commit can be safely backported to release-2.1. Release note: None

storage: enable the merge queue by default

98ca1d0

Start smoking out bugs in range merges by enabling them by default. This commit will not be backported to v2.1 until we've gained enough confidence in their stability on master. Release note: None

benesch force-pushed the merge-burn-boats branch from bea6c5c to 98ca1d0 Compare August 27, 2018 01:52

craig bot merged commit 98ca1d0 into cockroachdb:master Aug 27, 2018

benesch mentioned this pull request Aug 27, 2018

backport-2.1: one week of merge PRs #29082

Merged

benesch deleted the merge-burn-boats branch August 27, 2018 16:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: enable merge queue by default #28961

storage: enable merge queue by default #28961

benesch commented Aug 22, 2018 •

edited

Loading

cockroach-teamcity commented Aug 22, 2018

tbg left a comment

benesch left a comment

benesch commented Aug 23, 2018

tbg left a comment

dt Aug 23, 2018

benesch Aug 24, 2018

dt Aug 24, 2018

dt Aug 25, 2018

knz left a comment

benesch commented Aug 27, 2018

benesch commented Aug 27, 2018

craig bot commented Aug 27, 2018

storage: enable merge queue by default #28961

storage: enable merge queue by default #28961

Conversation

benesch commented Aug 22, 2018 • edited Loading

cockroach-teamcity commented Aug 22, 2018

tbg left a comment

Choose a reason for hiding this comment

benesch left a comment

Choose a reason for hiding this comment

benesch commented Aug 23, 2018

tbg left a comment

Choose a reason for hiding this comment

dt Aug 23, 2018

Choose a reason for hiding this comment

benesch Aug 24, 2018

Choose a reason for hiding this comment

dt Aug 24, 2018

Choose a reason for hiding this comment

dt Aug 25, 2018

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

benesch commented Aug 27, 2018

benesch commented Aug 27, 2018

craig bot commented Aug 27, 2018

Build succeeded

benesch commented Aug 22, 2018 •

edited

Loading