Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: enable merge queue by default #28961

Merged
merged 7 commits into from
Aug 27, 2018

Conversation

benesch
Copy link
Contributor

@benesch benesch commented Aug 22, 2018

Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.)

Release note (sql change): ALTER TABLE ... SPLIT AT now produces an
error if executed while the merge queue is enabled, as the merge queue
is likely to immediately discard any splits created by the statement.

Docs team: the above has implications for the documentation of ALTER TABLE ... SPLIT AT. Users are now prevented from splitting a table unless they disable the merge queue with SET CLUSTER SETTING kv.range_merge.queue_enabled = false. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away.

@benesch benesch requested review from tbg and a team August 22, 2018 15:38
@benesch benesch requested a review from a team as a code owner August 22, 2018 15:38
@benesch benesch requested review from a team August 22, 2018 15:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 58 of 58 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/split.go, line 44 at r4 (raw file):

	if storage.MergeQueueEnabled.Get(&p.ExecCfg().Settings.SV) {
		return nil, errors.New("splits would be immediately discarded by merge queue; " +
			"disable the merge queue first by running 'SET CLUSTER SETTING kv.range_merge.queue_enabled = false'")

Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.

Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a false to true, so that we can backport everything except that last change.

Copy link
Contributor Author

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/split.go, line 44 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Hrm, even if you run that, there's a gossip delay in there until it takes effect, which likely leads to flaky tests :-( If we see that happen, we're probably going to have to introduce a time-based mechanism as well.

Also, I think it would be better if you had an extra commit that introduces all of these guardrails and then a final commit that just changes a false to true, so that we can backport everything except that last change.

I've taken your suggestion of moving the actual bit flipping to the separate commit so we can backport everything else.

I also realized that we only need read-your-own writes for cluster settings within one SQL session. The node that executes SET CLUSTER SETTING kv.range_merge.queue_enabled = true is always the same node that then tries to run the ALTER TABLE ... SPLIT AT. It's very easy to make that happen. LMK what you think. FYI I considered a few other approaches (like querying the gossip status on every node; doing something specific to test servers) and this was by far the best IMO.

@benesch benesch force-pushed the merge-burn-boats branch 4 times, most recently from 95d8606 to 520bfeb Compare August 23, 2018 02:51
@benesch
Copy link
Contributor Author

benesch commented Aug 23, 2018

/cc @dt for the commit titled "settings,sql: make SET CLUSTER SETTING session-consistent"

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but please let someone double-check the SQL stuff.

Reviewed 60 of 60 files at r5, 2 of 2 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 53 of 53 files at r10, 9 of 9 files at r11, 2 of 2 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

}

db0 := tc.ServerConn(0)
if _, err := db0.Exec(`SET CLUSTER SETTING sql.test_setting = 1`); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this put the SET inside an explicit txn and verify on sv1 that it does not appear until the commit?

This will still make for a potentially surprising UX when your SET doesn't show up inside your own txn though prior to commit. Ugh, maybe these should not be allowed in txns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed, these probably shouldn't be allowed in txns.

Actually, are you comfortable with this change going in without a test? This test sends up being flaky because multiple updates in quick succession will race with the gossip update and the test gets confused. But it really does completely eliminate the flakiness in every test that runs SET CLUSTER SETTING kv.range_merge.queue_enabled = false. Really trying to get this change in in time for next week's beta. 😬

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Been pondering this one for the last hour and I'm not crazy about it, particularly now there are two ways the in-mem value is managed instead of just being a cache of what's been gossiped. I'm trying to figure out what i'd like better though... playing with some ideas locally -- thinking some way that the SET could wait up to some timeout hoping to see its write come back via gossip and be processed. Thinking something like a channel pub/sub thing or updates handled, revision counter on each setting, or even just polling the setting value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

played with this a bit yesterday and came up with a slight variation -- wait for the new value to be returned when accessing the setting. What do you think about #29063?

@benesch benesch force-pushed the merge-burn-boats branch 2 times, most recently from e010b40 to a0f97fd Compare August 24, 2018 15:41
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Please however add a release note on the next-to-last commit to explain the interaction between SPLIT AT (a feature we've advertised already) and the new merge queue.

Also please add some guidance to the doc team in the top PR description to indicate what we should be telling users.

Also perhaps file the doc issue, and indicate in the doc issue that the text of the error message produced for SPLIT AT should be modified to link to the relevant doc page (when it exists).

Thanks

Reviewed 67 of 67 files at r13, 2 of 2 files at r14, 2 of 2 files at r15, 1 of 1 files at r16, 2 of 2 files at r17, 53 of 53 files at r18, 9 of 9 files at r19, 2 of 2 files at r20.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

The retry loop in AdminSplit can span many seconds. In that time, the
replica may lose its lease, or the range might be merged away entirely.
In either of those cases, the split can never succeed, and so the retry
loop needs to give up.

The loop was properly exiting if it noticed it lost its lease, but a
range can get merged away without losing its lease. The final lease on
that range remains valid until the liveness epoch it is tied to expires.
Teach the loop to notice that condition too by checking
Replica.IsDestroyed on every turn of the loop.

Release note: None
Now that merges do not include a snapshot of the RHS data in the merge
trigger, we no longer need a setting limiting the size of the RHS of a
merge.

Release note: None
Merges are relatively expensive. Set the merge queue interval to one
second so we avoid processing too many merges at once. Introduce a
cluster setting to allow users/tests to adjust the merge queue interval
if they so choose.

Fix cockroachdb#27769.

Release note: None
Teach TestSystemZoneConfigs to install zone configs via SQL, rather than
the hacky testing override system, which interacts poorly with the
forthcoming on-by-default merge queue.

Release note: None
@benesch
Copy link
Contributor Author

benesch commented Aug 27, 2018

Rebased on top of #29063 to pick up @dt's change to make SET CLUSTER SETTING wait for the settings update to apply to the local gossip cache. Merging on green! 🎉

Turn off the merge queue in all tests that need it. The actual default
will be changed in a separate PR so that this commit can be safely
backported to release-2.1.

Release note: None
Splitting while the merge queue is enabled is almost certainly a user
mistake. Add a best-effort check to prevent users from splitting while
the merge queue is enabled. Users can override the check and request a
split anyway by twiddling a new session variable,
experimental_force_split_at.

We have plans to eventually make the splits created by SPLIT AT
"sticky", so that the merge queue does not immediately merge them away,
but not in time for 2.1.

Release note (sql change): ALTER TABLE ... SPLIT AT now produces an
error if executed while the merge queue is enabled, as the merge queue
is likely to immediately discard any splits created by the statement.
Start smoking out bugs in range merges by enabling them by default.
This commit will not be backported to v2.1 until we've gained enough
confidence in their stability on master.

Release note: None
@benesch
Copy link
Contributor Author

benesch commented Aug 27, 2018

bors r=tschottdorf,knz

craig bot pushed a commit that referenced this pull request Aug 27, 2018
28961:  storage: enable merge queue by default r=tschottdorf,knz a=benesch

Flip the bit; burn the boats. (We're not really burning the boats; the last commit here won't get backported unless we're comfortable with merges.)

Release note (sql change): ALTER TABLE ... SPLIT AT now produces an
error if executed while the merge queue is enabled, as the merge queue
is likely to immediately discard any splits created by the statement.

Docs team: the above has implications for the documentation of `ALTER TABLE ... SPLIT AT`. Users are now prevented from splitting a table unless they disable the merge queue with `SET CLUSTER SETTING kv.range_merge.queue_enabled = false`. This is so that users are not confused when their manual splits are immediately merged away by the merge queue. (We're planning to come up with a more user friendly solution to this in v2.2.) They can turn the merge queue back on once they've ensured that their newly split ranges exceed the minimum range size (1MB), guaranteeing that they won't be merged away.

Co-authored-by: Nikhil Benesch <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 27, 2018

Build succeeded

@craig craig bot merged commit 98ca1d0 into cockroachdb:master Aug 27, 2018
craig bot pushed a commit that referenced this pull request Aug 27, 2018
29033: backport-2.1: storage: improve consistency checker diff, add debug decode-key r=a-robinson a=tschottdorf

Backport 2/2 commits from #29012.

/cc @cockroachdb/release

---

Avoid spilling non-printable characters and generally garbage into the
logs.

Release note: None


29082: backport-2.1: one week of merge PRs r=knz,tschottdorf a=benesch

Backport:
  * 1/1 commits from "sql: put rendered setting value in event log entry" (#29014)
  * 2/2 commits from " sql,settings: make SET CLUSTER SETTING wait for value to update, disallow in txn" (#29063)
  * 1/1 commits from "storage: discard a unworthwhile merge TODO" (#28885)
  * 3/3 commits from "storage: gate merges behind a cluster setting " (#28865)
  * 1/1 commits from "storage: update the tscache appropriately after a merge" (#28793)
  * 1/1 commits from "storage: avoid stopping twice in merge test" (#28902)
  * 1/1 commits from "roachpb,storage: rename GetSnapshotForMerge to Subsume" (#28910)
  * 1/1 commits from "build: fix generation rules for settings.html" (#28884)
  * 1/1 commits from "storage: deflake TestStoreRangeMergeTimestampCacheCausality" (#28928)
  * 1/1 commits from "storage: check for in-progress merge before installing new lease" (#28894)
  * 6/7 commits from " storage: enable merge queue by default" (#28961)

Please see individual PRs for details.

/cc @cockroachdb/release


Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
@benesch benesch deleted the merge-burn-boats branch August 27, 2018 16:54
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 4, 2020
This allows us to remove a lot of complicated logic introduced
in cockroachdb#37506 and cockroachdb#28961 around conditionally setting split expirations
and disallowing manual splits when the merge queue is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants