
sql: pre-split hash sharded indexes on shard boundaries before DELETE_AND_WRITE_ONLY #74558

Closed · ajwerner opened this issue Jan 6, 2022 · 0 comments · Fixed by #74923
Labels: C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception) · T-sql-foundations (SQL Foundations Team, formerly SQL Schema + SQL Sessions)

ajwerner commented Jan 6, 2022

Is your feature request related to a problem? Please describe.

We're creating a hash sharded index because we know that the write volume will be sequential. As we've seen in #62672 (comment) and elsewhere, throughput can crater when we make these new indexes writable because all of the load lands on a single range. Ideally we'd pre-split the new index before turning on traffic.

In the general case, it's somewhat tricky to figure out where to pre-split a new index. Fortunately, for hash sharded indexes, we get to assume that the data underneath each shard is uniformly distributed. That means we can safely split at each shard boundary and know that it's a smart place to put a split.

Describe the solution you'd like

Before making a hash sharded index DELETE_AND_WRITE_ONLY, split it with an expiring split point at each shard boundary.
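
For illustration, here's a minimal Go sketch of the split-point computation under that uniformity assumption. The prefix string, the key rendering, and the helper name are all hypothetical; the real implementation would use CockroachDB's key encoding and issue admin split requests with an expiration:

```go
package main

import (
	"fmt"
	"time"
)

// splitKeysForShards returns one logical split point per shard of a hash
// sharded index. Rows are assumed to be uniformly distributed within each
// shard, so every shard boundary is a safe place to split.
func splitKeysForShards(indexPrefix string, buckets int) []string {
	keys := make([]string, 0, buckets)
	for shard := 0; shard < buckets; shard++ {
		keys = append(keys, fmt.Sprintf("%s/%d", indexPrefix, shard))
	}
	return keys
}

func main() {
	// The splits expire so the ranges can merge back if the load moves on.
	expiration := time.Now().Add(time.Hour)
	for _, key := range splitKeysForShards("/Table/56/2", 8) {
		fmt.Printf("split at %s (expires %s)\n", key, expiration.Format(time.RFC3339))
	}
}
```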

Describe alternatives you've considered
There was a ton of discussion of the broader problem in this internal thread.

Additional context

We should also split other secondary indexes at the same time, but the heuristic for how many splits to create and where to put them is less obvious. We'll track that separately.

This change should be backportable.

Epic: CRDB-7363

gz#10941

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 6, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jan 6, 2022
craig bot pushed a commit that referenced this issue Jan 24, 2022
74810: kvserver: remove unnecessary special casing of lease for LAI r=nvanbenschoten a=tbg

Touches #33007 (this simplifies #71806).

We were previously "not" assigning a LeaseAppliedIndex to lease
request proposals. But we were doing so very half-heartedly: instead of
always assigning zero, we assigned "whatever was assigned to the
previous command". This meant that in practice, the first lease on a
range would get zero, and any subsequent leases would get some nonzero
number. This wasn't particularly principled and raised eyebrows. For
testing convenience and clarity it is helpful to assign a zero MLAI
to leases, which is what this commit does.
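
As a rough sketch of the before/after assignment (the names here are illustrative, not the actual kvserver code):

```go
// assignLAI mimics the bookkeeping described above. Before this commit, a
// lease request inherited whatever LAI the previous command was assigned;
// after it, lease requests always get zero.
func assignLAI(isLeaseRequest bool, lastAssigned uint64) uint64 {
	if isLeaseRequest {
		// Leases carry their own replay protection (the lease sequence
		// number), so a zero LAI is both safe and deterministic.
		return 0
	}
	return lastAssigned + 1
}
```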

We then turn to the special-casing in refreshProposalsLocked.

`(*Replica).refreshProposalsLocked` previously refused to repropose
commands that had a zero `LeaseAppliedIndex` (LAI). This was done on
the grounds of needing to provide replay protection for these requests.
Upon inspection it became clear that a zero MLAI could only ever apply
to lease requests; they are the only proposals with an exception, and as
we saw above it would not usually result in a zero, but could, especially
in testing, where most leases are the first lease on their respective
range.

We [discussed] this internally and concluded that leases can in fact be
reproposed, since they have their own replay protection mediated through
the lease's sequence number. This is great since, as stated above, we did
in fact repropose (some) leases and have done so for years.

This commit removes the special casing.

Fixes #74711.

As suggested by @nvanbenschoten, I ran the `kvserver` tests with this
diff that reproposes every lease request:

```diff
diff --git a/pkg/kv/kvserver/replica_proposal_buf.go b/pkg/kv/kvserver/replica_proposal_buf.go
index 95765d8efd..1b66745866 100644
--- a/pkg/kv/kvserver/replica_proposal_buf.go
+++ b/pkg/kv/kvserver/replica_proposal_buf.go
@@ -576,6 +576,11 @@ func (b *propBuf) FlushLockedWithRaftGroup(
 			ents = append(ents, raftpb.Entry{
 				Data: p.encodedCommand,
 			})
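+			// Append the encoded lease request a second time so that every
+			// lease is effectively reproposed once, exercising the repropose
+			// path throughout the kvserver tests.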
+			if p.Request.IsLeaseRequest() {
+				ents = append(ents, raftpb.Entry{
+					Data: p.encodedCommand,
+				})
+			}
 		}
 	}
 	if firstErr != nil {

```

This inspired the subsequent commits.

[discussed]: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641559185021100

Release note: None

74923: sql: pre-split hash sharded index ranges before DELETE_AND_WRITE_ONLY r=chengxiong-ruan a=chengxiong-ruan

Fixes #74558

Pre-split ranges on shard boundaries before the SchemaChanger moves
new hash sharded indexes from the DELETE_ONLY to the DELETE_AND_WRITE_ONLY
state.

Release note (bug fix): None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
@craig craig bot closed this as completed in 1723f7a Jan 24, 2022
chengxiong-ruan added a commit to chengxiong-ruan/cockroach that referenced this issue Jan 26, 2022
Fixes cockroachdb#74558

Pre-split ranges on shard boundaries before the SchemaChanger moves
new hash sharded indexes from the DELETE_ONLY to the DELETE_AND_WRITE_ONLY
state.

Release note (bug fix): When creating a hash sharded index on an existing
table, traffic could hammer the index's single initial range before it was
split into more ranges for the shards as the range grew. This change makes
the schema changer pre-split ranges on shard boundaries before the index
becomes writable. A new cluster setting,
`sql.hash_sharded_range_pre_split.max`, lets users cap the number of
pre-split ranges. If the bucket count of the index is less than the cluster
setting, the bucket count determines the number of pre-split ranges.
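
A minimal sketch of that capping logic (the function name is hypothetical; only the cluster setting name comes from the note):

```go
// numPreSplits returns the number of ranges to pre-split for a hash sharded
// index: its bucket count, capped by sql.hash_sharded_range_pre_split.max.
func numPreSplits(bucketCount, maxPreSplits int) int {
	if bucketCount < maxPreSplits {
		return bucketCount
	}
	return maxPreSplits
}
```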
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023