kv: load-based splitter permits splits between SQL rows, given non-SQL row inputs #103483

nvanbenschoten · 2023-05-16T22:47:04Z

An important constraint maintained (loosely) by kv is that it will not split between the column families of a single row. This ensures that a single row is never torn across ranges. This constraint is maintained during both size-based splits and load-based splits. For size-based splits, split points are derived from the keys in the range. For load-based splits, split points are derived from the requests to the range. This is a subtle distinction, but it means that the load-based splitter has a harder job to do, because while there is a general guarantee that keys in a range will be valid, there is no such guarantee that requests to the range will target valid keys.

We added split key sanitization (a call to keys.EnsureSafeSplitKey) to the load-based splitter in fad2024. Shortly after, we added logic to ignore errors from such sanitization in 1d5eb7c, which was important to allow load-based splits in non-SQL keyspaces ¹.

We've now seen that ignoring errors from key sanitization allows requests that target invalid keys, such as keys between the column families of a single row, to create load-based split points. This can lead to torn rows and panics in SQL queries that access these rows. For example, consider a scan over a table with two column families on each key:

1. begin scan from "a" to "z"
2. scan hits key limit at "g/0", return resume span of "g/0".Next()
3. scan from "g/0".Next() to "z"
4. "g/0".Next() gets into the load-based splitter
5. "g/0".Next() is not a valid SQL key, so EnsureSafeSplitKey returns a "not a valid table key" error
6. error from EnsureSafeSplitKey is ignored
7. load-based splitter splits at "g/0".Next(), between "g/0" and "g/1"
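
The steps above can be replayed with a toy model. This is only a sketch: the string keys of the form "row/family", the `next` helper, and `toySafeSplitKey` are hypothetical stand-ins invented for illustration, not the real `roachpb.Key` encoding or the real `keys.EnsureSafeSplitKey`. It shows that falling back to the raw sampled key when sanitization fails yields a split point that sorts between "g/0" and "g/1".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// next mimics the effect of Key.Next(): the smallest key sorting after key.
func next(key string) string { return key + "\x00" }

// toySafeSplitKey is a hypothetical stand-in for keys.EnsureSafeSplitKey.
// Toy keys have the form "<row>/<family digit>"; the safe split key is "<row>".
func toySafeSplitKey(key string) (string, error) {
	i := strings.LastIndexByte(key, '/')
	if i < 0 || i != len(key)-2 || key[i+1] < '0' || key[i+1] > '9' {
		return "", errors.New("not a valid table key")
	}
	return key[:i], nil
}

// splitKeyIgnoringErrors falls back to the raw sampled key when the
// sanitization step fails, which is the behavior described in the steps above.
func splitKeyIgnoringErrors(sampled string) string {
	if safe, err := toySafeSplitKey(sampled); err == nil {
		return safe
	}
	return sampled
}

// tearsRow reports whether splitting at splitKey leaves keys of the same row
// on both sides of the split. Every key in keys must contain a '/'.
func tearsRow(splitKey string, keys []string) bool {
	left, right := map[string]bool{}, map[string]bool{}
	for _, k := range keys {
		row := k[:strings.IndexByte(k, '/')]
		if k < splitKey {
			left[row] = true
		} else {
			right[row] = true
		}
	}
	for row := range left {
		if right[row] {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"f/0", "f/1", "g/0", "g/1", "h/0", "h/1"}

	// The resume key of the limited scan ends up as the sampled split key.
	split := splitKeyIgnoringErrors(next("g/0"))
	fmt.Printf("split at %q tears a row: %v\n", split, tearsRow(split, keys))

	// A sanitized key, by contrast, falls on a row boundary.
	safe, _ := toySafeSplitKey("g/0")
	fmt.Printf("split at %q tears a row: %v\n", safe, tearsRow(safe, keys))
}
```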

Jira issue: CRDB-28024

¹ Why isn't this a problem for size-based splits, which use the same logic but don't ignore the error?

knz · 2023-05-17T07:58:34Z

why isn't this a problem for size-based splits, which use the same logic but don't ignore the error?

My guess is because these ranges don't grow to exceed the size threshold?

kvoli · 2023-05-17T22:41:33Z

One interesting thing I have noticed is that keys.EnsureSafeSplitKey(keys.EnsureSafeSplitKey(key)) will return an error: the first call strips off the column family ID suffix (including its length byte), while the second call expects that suffix to be present in the key:

cockroach/pkg/keys/keys.go

Lines 942 to 952 in 5af7d05

    if colFamIDLen > uint64(sqlN-1) {
    	// The column family ID length was impossible. colFamIDLen is the length
    	// of the encoded column family ID suffix. We add 1 to account for the
    	// byte holding the length of the encoded column family ID and if that
    	// total (colFamIDLen+1) is greater than the key suffix (sqlN ==
    	// len(sqlKey)) then we bail. Note that we don't consider this an error
    	// because EnsureSafeSplitKey can be called on keys that look like table
    	// keys but which do not have a column family ID length suffix (e.g by
    	// SystemConfig.ComputeSplitKey).
    	return 0, errors.Errorf("%s: malformed table key", key)
    }

So the safe split key function is not idempotent. We should separate split key validation from creating a safe key to split at, although perhaps we cannot always do so.
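
The non-idempotency can be seen in a simplified model of the length check quoted above. This is a sketch under stated assumptions: `stripFamilySuffix` is a hypothetical helper that only models "the last byte is the length of the column family ID suffix", and the key bytes are made up. It ignores the tenant prefix and table ID checks that the real function performs.

```go
package main

import (
	"errors"
	"fmt"
)

// stripFamilySuffix is a hypothetical, simplified model of the suffix handling:
// the last byte holds the length of the encoded column family ID, and both the
// ID and the length byte are stripped to obtain the row prefix.
func stripFamilySuffix(key []byte) ([]byte, error) {
	n := len(key)
	if n == 0 {
		return nil, errors.New("empty key")
	}
	famLen := int(key[n-1])
	if famLen > n-1 {
		// The claimed suffix is longer than the key itself.
		return nil, errors.New("malformed table key")
	}
	return key[:n-famLen-1], nil
}

func main() {
	// Made-up bytes: a two-byte row prefix, a one-byte family ID, and the
	// length byte (1).
	key := []byte{0x89, 0x8d, 0x88, 0x01}

	once, err := stripFamilySuffix(key)
	fmt.Printf("first call:  % x, err=%v\n", once, err)

	// The result no longer carries a length byte, so the last byte of the row
	// prefix is misread as a suffix length.
	twice, err := stripFamilySuffix(once)
	fmt.Printf("second call: % x, err=%v\n", twice, err)
}
```

In this model the second call fails only because the misread length happens to be too large; with a small trailing byte it could instead silently truncate the row prefix, which is another reason to keep validation separate from sanitization.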

kvoli · 2023-05-18T00:34:22Z

We added split key sanitization (a call to keys.EnsureSafeSplitKey) to the load-based splitter in fad2024. Shortly after, we added logic to ignore errors from such sanitization in 1d5eb7c, which was important to allow load-based splits in non-SQL keyspaces ¹.

The commit to ignore errors from EnsureSafeSplitKey linked a roachtest failure running a uniform KV workload #42903, which the commit resolved. I removed the logic which ignores the EnsureSafeSplitKey errors in the decider on a commit off master to see if this was still relevant. There were over 200 splits created¹, which is expected and does not reproduce the failure. I was unable to build a commit that reproduced the test failures, so I'm still curious what led the tests to fail.

Details

diff --git a/pkg/kv/kvserver/split/decider.go b/pkg/kv/kvserver/split/decider.go
index ac099825744..71f27aa5809 100644
--- a/pkg/kv/kvserver/split/decider.go
+++ b/pkg/kv/kvserver/split/decider.go
@@ -329,6 +329,8 @@ func (d *Decider) MaybeSplitKey(ctx context.Context, now time.Time) roachpb.Key
    key = d.mu.splitFinder.Key()
    if safeKey, err := keys.EnsureSafeSplitKey(key); err == nil {
      key = safeKey
+   } else {
+     key = nil
    }
  }
  return key

I also checked whether non-SQL ranges are affected by the EnsureSafeSplitKey check. They are not: the check is a no-op for them.

I tested this with tsd keys initially:

encoded=/System/tsd/test.metric/10s/1970-01-01T00:00:00Z/testsource
safe(encoded)=/System/tsd/test.metric/10s/1970-01-01T00:00:00Z/testsource
x_encoded=0x04 0x74 0x73 0x64 0x12 0x74 0x65 0x73 0x74 0x2e 0x6d 0x65 0x74 0x72 0x69 0x63 0x00 0x01 0x89 0x88 0x74 0x65 0x73 0x74 0x73 0x6f 0x75 0x72 0x63 0x65
x_safe(encoded)=0x04 0x74 0x73 0x64 0x12 0x74 0x65 0x73 0x74 0x2e 0x6d 0x65 0x74 0x72 0x69 0x63 0x00 0x01 0x89 0x88 0x74 0x65 0x73 0x74 0x73 0x6f 0x75 0x72 0x63 0x65

EnsureSafeSplitKey peeks at the type of the value encoded at the start of the key after stripping the tenant prefix, if it exists. If the type is not an Int, represented by a byte between 0x80 and 0xfd, then EnsureSafeSplitKey returns the original key and no error, i.e. it is a no-op.

cockroach/pkg/keys/keys.go

Lines 897 to 901 in a0e80d2

    // Check that the prefix contains a valid TableID.
    if encoding.PeekType(sqlKey) != encoding.Int {
    	// Not a table key, so the row prefix is the entire key.
    	return n, nil
    }

Meta1 0x02 OK
Meta2 0x03 OK
System 0x04 OK
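
The first-byte check can be sketched as follows. `looksLikeTableKey` is a hypothetical helper, not a function in the `keys` or `encoding` packages; it uses the 0x80 to 0xfd range stated above and assumes the tenant prefix has already been stripped. The table key byte is made up for contrast.

```go
package main

import "fmt"

// looksLikeTableKey is a hypothetical helper: it reports whether the first
// byte falls in the range the comment above gives for an encoded Int, which is
// where a table ID would be expected.
func looksLikeTableKey(key []byte) bool {
	return len(key) > 0 && key[0] >= 0x80 && key[0] <= 0xfd
}

func main() {
	samples := []struct {
		name string
		key  []byte
	}{
		{"Meta1", []byte{0x02}},
		{"Meta2", []byte{0x03}},
		{"System", []byte{0x04, 0x74, 0x73, 0x64}}, // start of the tsd key above
		{"Table", []byte{0x89}},                    // made-up table key prefix
	}
	for _, s := range samples {
		fmt.Printf("%-6s first byte 0x%02x, table key: %v\n",
			s.name, s.key[0], looksLikeTableKey(s.key))
	}
}
```

Keys for which the helper returns false correspond to the no-op path, where the whole key is treated as the row prefix.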

It appears reasonable to take an approach similar to the weighted splitter, where only safe sampled keys are retained, since there isn't a risk that system ranges will no longer load-based split. There's still a risk that no safe key ever gets retained; however, we have been testing this on master/23.1 and have yet to find issues.

cockroach/pkg/kv/kvserver/split/weighted_finder.go

Lines 127 to 135 in a0e80d2

    // We only wish to retain safe split keys as samples, as they are the split
    // keys that will eventually be returned from Key(). If instead we kept every
    // key, it is possible for all sample keys to map to the same split key
    // implicitly with column families. Note this doesn't stop every sample being
    // the same key, however it will cause no split key logging and bump metrics.
    // TODO(kvoli): When the single key situation arises, we should backoff
    // attempting to split. There is a fixed overhead on the hotpath when the
    // finder is active.
    if safeKey, err := keys.EnsureSafeSplitKey(key); err == nil {

¹ The split/load roachtests set the QPS split threshold to 100; this hasn't changed since that test failure. The default is currently 2500.

nvanbenschoten · 2023-05-18T03:27:34Z

The commit to ignore errors from EnsureSafeSplitKey linked a roachtest failure running a uniform KV workload #42903, which the commit resolved. I removed the logic which ignores the EnsureSafeSplitKey errors in the decider on a commit off master to see if this was still relevant. There were over 200 splits created¹, which is expected and does not reproduce the failure.

It's interesting that you were unable to reproduce the issue. Your previous comment has me wondering — would we be hitting the issue if kv was querying rows using keys without the (single) column family suffix? It sounds like it would.

So I wonder if a change in SQL-assigned request keys might explain this difference in behavior between 2019 and now. For instance, at some point in there, we started issuing GetRequests for single-row, single-cf reads ("point lookups"). Perhaps that change also required us to get more strict about correctly specifying column family suffixes when reading from KV?

The following script isn't exactly proof of that, but it does demonstrate that we are more precise with specifying the column family suffix for GetRequests than we are with ScanRequests.

create table kv (k int primary key, v int);
create table kv2 (k int primary key, v int, family (k), family (v));


set tracing = on; select * from kv where k = 5; set tracing = off;
select message from [show trace for session] where message ~ 'executing (Get|Scan)';
                                  message
----------------------------------------------------------------------------
  executing Get [/Table/106/1/5/0,/Min), [txn: 88024b8f], [can-forward-ts]


set tracing = on; select * from kv where k between 4 and 5; set tracing = off;
select message from [show trace for session] where message ~ 'executing (Get|Scan)';
                                       message
-------------------------------------------------------------------------------------
  executing Scan [/Table/106/1/4,/Table/106/1/6), [txn: 8edc306f], [can-forward-ts]


set tracing = on; select * from kv2 where k = 5; set tracing = off;
select message from [show trace for session] where message ~ 'executing (Get|Scan)';
                                       message
-------------------------------------------------------------------------------------
  executing Scan [/Table/110/1/5,/Table/110/1/6), [txn: 96897cc7], [can-forward-ts]

Previously, the load based range splitter could suggest split keys which were in-between SQL rows. This violated assumptions made in SQL code, which require that rows are never split across ranges. This patch updates the unweighted split finder to only retain safe sample keys. The weighted split finder already does this (e4f003b). Note that the weighted split finder is used by default (>=23.1), whilst the unweighted split finder is used when `kv.allocator.load_based_rebalancing.objective` is set to `qps` (default `cpu`). Informs: cockroachdb#43094 Fixes: cockroachdb#103483 Release note: None

kvoli · 2023-05-19T04:04:02Z

An update. The initial approach (only retain safe samples) could lead to the load based splitter never finding a split. This issue also affects the weighted finder (23.1, not 23.1.0), which already uses this approach. I will open an issue tomorrow.

There are two (known) key patterns which will cause the range to never load based split.

1. Table key without a column family ID length: /Table/1/1/pks. EnsureSafeSplitKey marks the key as invalid, even though it is a safe split point. Because there can be an arbitrary number of primary keys, we can't determine whether the portion after the table/index is a safe key without column families, or an unsafe key that contains a column family.
2. Table key with one or more .Next() 0x00 bytes appended. The key's last byte determines the column family ID length. 0x00 is special cased to indicate no column family.

The combination means we can't determine safeness with just the sampled key without excluding potentially safe keys.

When a range only receives these types of requests it will never split for load, even if it satisfies the load criteria. The proportion of bad keys among all request keys determines the odds that we find enough safe samples to split. Note there haven't been any performance regressions or test failures caused by this.
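
A small sketch of the starvation effect, under loud assumptions: `passesToyCheck` is a hypothetical check that only models "the last byte is a column family ID length that must fit inside the key", the key bytes are made up, and only the first pattern (no column family ID length) is modeled. A sampler that retains only keys passing the check keeps nothing when every request key lacks the suffix.

```go
package main

import "fmt"

// passesToyCheck is a hypothetical validity check: the last byte is read as
// the column family ID length and must fit inside the rest of the key.
func passesToyCheck(key []byte) bool {
	n := len(key)
	return n > 0 && int(key[n-1]) <= n-1
}

// retainSafe keeps only the sampled request keys that pass the toy check,
// mirroring the "only retain safe samples" approach.
func retainSafe(requests [][]byte) [][]byte {
	var kept [][]byte
	for _, k := range requests {
		if passesToyCheck(k) {
			kept = append(kept, k)
		}
	}
	return kept
}

func main() {
	// Made-up keys that end in an encoded primary key value, with no column
	// family ID length. They are safe split points, but the check rejects them.
	noSuffix := [][]byte{
		{0x89, 0x89, 0x8a},
		{0x89, 0x89, 0x8b},
		{0x89, 0x89, 0x8c},
	}
	// The same made-up rows with a one-byte family ID and its length appended.
	withSuffix := [][]byte{
		{0x89, 0x89, 0x8a, 0x88, 0x01},
		{0x89, 0x89, 0x8b, 0x88, 0x01},
	}

	fmt.Println("retained without suffix:", len(retainSafe(noSuffix)))
	fmt.Println("retained with suffix:   ", len(retainSafe(withSuffix)))
}
```

With no retained samples, a finder built on this filter has nothing to propose, regardless of how much load the range receives.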

Given we cannot avoid unsafe split keys without also excluding table keys with no column family ID, the best approach I could think of was to stop trying entirely in the load based splitter.

Instead, we can update the local admin split command to take a flag indicating the provided split key is potentially unsafe. The command then checks the key's safeness; if it is unsafe, the command iterates to the first safe key in [providedKey, range.endKey). This ensures we only use safe keys, without the risk of never splitting (see below). Neither a version gate nor a protocol change is required, so a patch could be backported.
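
A minimal sketch of the "iterate to the first safe key" idea, not the drafted patch itself. It assumes toy string keys of the form "row/family" held in a sorted slice; `rowPrefix` and `firstSafeKeyAtOrAfter` are hypothetical names. The point is that the candidate is derived from keys that actually exist in the range rather than from the request key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rowPrefix returns the toy "safe" form of a real key: everything before the
// column family suffix. Keys without a '/' are treated as unusable.
func rowPrefix(key string) (string, bool) {
	i := strings.LastIndexByte(key, '/')
	if i < 0 {
		return "", false
	}
	return key[:i], true
}

// firstSafeKeyAtOrAfter walks the real keys in [provided, endKey) and returns
// the first row boundary that is not before the provided key. sortedKeys must
// be sorted in ascending order. It returns false if no such boundary exists.
func firstSafeKeyAtOrAfter(sortedKeys []string, provided, endKey string) (string, bool) {
	start := sort.SearchStrings(sortedKeys, provided)
	for _, k := range sortedKeys[start:] {
		if k >= endKey {
			break
		}
		if safe, ok := rowPrefix(k); ok && safe >= provided {
			return safe, true
		}
	}
	return "", false
}

func main() {
	keys := []string{"f/0", "f/1", "g/0", "g/1", "h/0", "h/1"}

	// An unsafe request key between the families of row "g" moves forward to
	// the next row boundary.
	fmt.Println(firstSafeKeyAtOrAfter(keys, "g/0\x00", "z"))

	// A key that is already a row boundary is returned unchanged.
	fmt.Println(firstSafeKeyAtOrAfter(keys, "g", "z"))

	// No row begins at or after the provided key, so no split key is found.
	fmt.Println(firstSafeKeyAtOrAfter(keys, "h/1", "z"))
}
```

The first call illustrates why a request key inside a single hot row yields a split after that row, and the third shows the case where no split happens at all.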

I drafted a patch doing the above and am running the split tests overnight, as well as tests involving the problem keys at high %.

The change isn't without potential risks. The risks boil down to not splitting the load on a range. There is no risk of creating an unsafe split.

1. The provided key is for a single hot row but is greater than the safe key for the row. The safe split will be after the row. The LHS split range may never reduce its load.
2. There are no keys in the range, and the range is heavily loaded for some other reason. It won't be split.
3. Table range with no keys which satisfy EnsureSafeSplitKey. Same issue: it is never split. Note system ranges trivially pass the check so are unaffected.

Risks 2 and 3 would also arise for size-based splits; both appear unlikely.

Risk 1 appears more likely. We could iterate left once to check the key before doing the above iteration. This has issues when we wish to split at the 2nd row in a range.

Previously, there was no way to peek at the contents of the load based splitter samples when inspecting nodes. This commit adds string methods for the `UnweightedFinder`, `WeightedFinder` and `Decider`. This commit also swaps the order of the should split check to avoid computation. As a result the output of `cpu_decider_cartesian` changed slightly as the no split key logging message is now ordered differently. Informs: cockroachdb#103672 Informs: cockroachdb#103483 Release note: None

It was possible for a SQL row to be torn across two ranges due to the load-based splitter not rejecting potentially unsafe split keys. It is impossible to determine with just the sampled request keys, whether a key is certainly unsafe or safe. This commit side steps this problem by re-using the `adminSplitWithDescriptor` command to the next real key, after the provided `args.SplitKey`. This ensures that the split key will always be a real key whilst not requiring any checks in the splitter itself. As such, all safe split key checks are also removed from the `split` pkg, with a warning added. Resolves: cockroachdb#103483 Release note (bug fix): It was possible for a SQL row to be split across two ranges. When this occurred, SQL queries could return unexpected errors. This bug is resolved by these changes, as we now sample the real keys, rather than just request keys to determine load-based split points.

It was possible for a SQL row to be torn across two ranges due to the load-based splitter not rejecting potentially unsafe split keys. It is impossible to determine with keys sampled from response spans, whether a key is certainly unsafe or safe. This commit side steps this problem by re-using the `adminSplitWithDescriptor` command to find the first real key, after or at the provided `args.SplitKey`. This ensures that the split key will always be a real key whilst not requiring any checks in the splitter itself. The updated `adminSplitWithDescriptor` is local only and requires opting into finding the first safe key by setting `findFirstSafeKey` to `true`. As such, all safe split key checks are also removed from the `split` pkg, with a warning added that any split key returned is unsafe. Resolves: cockroachdb#103483 Release note (bug fix): It was possible for a SQL row to be split across two ranges. When this occurred, SQL queries could return unexpected errors. This bug is resolved by these changes, as we now inspect the real keys, rather than just request keys to determine load-based split points.

103690: kvserver: avoid load based splits in middle of SQL row r=nvanbenschoten a=kvoli It was possible for a SQL row to be torn across two ranges due to the load-based splitter not rejecting potentially unsafe split keys. It is impossible to determine with just the sampled request keys, whether a key is certainly unsafe or safe, so a split key is returned regardless of error. This PR side steps this problem by re-using the `adminSplitWithDescriptor` command to find the first real key, after or at the provided `args.SplitKey`. This ensures that the split key will always be a real key whilst not requiring any checks in the splitter itself. The updated `adminSplitWithDescriptor` is local only and requires opting into finding the first safe key by setting `findFirstSafeKey` to `true`. As such, all safe split key checks are also removed from the `split` pkg, with a warning added that any split key returned is unsafe. Note that the weighted load based split finder, used for CPU splits did not suffer from returning potentially unsafe splits due to e4f003b. However, it was possible that no load-based split key was ever found when using the weighted finder. This was because we discard potentially unsafe samples, which could have been safe split points. This PR reverts commit e4f003b, as the safe split key is enforced elsewhere, mentioned above. Resolves: #103483 Resolves: #103672 Release note (bug fix): It was possible for a SQL row to be split across two ranges. When this occurred, SQL queries could return unexpected errors. This bug is resolved by these changes, as we now sample the real keys, rather than just request keys to determine load-based split points. Co-authored-by: Austen McClernon <[email protected]>

kvoli · 2023-06-06T02:03:48Z

Backport tracking issue:
#104353

kvoli mentioned this issue May 18, 2023

split: [DNM] only retain safe split keys #103565

Closed

nvanbenschoten assigned kvoli May 18, 2023

kvoli mentioned this issue May 19, 2023

split: weighted key finder may never return key for table ranges #103672

Closed

kvoli mentioned this issue May 20, 2023

kvserver: avoid load based splits in middle of SQL row #103690

Merged

craig bot closed this as completed in bf2aa42 May 25, 2023

blathers-crl bot mentioned this issue May 25, 2023

release-23.1: kvserver: avoid load based splits in middle of SQL row #103876

Merged

kvoli mentioned this issue Jun 5, 2023

kv: load-based splitter permits splits between SQL rows backport tracker #104353

Closed

3 tasks

kvoli mentioned this issue Jun 7, 2023

release-22.2: kvserver: avoid load based splits in middle of SQL row #104563

Merged

michae2 mentioned this issue Dec 21, 2023

sql,storage: splits may split SQL rows #43094

Open
