optimize: add a EnsureSafeSplitKey in Decider #42056

maweike · 2019-10-31T06:44:34Z

For example, the data key is /Table/68/1/10. The endkey boundary of the range is exactly /Table/68/1/10/0. The previous range does not contain /Table/68/1/10/0, the latter one. Range contains. But the scan request sends startKey /Table/68/1/10, endKey /Table/68/1/10/0 in the previous range. The key of the put request is /Table/68/1/10/0. In the next range, the read and write is separated.

Release note: None

cockroach-teamcity · 2019-10-31T06:44:40Z

This change is

CLAassistant · 2019-10-31T06:45:04Z

All committers have signed the CLA.

Commit: It is found that the split and read-write are separated when the transaction is concurrent. For example, the data key is /Table/68/1/10. The endkey boundary of the range is exactly /Table/68/1/10/0. The previous range does not contain /Table/68/1/10/0, the next range is included. But the scan request sends startKey /Table/68/1/10, endKey /Table/68/1/10/0 in the previous range. The key of the put request is /Table/68/1/10/0. In the latter range, the read and write is separated, so it is optimized in the decider.go. In addition to optimizing the split, the test found that the panic of #41630 has not been triggered so far, so guessing that this modification may also fix the problem described by #39794 #36834 #36356. Fixes #39794 #36834 #36356 Release note (bug fix): Modified code in #42056 submission.

tbg · 2019-11-07T11:32:25Z

@petermattis, mind taking a look at this? I don't quite understand the description, but the code suggests that load-based splits may return a split key that falls in the middle of a SQL row. As far as I can tell, we want to call EnsureSafeSplitKey only on SQL keys, but load based splitting is active on all ranges, so I'm not sure whether this addition will backfire in some way. Looking at the split path, I find no indication that we'll somehow massage the key further to avoid splitting inside the row (i.e. I'd expect it to split between columns of the same row). And yet there's a test that claims the split makes sure that, even when you pass it a key between two columns of the same row, it won't actually split there 😕:

cockroach/pkg/storage/client_split_test.go

Lines 280 to 282 in 8b9f547

    
           // TestStoreRangeSplitInsideRow verifies an attempt to split a range inside of 
        
           // a table row will cause a split at a boundary between rows. 
        
           func TestStoreRangeSplitInsideRow(t *testing.T) {

I'm also unsure why we don't allow rows to be split between columns. I get that we want to avoid it, but is this fixing a serious bug or just a silly thing that can happen? I found no answers in

cockroach/pkg/storage/engine/mvcc.go

Lines 2995 to 2998 in 35bdbf0

    
           // In addition, we must never return a split key that falls within a table 
        
           // row. (Rows in tables with multiple column families are comprised of 
        
           // multiple keys, one key per column family.) However, we do allow a split 
        
           // key that falls between a row and its interleaved rows.

.

petermattis

I'm not sure what badness will happen from splitting inside a row. We do call keys.EnsureSafeSplitKey on AdminSplit requests when no split-point is provided (see MVCCFindSplitKey). Rather than making a change inside the split decider, I'd do this enforcement in Replica.adminSplitWithDescriptor.

@danhhz Do you recall if there are correctness problems from splitting in the middle of row? This can only happen when there are multiple column families (I think).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @tbg)

danhhz · 2019-11-07T18:57:11Z

@danhhz Do you recall if there are correctness problems from splitting in the middle of row? This can only happen when there are multiple column families (I think).

Indeed it could only happen if there are multiple column families in a row, but note that (edit: most) system tables (even in new clusters) still have 1 column family per row for backward compatibility.

I recall at some point that SQL depended on this, but I've long forgotten where that was and it's reasonably likely that's changed anyway. No idea who else to tag in here... @jordanlewis do you know who is most up to date on how sql execution uses column families?

Also cc @dt does bulk io care if a row is split between two ranges?

jordanlewis · 2019-11-07T18:58:03Z

Column families? I'll continue this tag-fest by saying that that's a @rohany beacon if I ever did hear one!

dt · 2019-11-07T20:21:20Z

BulkIO counts rows by checking when EnsureSafeSplitKey changes so it is assuming that it will never split mid-row. Even if we re-implemented the counter logic to what EnsureSafeSplitKey was doing before, if a row is in two different ranges, it'd be double-counted when we backup/restore/etc those ranges. Those counts are mostly for user info so that might not be the end of the world, but it'd certainly cause some confusion.

I'm not sure if there are any other places we bake that assumption in -- there easily could be. Uniqueness enforcement comes to mind: I think we rely on the fact that the default family will collide to ensure we don't inject a non-unique row, and I'm not quite sure how much we can use that assumption if the default family KV might not be in the same SST/range.

rohany · 2019-11-08T00:06:46Z

Aside from relying on the uniqueness checks upon inserting, I don't think that SQL really cares what ranges the different families live in. The rowfetcher/cfetcher just put together the rows from the k/v family pieces as the underlying fetchers give them kv's without caring what range they come from.

petermattis · 2019-11-08T01:46:37Z

Aside from relying on the uniqueness checks upon inserting

The uniqueness checks on inserting don't depend on a row being contained in a single range, right?

rohany · 2019-11-08T02:12:23Z

woops -- I saw part of davids comment above and got confused. It doesn't depend on the range at all.

maweike · 2019-11-08T07:03:52Z

Fetchers appeared before because of the column families problem, it is similar to this issue：#38577. we try to avoid issue 38577, when the table is built, it is a single column family. but the panic of unset EncDatum will still appear in the poc test. I just turned off splitbyload when testing, After that, this panic problem no longer appears. You can find that the normal split key does not end with /0 by “select * from crdb_internal.ranges；”command. However, when spiltbyload is turned on, there is a splitkey at the end of /0, so I think it may be related to the key of splitbyload.

tbg · 2019-11-08T09:32:28Z

@maweike, this seems like you have a way to reproduce this at will, would you be able to share the details of your reproduction with us? From what you're saying it seems likely that the load based split is a problem, but I want to make sure we understand the implications of it all the way through the stack.

@petermattis it's worth pointing out that AdminSplit initially had EnsureSafeSplitKey, but that this was deliberately undone: #16344. @bdarnell was driving this, so he should chime in.

At the very least, we ought to prevent AdminSplit from splitting between rows. It should probably get the same "is this SQL" detection that MVCCFindSplitKey has, and make the split key safe only when it's cutting through a SQL row. We'll need to be careful to not make this situation more confusing than it already is.

But this is also a good opportunity to figure out what causes the problems of a mid-row split at the SQL layer. It seems that nobody can tell why it would be severely problematic to do so.

tbg · 2019-11-08T09:34:00Z

Oh - and this test I was looking at

cockroach/pkg/storage/client_split_test.go

Line 282 in 8b9f547

func TestStoreRangeSplitInsideRow(t *testing.T) {

actually calls EnsureSafeSplitKey on the thing it passes to AdminSplit, so it's not doing what it claims is doing. This "broke" when #16344 was closed: col1Key is identical to col2Key. We need to update this test when we fix the behavior of AdminSplit so that it tests something again.

maweike · 2019-11-08T11:20:53Z

@tbg The scenario we simulated was a commercial scenario under stress testing,and the test software used was an intranet commercial software, so I am sorry that I can't share the detailed process of reproduction, but I can confirm that the problem occurred when the data that I am reading is at the boundary of the split range, which will cause the range of reads and writes to be inconsistent, but when the data I read is inside the range, it will not cause the read and write range to be inconsistent.

tbg · 2019-11-11T10:47:59Z

@rohany can you try to reverse engineer the problem @maweike is seeing? We're certainly not intending to split between column families, and we'll fix that. But it seems that we have no idea what exactly the issue actually is and we should get to the bottom of that.

dt · 2019-11-11T12:47:06Z

@rohany Sorry, I meant uniqueness enforcement in BulkIO and specifically IMPORT INTO's direct injection of rows, and what happens when the default family KV gets put in a different SST than the rest of the row -- that wasn't something considered when we built it. FWIW, I think it should actually be OK since we'd still get the collision later when we go to add that SST and still rollback.

The row counter will still certainly be wrong though.

rohany · 2019-11-11T15:22:36Z

I'll take a look at it today or tomorrow!

rohany · 2019-11-12T16:24:47Z

So i have a repro of the panic --
First add the following line into pkg/sql/split.go

	rowKey = keys.MakeFamilyKey(rowKey, 1)

at line 73 on current master (right after rowKey gets computed by getRowKey. This forces split ats to split at family 1 instead of on the row boundary. Then,

./cockroach demo --empty --nodes 3
create table t (x int, y int not null, z int not null, primary key (x), family (x), family (y), family (z));
alter table t split at values (1), (2), (3);
insert into t values (1, 1, 1);
(* wait 30 seconds to a minute *)
select * from t; (* this will panic with the fetcher error (found null on not null column) *)

I have no clue why this happens though. I'm not sure why you have to wait for the panic to occur -- are the range splits happening asynchronously? show ranges immediately after the split call shows multiple ranges, so I'm not sure what waiting for 30 seconds is doing.

rohany · 2019-11-12T16:55:52Z

Ok, the cause for the panic is pretty simple -- it seems that the fetchers assume that when a range boundary is hit, all the kv's for a row have been retrieved already. They don't explicitly make this assumption, but it comes out here in pkg/sql/row/fetcher.go:

func (rf *Fetcher) NextKey(ctx context.Context) (rowDone bool, err error) {
	var ok bool

	for {
		ok, rf.kv, _, err = rf.kvFetcher.NextKV(ctx)
		if err != nil {
			return false, err
		}
		rf.kvEnd = !ok
		fmt.Println("nextkv", rf.kv, rf.kvEnd)
		if rf.kvEnd {
			// No more keys in the scan. We need to transition
			// rf.rowReadyTable to rf.currentTable for the last
			// row.
			rf.rowReadyTable = rf.currentTable
			return true, nil
		}

The underlying KVBatchFetcher returns false as the first argument when it hits the end of a range, which causes the fetcher to think that this row is done being processed -- a log on the above example confirms this:

nextkv {/Table/52/1/1/1/1 {[118 80 228 138 1 2] 0.000000000,0}} false
nextkv {/Table/52/1/1/2/1 {[100 229 75 100 1 2] 0.000000000,0}} false
nextkv {/Min {[] 0.000000000,0}} true
nextkv {/Table/52/1/1/0 {[126 47 48 235 10] 0.000000000,0}} false
nextkv {/Min {[] 0.000000000,0}} true

Here, the first part of the row /Table/52/1/1/0 is separated in a different range from the other K/V's in its row, and is processed independently, leading to the panic.

danhhz · 2019-11-12T16:58:24Z

Aha! This is exactly what I was thinking of. I wish I would have remembered the specifics and saved you some time, but thanks for looking into it!

bdarnell · 2019-11-12T19:37:43Z

But this is also a good opportunity to figure out what causes the problems of a mid-row split at the SQL layer. It seems that nobody can tell why it would be severely problematic to do so.

IIRC there was not originally a severe issue here. We avoid splitting mid-row because it might make it harder to count rows (you can no longer to per-range counts and sum them), and it's a likely performance problem (since the column families of a single row are much more likely to be updated in the same transaction than neighboring rows. It also makes things much harder to explain; today we can say that single-row PK-only writes never need distributed transactions).

it's worth pointing out that AdminSplit initially had EnsureSafeSplitKey, but that this was deliberately undone: #16344. @bdarnell was driving this, so he should chime in.

EnsureSafeSplitKey is kind of a crude instrument - it's meant to be applied to keys that exist; applying it to keys that don't exist can sometimes produce odd results even though these keys may be safe split keys themselves (the main manifestation of this is off-by-one issues creating additional single-key ranges in between desired split points). We shifted responsibility for the "safety" of the split key to the caller of AdminSplit. It would also be possible to fix this in the other direction by improving EnsureSafeSplitKey to always be safe and idempotent, then doing it in AdminSplit instead of the caller.

So 👍 to @maweike 's change, although I'd like to see a new/updated test that demonstrates the problem this solves.

We need to update this test when we fix the behavior of AdminSplit so that it tests something again.

Or if we keep AdminSplit as-is we should just delete the test.

tbg · 2019-11-27T16:50:23Z

@bdarnell I'd rather fix this in AdminSplit if we can but it looks like you're not so optimistic that this is possible. Can we tell from looking at the key itself whether it is a column family KV pair? I assume we can't, right? If the table is really /Table/5/N/N/N/<colfam> and we issue a split for /Table/5/N/N/3 then EnsureSafeSplitKey has no way of knowing that the 3 is not a column family marker and should not be stripped.

applying it to keys that don't exist can sometimes produce odd results even though these keys may be safe split keys themselves (the main manifestation of this is off-by-one issues creating additional single-key ranges in between desired split points).

If the int encoded at the end of the key is zero, it can't split a row and we don't need to massage it, right?

Taking a step back, I'm quite unhappy about the state of affairs here. We need to uphold some unclear invariant in KV and have no real way of doing so. Any new use of AdminSplit potentially throws us under the bus again. We can't check that a given key to split at exists because SPLIT AT VALUES(...) does not actually split at existing keys, it's just that those keys could exist.

It does seem that this PR is actually the reasonable thing to do here in the short term, though I'd like to explore what can be done here more generally. Not seeing many options, though.

tbg · 2019-11-27T18:38:56Z

I brushed up this PR over in #42833.

bdarnell · 2019-11-27T19:44:17Z

I'd rather fix this in AdminSplit if we can but it looks like you're not so optimistic that this is possible.

I'm neither optimistic nor pessimistic here - I don't have any of the relevant encoding details paged in. I agree that we should be able to statically construct an appropriate split key.

Can we tell from looking at the key itself whether it is a column family KV pair? I assume we can't, right? If the table is really /Table/5/N/N/N/ and we issue a split for /Table/5/N/N/3 then EnsureSafeSplitKey has no way of knowing that the 3 is not a column family marker and should not be stripped.

Right, this is the problem.

If the int encoded at the end of the key is zero, it can't split a row and we don't need to massage it, right?

That sounds right. So appending a zero instead of removing a component should result in a "safe" split key. There's a possible risk of unhelpfully producing splits at both K and K+'/0'. Maybe that's not really a concern now that we have merging, though.

We could end up splitting between column families of the same row, which is illegal. Unfortunately the KV layer has to uphold invariants here that it doesn't quite have introspection into, but after this commit it hopefully stops breaking them. See cockroachdb#16344 for some additional history. Possibly the solution for cockroachdb#39794. Possibly the solution for cockroachdb#36834. Possibly the solution for cockroachdb#36356. (Intentionally not closing the above; leaving that to the SQL folks). Closes cockroachdb#42056 (which is the go-to for reading up on this issue). Release note (bug fix): prevent a number of panics from the SQL layer caused by an invalid range split. These would usually manifest with messages mentioning encoding errors ("found null on not null column" but also possibly various others).

42833: storage: call EnsureSafeSplitKey during load-based splits r=bdarnell a=tbg We could end up splitting between column families of the same row, which is illegal. Unfortunately the KV layer has to uphold invariants here that it doesn't quite have introspection into, but after this commit it hopefully stops breaking them. See #16344 for some additional history. Possibly the solution for #39794. Possibly the solution for #36834. Possibly the solution for #36356. (Intentionally not closing the above; leaving that to the SQL folks). Closes #42056 (which is the go-to for reading up on this issue). Release note (bug fix): prevent a number of panics from the SQL layer caused by an invalid range split. These would usually manifest with messages mentioning encoding errors ("found null on not null column" but also possibly various others). Co-authored-by: Tobias Schottdorf <[email protected]>

We could end up splitting between column families of the same row, which is illegal. Unfortunately the KV layer has to uphold invariants here that it doesn't quite have introspection into, but after this commit it hopefully stops breaking them. See cockroachdb#16344 for some additional history. Possibly the solution for cockroachdb#39794. Possibly the solution for cockroachdb#36834. Possibly the solution for cockroachdb#36356. (Intentionally not closing the above; leaving that to the SQL folks). Closes cockroachdb#42056 (which is the go-to for reading up on this issue). Release note (bug fix): prevent a number of panics from the SQL layer caused by an invalid range split. These would usually manifest with messages mentioning encoding errors ("found null on not null column" but also possibly various others).

jordanlewis requested a review from tbg November 7, 2019 05:56

tbg requested a review from petermattis November 7, 2019 11:32

petermattis reviewed Nov 7, 2019

View reviewed changes

tbg mentioned this pull request Nov 27, 2019

storage: call EnsureSafeSplitKey during load-based splits #42833

Merged

tbg mentioned this pull request Nov 29, 2019

backport-19.2: storage: call EnsureSafeSplitKey during load-based splits #42859

Merged

tbg mentioned this pull request Nov 29, 2019

backport-19.1: storage: call EnsureSafeSplitKey during load-based splits #42860

Merged

craig bot closed this in fad2024 Dec 2, 2019

tbg mentioned this pull request Dec 10, 2019

sql,storage: splits may split SQL rows #43094

Open

irfansharif mentioned this pull request Apr 9, 2020

error about null value in jobs table during SHOW JOBS #36851

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize: add a EnsureSafeSplitKey in Decider #42056

optimize: add a EnsureSafeSplitKey in Decider #42056

maweike commented Oct 31, 2019

cockroach-teamcity commented Oct 31, 2019

CLAassistant commented Oct 31, 2019 •

edited

Loading

tbg commented Nov 7, 2019

petermattis left a comment

danhhz commented Nov 7, 2019 •

edited

Loading

jordanlewis commented Nov 7, 2019

dt commented Nov 7, 2019 •

edited

Loading

rohany commented Nov 8, 2019 •

edited by irfansharif

Loading

petermattis commented Nov 8, 2019

rohany commented Nov 8, 2019

maweike commented Nov 8, 2019 •

edited

Loading

tbg commented Nov 8, 2019

tbg commented Nov 8, 2019

maweike commented Nov 8, 2019

tbg commented Nov 11, 2019

dt commented Nov 11, 2019

rohany commented Nov 11, 2019

rohany commented Nov 12, 2019 •

edited

Loading

rohany commented Nov 12, 2019

danhhz commented Nov 12, 2019

bdarnell commented Nov 12, 2019

tbg commented Nov 27, 2019

tbg commented Nov 27, 2019 via email

bdarnell commented Nov 27, 2019

optimize: add a EnsureSafeSplitKey in Decider #42056

optimize: add a EnsureSafeSplitKey in Decider #42056

Conversation

maweike commented Oct 31, 2019

cockroach-teamcity commented Oct 31, 2019

CLAassistant commented Oct 31, 2019 • edited Loading

tbg commented Nov 7, 2019

petermattis left a comment

Choose a reason for hiding this comment

danhhz commented Nov 7, 2019 • edited Loading

jordanlewis commented Nov 7, 2019

dt commented Nov 7, 2019 • edited Loading

rohany commented Nov 8, 2019 • edited by irfansharif Loading

petermattis commented Nov 8, 2019

rohany commented Nov 8, 2019

maweike commented Nov 8, 2019 • edited Loading

tbg commented Nov 8, 2019

tbg commented Nov 8, 2019

maweike commented Nov 8, 2019

tbg commented Nov 11, 2019

dt commented Nov 11, 2019

rohany commented Nov 11, 2019

rohany commented Nov 12, 2019 • edited Loading

rohany commented Nov 12, 2019

danhhz commented Nov 12, 2019

bdarnell commented Nov 12, 2019

tbg commented Nov 27, 2019

tbg commented Nov 27, 2019 via email

bdarnell commented Nov 27, 2019

CLAassistant commented Oct 31, 2019 •

edited

Loading

danhhz commented Nov 7, 2019 •

edited

Loading

dt commented Nov 7, 2019 •

edited

Loading

rohany commented Nov 8, 2019 •

edited by irfansharif

Loading

maweike commented Nov 8, 2019 •

edited

Loading

rohany commented Nov 12, 2019 •

edited

Loading