Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize: add a EnsureSafeSplitKey in Decider #42056

Closed
wants to merge 1 commit into from
Closed

optimize: add a EnsureSafeSplitKey in Decider #42056

wants to merge 1 commit into from

Conversation

maweike
Copy link

@maweike maweike commented Oct 31, 2019

For example, the data key is /Table/68/1/10. The endkey boundary of the range is exactly /Table/68/1/10/0. The previous range does not contain /Table/68/1/10/0, the latter one. Range contains. But the scan request sends startKey /Table/68/1/10, endKey /Table/68/1/10/0 in the previous range. The key of the put request is /Table/68/1/10/0. In the next range, the read and write is separated.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Oct 31, 2019

CLA assistant check
All committers have signed the CLA.

Commit:
    It is found that the split and read-write are separated when the transaction is concurrent. For example, the data key is /Table/68/1/10. The endkey boundary of the range is exactly /Table/68/1/10/0. The previous range does not contain /Table/68/1/10/0, the next range is included. But the scan request sends startKey /Table/68/1/10, endKey /Table/68/1/10/0 in the previous range. The key of the put request is /Table/68/1/10/0. In the latter range, the read and write is separated, so it is optimized in the decider.go.
    In addition to optimizing the split, the test found that the panic of #41630 has not been triggered so far, so guessing that this modification may also fix the problem described by #39794 #36834 #36356.

Fixes #39794 #36834 #36356

Release note (bug fix):
Modified code in #42056 submission.
@jordanlewis jordanlewis requested a review from tbg November 7, 2019 05:56
@tbg
Copy link
Member

tbg commented Nov 7, 2019

@petermattis, mind taking a look at this? I don't quite understand the description, but the code suggests that load-based splits may return a split key that falls in the middle of a SQL row. As far as I can tell, we want to call EnsureSafeSplitKey only on SQL keys, but load based splitting is active on all ranges, so I'm not sure whether this addition will backfire in some way. Looking at the split path, I find no indication that we'll somehow massage the key further to avoid splitting inside the row (i.e. I'd expect it to split between columns of the same row). And yet there's a test that claims the split makes sure that, even when you pass it a key between two columns of the same row, it won't actually split there 😕:

// TestStoreRangeSplitInsideRow verifies an attempt to split a range inside of
// a table row will cause a split at a boundary between rows.
func TestStoreRangeSplitInsideRow(t *testing.T) {

I'm also unsure why we don't allow rows to be split between columns. I get that we want to avoid it, but is this fixing a serious bug or just a silly thing that can happen? I found no answers in

// In addition, we must never return a split key that falls within a table
// row. (Rows in tables with multiple column families are comprised of
// multiple keys, one key per column family.) However, we do allow a split
// key that falls between a row and its interleaved rows.
.

@tbg tbg requested a review from petermattis November 7, 2019 11:32
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what badness will happen from splitting inside a row. We do call keys.EnsureSafeSplitKey on AdminSplit requests when no split-point is provided (see MVCCFindSplitKey). Rather than making a change inside the split decider, I'd do this enforcement in Replica.adminSplitWithDescriptor.

@danhhz Do you recall if there are correctness problems from splitting in the middle of row? This can only happen when there are multiple column families (I think).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @tbg)

@danhhz
Copy link
Contributor

danhhz commented Nov 7, 2019

@danhhz Do you recall if there are correctness problems from splitting in the middle of row? This can only happen when there are multiple column families (I think).

Indeed it could only happen if there are multiple column families in a row, but note that (edit: most) system tables (even in new clusters) still have 1 column family per row for backward compatibility.

I recall at some point that SQL depended on this, but I've long forgotten where that was and it's reasonably likely that's changed anyway. No idea who else to tag in here... @jordanlewis do you know who is most up to date on how sql execution uses column families?

Also cc @dt does bulk io care if a row is split between two ranges?

@jordanlewis
Copy link
Member

Column families? I'll continue this tag-fest by saying that that's a @rohany beacon if I ever did hear one!

@dt
Copy link
Member

dt commented Nov 7, 2019

BulkIO counts rows by checking when EnsureSafeSplitKey changes so it is assuming that it will never split mid-row. Even if we re-implemented the counter logic to what EnsureSafeSplitKey was doing before, if a row is in two different ranges, it'd be double-counted when we backup/restore/etc those ranges. Those counts are mostly for user info so that might not be the end of the world, but it'd certainly cause some confusion.

I'm not sure if there are any other places we bake that assumption in -- there easily could be. Uniqueness enforcement comes to mind: I think we rely on the fact that the default family will collide to ensure we don't inject a non-unique row, and I'm not quite sure how much we can use that assumption if the default family KV might not be in the same SST/range.

@rohany
Copy link
Contributor

rohany commented Nov 8, 2019

Aside from relying on the uniqueness checks upon inserting, I don't think that SQL really cares what ranges the different families live in. The rowfetcher/cfetcher just put together the rows from the k/v family pieces as the underlying fetchers give them kv's without caring what range they come from.

@petermattis
Copy link
Collaborator

Aside from relying on the uniqueness checks upon inserting

The uniqueness checks on inserting don't depend on a row being contained in a single range, right?

@rohany
Copy link
Contributor

rohany commented Nov 8, 2019

woops -- I saw part of davids comment above and got confused. It doesn't depend on the range at all.

@maweike
Copy link
Author

maweike commented Nov 8, 2019

Fetchers appeared before because of the column families problem, it is similar to this issue:#38577. we try to avoid issue 38577, when the table is built, it is a single column family. but the panic of unset EncDatum will still appear in the poc test. I just turned off splitbyload when testing, After that, this panic problem no longer appears. You can find that the normal split key does not end with /0 by “select * from crdb_internal.ranges;”command. However, when spiltbyload is turned on, there is a splitkey at the end of /0, so I think it may be related to the key of splitbyload.

@tbg
Copy link
Member

tbg commented Nov 8, 2019

@maweike, this seems like you have a way to reproduce this at will, would you be able to share the details of your reproduction with us? From what you're saying it seems likely that the load based split is a problem, but I want to make sure we understand the implications of it all the way through the stack.

@petermattis it's worth pointing out that AdminSplit initially had EnsureSafeSplitKey, but that this was deliberately undone: #16344. @bdarnell was driving this, so he should chime in.

At the very least, we ought to prevent AdminSplit from splitting between rows. It should probably get the same "is this SQL" detection that MVCCFindSplitKey has, and make the split key safe only when it's cutting through a SQL row. We'll need to be careful to not make this situation more confusing than it already is.

But this is also a good opportunity to figure out what causes the problems of a mid-row split at the SQL layer. It seems that nobody can tell why it would be severely problematic to do so.

@tbg
Copy link
Member

tbg commented Nov 8, 2019

Oh - and this test I was looking at

func TestStoreRangeSplitInsideRow(t *testing.T) {

actually calls EnsureSafeSplitKey on the thing it passes to AdminSplit, so it's not doing what it claims is doing. This "broke" when #16344 was closed: col1Key is identical to col2Key. We need to update this test when we fix the behavior of AdminSplit so that it tests something again.

@maweike
Copy link
Author

maweike commented Nov 8, 2019

@tbg The scenario we simulated was a commercial scenario under stress testing,and the test software used was an intranet commercial software, so I am sorry that I can't share the detailed process of reproduction, but I can confirm that the problem occurred when the data that I am reading is at the boundary of the split range, which will cause the range of reads and writes to be inconsistent, but when the data I read is inside the range, it will not cause the read and write range to be inconsistent.

@tbg
Copy link
Member

tbg commented Nov 11, 2019

@rohany can you try to reverse engineer the problem @maweike is seeing? We're certainly not intending to split between column families, and we'll fix that. But it seems that we have no idea what exactly the issue actually is and we should get to the bottom of that.

@dt
Copy link
Member

dt commented Nov 11, 2019

@rohany Sorry, I meant uniqueness enforcement in BulkIO and specifically IMPORT INTO's direct injection of rows, and what happens when the default family KV gets put in a different SST than the rest of the row -- that wasn't something considered when we built it. FWIW, I think it should actually be OK since we'd still get the collision later when we go to add that SST and still rollback.

The row counter will still certainly be wrong though.

@rohany
Copy link
Contributor

rohany commented Nov 11, 2019

I'll take a look at it today or tomorrow!

@rohany
Copy link
Contributor

rohany commented Nov 12, 2019

So i have a repro of the panic --
First add the following line into pkg/sql/split.go

	rowKey = keys.MakeFamilyKey(rowKey, 1)

at line 73 on current master (right after rowKey gets computed by getRowKey. This forces split ats to split at family 1 instead of on the row boundary. Then,

./cockroach demo --empty --nodes 3
create table t (x int, y int not null, z int not null, primary key (x), family (x), family (y), family (z));
alter table t split at values (1), (2), (3);
insert into t values (1, 1, 1);
(* wait 30 seconds to a minute *)
select * from t; (* this will panic with the fetcher error (found null on not null column) *) 

I have no clue why this happens though. I'm not sure why you have to wait for the panic to occur -- are the range splits happening asynchronously? show ranges immediately after the split call shows multiple ranges, so I'm not sure what waiting for 30 seconds is doing.

@rohany
Copy link
Contributor

rohany commented Nov 12, 2019

Ok, the cause for the panic is pretty simple -- it seems that the fetchers assume that when a range boundary is hit, all the kv's for a row have been retrieved already. They don't explicitly make this assumption, but it comes out here in pkg/sql/row/fetcher.go:

func (rf *Fetcher) NextKey(ctx context.Context) (rowDone bool, err error) {
	var ok bool

	for {
		ok, rf.kv, _, err = rf.kvFetcher.NextKV(ctx)
		if err != nil {
			return false, err
		}
		rf.kvEnd = !ok
		fmt.Println("nextkv", rf.kv, rf.kvEnd)
		if rf.kvEnd {
			// No more keys in the scan. We need to transition
			// rf.rowReadyTable to rf.currentTable for the last
			// row.
			rf.rowReadyTable = rf.currentTable
			return true, nil
		}

The underlying KVBatchFetcher returns false as the first argument when it hits the end of a range, which causes the fetcher to think that this row is done being processed -- a log on the above example confirms this:

nextkv {/Table/52/1/1/1/1 {[118 80 228 138 1 2] 0.000000000,0}} false
nextkv {/Table/52/1/1/2/1 {[100 229 75 100 1 2] 0.000000000,0}} false
nextkv {/Min {[] 0.000000000,0}} true
nextkv {/Table/52/1/1/0 {[126 47 48 235 10] 0.000000000,0}} false
nextkv {/Min {[] 0.000000000,0}} true

Here, the first part of the row /Table/52/1/1/0 is separated in a different range from the other K/V's in its row, and is processed independently, leading to the panic.

@danhhz
Copy link
Contributor

danhhz commented Nov 12, 2019

Aha! This is exactly what I was thinking of. I wish I would have remembered the specifics and saved you some time, but thanks for looking into it!

@bdarnell
Copy link
Contributor

But this is also a good opportunity to figure out what causes the problems of a mid-row split at the SQL layer. It seems that nobody can tell why it would be severely problematic to do so.

IIRC there was not originally a severe issue here. We avoid splitting mid-row because it might make it harder to count rows (you can no longer to per-range counts and sum them), and it's a likely performance problem (since the column families of a single row are much more likely to be updated in the same transaction than neighboring rows. It also makes things much harder to explain; today we can say that single-row PK-only writes never need distributed transactions).

it's worth pointing out that AdminSplit initially had EnsureSafeSplitKey, but that this was deliberately undone: #16344. @bdarnell was driving this, so he should chime in.

EnsureSafeSplitKey is kind of a crude instrument - it's meant to be applied to keys that exist; applying it to keys that don't exist can sometimes produce odd results even though these keys may be safe split keys themselves (the main manifestation of this is off-by-one issues creating additional single-key ranges in between desired split points). We shifted responsibility for the "safety" of the split key to the caller of AdminSplit. It would also be possible to fix this in the other direction by improving EnsureSafeSplitKey to always be safe and idempotent, then doing it in AdminSplit instead of the caller.

So 👍 to @maweike 's change, although I'd like to see a new/updated test that demonstrates the problem this solves.

We need to update this test when we fix the behavior of AdminSplit so that it tests something again.

Or if we keep AdminSplit as-is we should just delete the test.

@tbg
Copy link
Member

tbg commented Nov 27, 2019

@bdarnell I'd rather fix this in AdminSplit if we can but it looks like you're not so optimistic that this is possible. Can we tell from looking at the key itself whether it is a column family KV pair? I assume we can't, right? If the table is really /Table/5/N/N/N/<colfam> and we issue a split for /Table/5/N/N/3 then EnsureSafeSplitKey has no way of knowing that the 3 is not a column family marker and should not be stripped.

applying it to keys that don't exist can sometimes produce odd results even though these keys may be safe split keys themselves (the main manifestation of this is off-by-one issues creating additional single-key ranges in between desired split points).

If the int encoded at the end of the key is zero, it can't split a row and we don't need to massage it, right?

Taking a step back, I'm quite unhappy about the state of affairs here. We need to uphold some unclear invariant in KV and have no real way of doing so. Any new use of AdminSplit potentially throws us under the bus again. We can't check that a given key to split at exists because SPLIT AT VALUES(...) does not actually split at existing keys, it's just that those keys could exist.

It does seem that this PR is actually the reasonable thing to do here in the short term, though I'd like to explore what can be done here more generally. Not seeing many options, though.

@tbg
Copy link
Member

tbg commented Nov 27, 2019 via email

@bdarnell
Copy link
Contributor

I'd rather fix this in AdminSplit if we can but it looks like you're not so optimistic that this is possible.

I'm neither optimistic nor pessimistic here - I don't have any of the relevant encoding details paged in. I agree that we should be able to statically construct an appropriate split key.

Can we tell from looking at the key itself whether it is a column family KV pair? I assume we can't, right? If the table is really /Table/5/N/N/N/ and we issue a split for /Table/5/N/N/3 then EnsureSafeSplitKey has no way of knowing that the 3 is not a column family marker and should not be stripped.

Right, this is the problem.

If the int encoded at the end of the key is zero, it can't split a row and we don't need to massage it, right?

That sounds right. So appending a zero instead of removing a component should result in a "safe" split key. There's a possible risk of unhelpfully producing splits at both K and K+'/0'. Maybe that's not really a concern now that we have merging, though.

tbg added a commit to tbg/cockroach that referenced this pull request Nov 29, 2019
We could end up splitting between column families of the same row,
which is illegal. Unfortunately the KV layer has to uphold invariants
here that it doesn't quite have introspection into, but after this
commit it hopefully stops breaking them.

See cockroachdb#16344 for some
additional history.

Possibly the solution for cockroachdb#39794.
Possibly the solution for cockroachdb#36834.
Possibly the solution for cockroachdb#36356.

(Intentionally not closing the above; leaving that to the SQL folks).

Closes cockroachdb#42056 (which is the go-to for reading up on this issue).

Release note (bug fix): prevent a number of panics from the SQL layer
caused by an invalid range split. These would usually manifest with
messages mentioning encoding errors ("found null on not null column" but
also possibly various others).
craig bot pushed a commit that referenced this pull request Dec 2, 2019
42833: storage: call EnsureSafeSplitKey during load-based splits r=bdarnell a=tbg

We could end up splitting between column families of the same row,
which is illegal. Unfortunately the KV layer has to uphold invariants
here that it doesn't quite have introspection into, but after this
commit it hopefully stops breaking them.

See #16344 for some
additional history.

Possibly the solution for #39794.
Possibly the solution for #36834.
Possibly the solution for #36356.

(Intentionally not closing the above; leaving that to the SQL folks).

Closes #42056 (which is the go-to for reading up on this issue).

Release note (bug fix): prevent a number of panics from the SQL layer
caused by an invalid range split. These would usually manifest with
messages mentioning encoding errors ("found null on not null column" but
also possibly various others).

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig craig bot closed this in fad2024 Dec 2, 2019
tbg added a commit to tbg/cockroach that referenced this pull request Dec 3, 2019
We could end up splitting between column families of the same row,
which is illegal. Unfortunately the KV layer has to uphold invariants
here that it doesn't quite have introspection into, but after this
commit it hopefully stops breaking them.

See cockroachdb#16344 for some
additional history.

Possibly the solution for cockroachdb#39794.
Possibly the solution for cockroachdb#36834.
Possibly the solution for cockroachdb#36356.

(Intentionally not closing the above; leaving that to the SQL folks).

Closes cockroachdb#42056 (which is the go-to for reading up on this issue).

Release note (bug fix): prevent a number of panics from the SQL layer
caused by an invalid range split. These would usually manifest with
messages mentioning encoding errors ("found null on not null column" but
also possibly various others).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants