kvserver: iterator panic during split #104007

Closed
erikgrinaker opened this issue May 27, 2023 · 4 comments · Fixed by #104082
Labels
A-kv-distribution Relating to rebalancing and leasing. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team

Comments

@erikgrinaker
Contributor

erikgrinaker commented May 27, 2023

Seen when running cockroach workload init kv --splits 10000 on a new roachprod cluster. Seems like fallout from #103690.

E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395  a panic has occurred!
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +panic: ‹iterator with constraint=2 is being used with key /Min that has constraint=1›
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +(1) attached stack trace
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  -- stack trace:
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | runtime.gopanic
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	GOROOT/src/runtime/panic.go:884
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/storage.(*intentInterleavingIter).checkConstraint
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/storage/intent_interleaving_iter.go:574
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/storage.(*intentInterleavingIter).SeekGE
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/storage/intent_interleaving_iter.go:480
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/storage.mvccMinSplitKey
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:5904
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/storage.MVCCFirstSplitKey
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:5955
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).adminSplitWithDescriptor
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:357
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*splitQueue).processAttempt
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/split_queue.go:321
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*splitQueue).process
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/split_queue.go:225
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1020
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:979
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processLoop.func2.1
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:890
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | runtime.goexit
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +  | 	GOROOT/src/runtime/asm_amd64.s:1594
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +Wraps: (2) panic: ‹iterator with constraint=2 is being used with key /Min that has constraint=1›
E230527 19:51:01.511739 [T1,n2,split,s2,r1/3:‹/{Min-System/NodeL…}›] 3395 +Error types: (1) *withstack.withStack (2) *errutil.leafError

Jira issue: CRDB-28305

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels May 27, 2023
@blathers-crl

blathers-crl bot commented May 27, 2023

Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@erikgrinaker erikgrinaker added the branch-master Failures and bugs on the master branch. label May 27, 2023
@irfansharif
Contributor

Saw this in a manual kv/restart/nodes=12 run.

@kvoli
Collaborator

kvoli commented May 30, 2023

We aren't sanitizing local keys the way the other split path does:

cockroach/pkg/storage/mvcc.go

Lines 5834 to 5836 in 2a7427c

if key.Less(roachpb.RKey(keys.LocalMax)) {
	key = roachpb.RKey(keys.LocalMax)
}

This is unrelated*.

The problem occurs because the iterator is set up with the range's end key, which is global, while mvccMinSplitKey then passes the range's start key to SeekGE. I'm surprised one is global and one is local?

The start key \Min shouldn't be considered local as far as I'm aware.
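
To make the mismatch concrete, here is a minimal sketch of the kind of check that fires in the stack trace above. It is not the actual `intentInterleavingIter` code: the `constraint` type, `classify`, `newIter`, and `seekGE` are hypothetical names, the numbering (1 for local, 2 for global) is only inferred from the panic message, and `localMax` is assumed to be the single byte `0x02`.

```go
package main

import (
	"bytes"
	"fmt"
)

// localMax is a stand-in for keys.LocalMax (assumed to be the byte 0x02).
var localMax = []byte{0x02}

// constraint is a hypothetical mirror of the iterator's key-space constraint.
type constraint int

const (
	constrainedToLocal  constraint = 1 // keys that sort before localMax
	constrainedToGlobal constraint = 2 // keys at or after localMax
)

// classify reports which key space a key falls into in this sketch.
func classify(key []byte) constraint {
	if bytes.Compare(key, localMax) < 0 {
		return constrainedToLocal
	}
	return constrainedToGlobal
}

// iter is a toy iterator that only remembers its constraint.
type iter struct{ c constraint }

// newIter derives the constraint from the upper bound (an assumption
// based on the comment that the iterator is set up with the end key).
func newIter(upperBound []byte) iter {
	return iter{c: classify(upperBound)}
}

// seekGE returns an error instead of panicking when the seek key's
// key space differs from the iterator's constraint.
func (it iter) seekGE(key []byte) error {
	if kc := classify(key); kc != it.c {
		return fmt.Errorf("iterator with constraint=%d is being used with key %q that has constraint=%d",
			it.c, key, kc)
	}
	return nil
}

func main() {
	meta1End := []byte{0x02, 0xff, 0xff} // global end key of meta1
	it := newIter(meta1End)

	fmt.Println(it.seekGE([]byte{}))     // empty start key: mismatch error
	fmt.Println(it.seekGE([]byte{0x02})) // key at localMax: no error
}
```

In this sketch the empty start key classifies as local simply because it sorts before `localMax`, which is why seeking to it on an iterator built from a global end key trips the check.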

@kvoli
Collaborator

kvoli commented May 30, 2023

Did repro with:

startKey: keys.MinKey,
endKey:   keys.Meta1KeyMax,

The range shouldn't be split regardless, since we don't split meta1 and it would fail a later check:

if !storage.IsValidSplitKey(foundSplitKey) {
	return reply, errors.Errorf("cannot split range at key %s", splitKey)
}

The split key is irrelevant. MVCCFindSplitKey doesn't have this issue as it does the local key check on the start key, which MVCCFirstSplitKey does not.

cockroach/pkg/storage/mvcc.go

Lines 5834 to 5836 in 2a7427c

if key.Less(roachpb.RKey(keys.LocalMax)) {
	key = roachpb.RKey(keys.LocalMax)
}

\Min (the empty key) sorts before LocalMax, which is 0x02. I'm unsure whether this is intended, but it likely adds an extra layer of protection against splitting meta1.

craig bot pushed a commit that referenced this issue May 30, 2023
104082: storage: prevent iter panic on meta1 split key  r=itsbilal a=kvoli

It was possible that a load based split was suggested for `meta1`, which
would call `MVCCFirstSplitKey` and panic as the `meta1` start key
`\Min` is local, whilst the `meta1` end key is global `0x02 0xff 0xff`.

Add a check that the start key is greater than the `meta1` end key before
processing in `MVCCFirstSplitKey` to prevent the panic.

Note `meta1` would never be split regardless, as
`storage.IsValidSplitKey` would fail after finding a split key.

Also note that if the desired split key is a local key, the same problem
doesn't exist as the minimum split key would be used to seek the first
split key instead.

Fixes: #104007

Release note: None

Co-authored-by: Austen McClernon
@craig craig bot closed this as completed in 30caa00 May 30, 2023
kvoli added a commit that referenced this issue Jun 5, 2023
kvoli added a commit that referenced this issue Jun 5, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Jun 7, 2023
3 participants