
kv/kvnemesis: rare resolved timestamp violation #60929

Closed
nvanbenschoten opened this issue Feb 22, 2021 · 4 comments · Fixed by #62570
Assignees: nvanbenschoten
Labels: A-kv-replication, A-kv-transactions, C-bug, C-investigation, GA-blocker

Comments

@nvanbenschoten (Member)

After about 45 minutes of running kvnemesis, I often see an error like:

make roachprod-stress PKG=./pkg/kv/kvnemesis TESTS=TestKVNemesisMultiNode TESTTIMEOUT=2m CLUSTER=nathan-stress
        panic: resolved timestamp 1613982715.406343041,0 equal to or above timestamp of operation {<nil> txn_id:2a60432a-fa66-408c-9160-7ff86c54f2b9 txn_key:"\272\0220b2e8742\000\001" txn_min_timestamp:<wall_time:1613982715404791290 > timestamp:<wall_time:1613982715404791290 >  <nil> <nil> <nil> <nil>}

goroutine 2815 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000914380, 0x5118be0, 0xc004067e30)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126
panic(0x3bca140, 0xc0045ff4c0)
        /usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*resolvedTimestamp).assertOpAboveRTS(0xc004dfcfd8, 0x0, 0xc003334780, 0x0, 0x0, 0x0, 0x0, 0x166604ba7952adfa, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/resolved_timestamp.go:251 +0x165
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*resolvedTimestamp).consumeLogicalOp(0xc004dfcfd8, 0x0, 0xc003334780, 0x0, 0x0, 0x0, 0x0, 0xc0032b6060)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/resolved_timestamp.go:146 +0x296
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*resolvedTimestamp).ConsumeLogicalOp(0xc004dfcfd8, 0x0, 0xc003334780, 0x0, 0x0, 0x0, 0x0, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/resolved_timestamp.go:132 +0x4d
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*Processor).consumeLogicalOps(0xc004dfcf20, 0x5118b20, 0xc003ef8b00, 0xc0042d2990, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/processor.go:591 +0x227
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*Processor).consumeEvent(0xc004dfcf20, 0x5118b20, 0xc003ef8b00, 0xc003518230)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/processor.go:540 +0x294
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*Processor).run(0xc004dfcf20, 0x5118be0, 0xc004067e30, 0xc0042b3820, 0xc000914380)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/processor.go:286 +0xb7a
github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed.(*Processor).Start.func1(0x5118be0, 0xc004067e30)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed/processor.go:196 +0x50
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc000914380, 0x5118be0, 0xc004067e30, 0x0, 0xc0042b3840)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351 +0xb9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xfc

At first, I figured that this was fallout from #59566, but I've managed to reproduce with that change disabled. The panic is an MVCCWriteIntentOp at a timestamp below the range's resolved timestamp, which should never be allowed.
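
For reference, a minimal runnable sketch of the invariant that fired. This is modeled on, not copied from, pkg/kv/kvserver/rangefeed/resolved_timestamp.go; the Timestamp type here is a simplified stand-in for hlc.Timestamp:

package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct {
    WallTime int64
    Logical  int32
}

// LessEq reports whether t orders at or before u.
func (t Timestamp) LessEq(u Timestamp) bool {
    return t.WallTime < u.WallTime ||
        (t.WallTime == u.WallTime && t.Logical <= u.Logical)
}

// assertOpAboveRTS panics if a logical op (e.g. an intent write) carries a
// timestamp at or below the range's resolved timestamp: once a timestamp is
// resolved, no new write may appear at or below it.
func assertOpAboveRTS(resolvedTS, opTS Timestamp) {
    if opTS.LessEq(resolvedTS) {
        panic(fmt.Sprintf(
            "resolved timestamp %+v equal to or above timestamp of operation %+v",
            resolvedTS, opTS))
    }
}

func main() {
    resolved := Timestamp{WallTime: 1613982715406343041}
    intentWrite := Timestamp{WallTime: 1613982715404791290} // below resolved
    assertOpAboveRTS(resolved, intentWrite)                 // panics, as in the report
}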

@nvanbenschoten added the C-bug, C-investigation, A-kv-replication, and A-kv-transactions labels Feb 22, 2021
@nvanbenschoten (Member Author)

I think I see the problem here. We consider a range's lease start time to be closed (see explanation in 7037b54), but this can be violated by a write to the subsumed keyspace after a range merge causes a range to grow. In such cases, we bump the leaseholder's timestamp cache over the subsumed keyspace to the freeze time, but this may still lag behind the LHS range's lease start time. This allows a write to land on the subsumed side of the range below the lease start time. Unless I'm missing something, we'll either want to bump the subsumed keyspace to the lease start time during a merge or stop considering the lease start time as a contributor to the closed timestamp (which we may want to do anyway, see #61989).

There might be more to this story, as I'm also seeing evidence that a leaseholder has a clock below its lease start time. I wouldn't expect this to be possible, but I'm also now second-guessing that we properly prevent it. We do set the Timestamp on lease requests here, but the resulting lease can end up with a higher timestamp (as of c44b357):

// Forward the lease's start time to a current clock reading. At this
// point, we're holding latches across the entire range, we know that
// this time is greater than the timestamps at which any request was
// serviced by the leaseholder before it stopped serving requests (i.e.
// before the TransferLease request acquired latches).
newLease := args.Lease
newLease.Start.Forward(cArgs.EvalCtx.Clock().NowAsClockTimestamp())

It's unclear whether this is contributing to the issue, merely making it more likely, or unrelated.

cc. @andreimatei you might have thoughts here.

@andreimatei (Contributor)

> but this can be violated by a write to the subsumed keyspace after a range merge causes a range to grow

Can you spell this out for me? Isn't a write to the subsumed keyspace after a merge bumped above the merged lease? And what does it mean for it to "cause a range to grow"?

@nvanbenschoten (Member Author)

> Isn't a write to the subsumed keyspace after a merge bumped above the merged lease?

That's the problem - what enforces this? In the usual case, a new lease bumps the timestamp cache across a range's keyspace to the lease start time. But if the range then grows because it merges in its RHS neighbor, the timestamp cache for this newly added keyspace isn't guaranteed to be above the LHS's lease start time. So the LHS can serve writes below its lease start time, violating its reported closed timestamp.
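
To make the race concrete, here is a hypothetical, simplified timeline; all values and names are illustrative (real timestamps are hlc.Timestamps, and the timestamp cache is tracked per key span):

package main

import "fmt"

func main() {
    // The LHS's closed timestamp includes its lease start time.
    lhsLeaseStart := int64(100)
    // The merge bumps the timestamp cache over the subsumed keyspace only to
    // the RHS's freeze time, which can lag the LHS's lease start time.
    mergeFreezeTime := int64(90)

    // A write to the subsumed (former RHS) keys only has to clear the
    // timestamp cache for those keys, i.e. the freeze time.
    writeTS := mergeFreezeTime + 1 // 91

    if writeTS < lhsLeaseStart {
        fmt.Printf("write at %d lands below closed timestamp %d: violation\n",
            writeTS, lhsLeaseStart)
    }
}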

@andreimatei (Contributor)

Yeah, this seems like a bug. Like you're saying, it doesn't seem like the lease start time should be particularly tied to the closed timestamp. Particularly in 21.1, where a lease change doesn't cause a regression in any replica's info about what closed timestamp applies.

@cockroachdb deleted a comment from knz Mar 18, 2021
@cockroachdb deleted a comment from knz Mar 18, 2021
@nvanbenschoten self-assigned this Mar 25, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 26, 2021
Fixes cockroachdb#60929.
Relates to cockroachdb#61986.
Relates to cockroachdb#61989.

This commit fixes a closed timestamp violation that could allow a
value/intent write at a timestamp below a range's closed timestamp. This
could allow for serializability violations if it allowed a follower read
to miss a write and could lead to a panic in the rangefeed processor if
a rangefeed was watching at the right time, as we saw in cockroachdb#60929.

In cockroachdb#60929, we found that this bug was caused by a range merge and a
lease transfer racing in such a way that the closed timestamp could
later be violated by a write to the subsumed portion of the joint range.
The root cause of this was an opportunistic optimization made in 7037b54
to consider a range's lease start time as an input to its closed
timestamp computation. This optimization did not account for the
possibility of serving writes to a newly subsumed keyspace below a
range's lease start time if that keyspace was merged into a range under
its current lease and with a freeze time below the current lease start
time. This bug is fixed by removing the optimization, which was on its
way out to allow for cockroachdb#61986 anyway.

Note that removing this optimization does not break
`TestClosedTimestampCanServeThroughoutLeaseTransfer`, because the v2
closed timestamp system does not allow for closed timestamp regressions,
even across leaseholders. This was one of the many benefits of the new
system.
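
A hedged before/after sketch of the shape of this change; the names and the computation here are illustrative, not the real kvserver code:

package main

import "fmt"

// Before the fix: the closed timestamp was opportunistically forwarded to
// the lease start time, which a post-merge write to subsumed keys could
// undercut.
func closedTimestampBefore(trackedClosed, leaseStart int64) int64 {
    if leaseStart > trackedClosed {
        return leaseStart
    }
    return trackedClosed
}

// After the fix: the lease start time no longer contributes.
func closedTimestampAfter(trackedClosed int64) int64 {
    return trackedClosed
}

func main() {
    fmt.Println(closedTimestampBefore(80, 100)) // 100: can be violated after a merge
    fmt.Println(closedTimestampAfter(80))       // 80: safe
}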
craig bot pushed a commit that referenced this issue Mar 30, 2021
62570: kv: don't consider lease start time as closed timestamp r=nvanbenschoten a=nvanbenschoten

(PR description identical to the commit message above.)
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig (bot) closed this as completed in d77c3ed Mar 30, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 30, 2021
(Commit message identical to the commit message above.)