Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed [writing at X below closed ts] #62655

Closed
cockroach-teamcity opened this issue Mar 26, 2021 · 26 comments
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity
Copy link
Member

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on release-21.1@f602e37e31a256980ae897917f45cba9c135b412:

		  | main.runTPCCBench.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:894
		  | github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		  | github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:803
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:617
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 9: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2816384-1616738656-21-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		9: dead
		6: 32499
		1: 2715
		7: 31540
		2: 31519
		4: 31793
		5: 32519
		3: 31299
		8: 30508
		Error: UNCLASSIFIED_PROBLEM: 9: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 9: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

More

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-release-21.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 26, 2021
@nvanbenschoten
Copy link
Member

nvanbenschoten commented Mar 29, 2021

F210326 12:02:18.951852 195 kv/kvserver/store_raft.go:524 ⋮ [n9,s9,r9/8:‹/Table/1{3-4}›] 619 writing at 1616760138.160904109,1 below closed ts: 1616760138.508101756,0 (ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/0,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/2/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/3/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/4/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/6/1,/Min), [txn: 36f6bd89]): ‹non-deterministic failure›: writing at 1616760138.160904109,1 below closed ts: ‹1616760138.508101756,0› (‹ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/0,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/2/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/3/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02:14.844713Z/644512934977306627/4/1,/Min), ConditionalPut [/Table/13/1/2021-03-26T12:02
@andreimatei this isn't good. I wonder whether it's the same bug as #60929. We'll need to investigate this further.

@tbg tbg changed the title roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed [writing at X below closed ts] Mar 29, 2021
@andreimatei
Copy link
Contributor

Couldn't repro yet in 10 runs.
There's one thing that stands out from the error. The write that failed is an insert on system.rangelog. The timestamps encoded into the row (2021-03-26T12:02:14.844713Z) is the ReadTimestamp of the writing transaction. When this write is being applied, we see that the txn's WriteTimestampis 4s later:1616760138 -> 12:02:18`.

@andreimatei andreimatei added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 31, 2021
@andreimatei
Copy link
Contributor

I've done around 50 runs, without a repro. Leaving another 100 over night.
I've also improved roachtest such that I can run these tpccbench on a pool of existing clusters without reloading the data at the beginning of each test.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 1, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 2, 2021
This patch splits the ReplicatedEvalResult.WriteTimestamp field into
two: WriteTimestamp and ClockSignal. This clarifies what the deal with
this timestamp is. Before this patch, WriteTimestamp was always coming
from ba.WriteTimestamp(), which is either a transaction's write
timestamp or, for non-txn requests, the batch's read timestamp or, for
non-MVCC requests, some random clock value. Below Raft, the field is
used for updating the followers' clocks, and also to check the request
against the GC threshold. This patch makes the WriteTimestamp field only
apply to `isIntentWrite` requests. These are the only requests for which
the GC threshold check makes sense (if the check makes sense at all).
In order to not increase the size of Raft commands, and also to not read
the proposer's clock too much, only one of the two fields is ever set -
i.e. ClockSignal is set when WriteTimestamp isn't. For backwards
compatibility, WriteTimestamp is set just like before for proposals on
ranges that haven't "migrated" to 21.1 - which migration is checked by
looking at the new ClosedTimestamp field.

This fixes cockroachdb#62569, which was complaining that lease transfers don't
properly chain clock updates, allowing the transferee to have a lower
timestamp than the lease start time. With this patch, the transfer will
have a ClockSignal above the lease start time.

With WriteTimestamp now only carried by regular writes, the patch
expands an assertion below Raft about not writing under the closed
timestamp. Before this patch, only the leaseholder was able to do this
check because only the leaseholder had the original request at its
disposal to see if indeed the assertion is valid (i.e. the assertion was
checking isIntentWrite to avoid triggering for e.g. EndTxn which is
allowed to operate below the closed ts). Now, relying on the fact the
ReplicatedEvalResult.WriteTimestamp is only set for the types of
requests that are not supposed to write below the closed timestamp,
every follower can do the check, making it deterministic.

Fixes cockroachdb#62569

Touches cockroachdb#62655, improving the assertion that triggered there.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 5, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
@andreimatei
Copy link
Contributor

tpccbench also hit another related assertion - a regression in the Raft closed timestamp. That one is even more unexpected cause there's no "request write timestamp" moving part involved.
#61981 (comment)

@andreimatei
Copy link
Contributor

In the original failure here, only one node died (because of an attempt to write below the closed ts). So the deterministic assertion about closed timestamps regression (the one from the link just above) did not fire. So the two failures seem different.

@andreimatei
Copy link
Contributor

andreimatei commented Apr 9, 2021

The closed timestamp regression in #61981 (comment) is a real puzzle, more so then the writing below the closed timestamp on. The code involved in the former is not that complicated, and it looks right to me.
The assertion in question is here. It fires when a command is being staged on the replicaAppBatch, and it carries a ClosedTimestamp that's lower than what that replicaAppBatch already had accumulated. Each replicaAppBatch is initialized with the replica's state here. So, the assertion firing says that the command either has a lower ClosedTimestamp than the previous command in the batch or, if it's the first command in the batch, a lower ClosedTimestamp than what the replica state has gotten from the previous batch.
On the proposer side, we're supposed to be sending proposals with monotonic ClosedTimestamp (obvi). The assigning of ClosedTimestamp is done in the propBuf, clearly based on b.forwardClosedTimestampLocked(closedTSTarget) which keeps the propBuf monotonic. The only discontinuity is around lease changes. When the lease changes hands non-cooperatively (i.e. not lease extensions, not lease transfers), the propBuf on the new leaseholder gets initialized to the new lease start time, and for followers it gets reset. The fact that it's reset on followers shouldn't matter; followers are not supposed to propose anything with a ClosedTimestamp, other than a lease acquisition. Lease acquisitions are special - they carry the lease start time as their ClosedTimestamp .

So, within a lease, monotonicity of ClosedTimestamps carried by commands in ensured by the ratcheting done by the propBuf on proposals. Across leases, it is ensured by the fact that a lease is not supposed to start below the end of the previous one, and by the fact that no proposals are done with timestamps above the expiration of the lease under which they're proposed.

And yet something's got to give, since the assertion fired.

@andreimatei
Copy link
Contributor

The closed timestamp regression was also encountered in one sentry report.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 15, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 16, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 16, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 16, 2021
This patch adds a new proposal flag- IntentWrite[*]. This flag corresponds
to ba.isIntentWrite and identifies proposals that write to the regular
key space. This new flag is used in conjunction with the revamped
WriteTimestamp - see below.
[*] The patch actually introduces the inverse flag - NonMVCC. This is so
that all proposals coming from 20.2 nodes appear as IntentWrites, and
deterministic below-Raft behaivor is preserved in mixed-version cluster.

This patch also reworks the ReplicatedEvalResult.WriteTimestamp field
(**).  Before this patch, WriteTimestamp was always coming from
ba.WriteTimestamp(), which is either a transaction's write timestamp or,
for non-txn requests, the batch's read timestamp or, for non-MVCC
requests, some random clock value. Below Raft, the field is used for
updating the followers' clocks, and also to check the request against
the GC threshold. This patch sets the WriteTimestamp differently for
IntentWrite requests than other requests:
- for regular writes, the field remains ba.WriteTimestamp()
- for other proposals, the field is a clock reading on the proposer

[**] An alternative to split the field into two was considered, but it's
hard to do now because of backwards compatibility. It can be done in the
next release, though, because now all the uses of the WriteTimestamp
field tolerate it being empty.

Some requests (e.g. LeaseTransfers) need a clock signal to travel with
their proposal, and now they get it (see cockroachdb#62569).
The GC threshold check now only applies to IntentWrite requests -
they're the only ones for which that check ever made sense (if the check
makes sense at all, which I don't think it does).

For backwards compatibility, WriteTimestamp is set just like before for
proposals on ranges that haven't "migrated" to 21.1 - which migration is
checked by looking at the new ClosedTimestamp field.

This fixes cockroachdb#62569, which was complaining that lease transfers don't
properly chain clock updates, allowing the transferee to have a lower
timestamp than the lease start time. With this patch, the transfer will
have a WriteTimestamp above the lease start time.

With the help of the new IntentWrite flag, the patch expands an
assertion below Raft about not writing under the closed timestamp.
Before this patch, only the leaseholder was able to do this check
because only the leaseholder had the original request at its disposal to
see if indeed the assertion is valid (i.e. the assertion was checking
isIntentWrite to avoid triggering for e.g. EndTxn which is allowed to
operate below the closed ts). Now every follower can do the check,
making it deterministic.

The expanded assertion is "pretty compatible" with previous 21.1 betas:
- For proposals by new beta: the deterministic assertion might fire on
the new beta (in case of bugs). The assertion claims that it's
deterministic, but it's not since old betas don't have it (the
non-deterministic assertion they do have won't fire because the
leaseholder is new beta). So that's not great, but seems very unlikely
that anyone will hit it.

- Proposals by prev beta don't have the IntentWrite flag set, so no
  assertion on new-beta followers. The (old-beta) proposer/leaseholder
assertion might fire (in case of bugs), but that assertion was
non-deterministic already.

Fixes cockroachdb#62569

Touches cockroachdb#62655, improving the assertion that triggered there.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 17, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.

debug TC stress

wait for voter
@andreimatei
Copy link
Contributor

The "writing below closed timestamp" bug is hopefully being fixed by #63672.

The "closed timestamp regressing" assertion we don't have an explanation for (some analysis in this comment above). One area of the code I don't have much confidence in (because I don't understand it very well) is around reordering protections for different lease acquisitions, when multiple competing acquisition requests are based on the same base lease. Around here. I wonder if some sort of inversion between lease commands is possible, thus resulting in a closed timestamp regression. Although I don't see it yet.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 19, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
craig bot pushed a commit that referenced this issue Apr 19, 2021
63589: server, security: Fix one-way connectivity with connect cmd r=knz a=itsbilal

Informs #60632.

Previously, non-trust-leader nodes couldn't connect back
to the trust leader due to the presence of the wrong
`ca-client.crt` on their disk; the main CA cert/key was
being written in four places.

This change fixes that bug,
and also creates a new `client.node.crt` certificate to prevent
other subsequent errors from being thrown.

Fixes #61624.

Release note: None.

63672: kvserver: fix write below closedts bug  r=andreimatei a=andreimatei

This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in #62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.

63756: backupccl: reset restored jobs during cluster restore r=dt a=pbardea

Previously, jobs were restored without modification during cluster
restore. Due to a recently discovered bug where backup may miss
non-transactional writes written to offline spans by these jobs, their
progress may no longer be accurate on the restored cluster.

IMPORT and RESTORE jobs perform non-transactional writes that may be
missed. When a cluster RESTORE brings back these OFFLINE tables, it will
also bring back its associated job. To ensure the underlying data in
these tables is correct, the jobs are now set in a reverting state so
that they can clean up after themselves.

In-progress schema change jobs that are affected, will fail upon
validation.

Release note (bug fix): Fix a bug where restored jobs may have assumed
to have made progress that was not captured in the backup. The restored
jobs are now either canceled cluster restore.

63837: build: update the go version requirement for `make` r=otan a=knz

Fixes #63837.

The builder image already requires go 1.15.10. This patch
modifies the check for a non-builder `make` command to require
at least the same version.

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 19, 2021
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 19, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
@tbg
Copy link
Member

tbg commented Apr 20, 2021

Closing this one, and moving the GA-blocker label to #61981 (comment)

@tbg tbg closed this as completed Apr 20, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 20, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 21, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 21, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 22, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 22, 2021
In cockroachdb#62655 we see that there appears to be something wrong with the Raft
closed timestamp. That issue shows an assertion failure about a command
trying to write below a timestamp that was promised to have been closed
by a previous command. This patch includes a little bit more info in
that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This
assertion already existed in stageTrivialReplicatedEvalResult(), but
that comes after the failure we've seen. The point of copying this
assertion up is to ascertain whether we're screwing up by regressing the
closed timestamp, or whether a particular request/command is at fault
for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica
state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot.
Projects
None yet
Development

No branches or pull requests

4 participants