roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed [writing at X below closed ts] #62655
Comments
Couldn't repro yet in 10 runs.
I've done around 50 runs, without a repro. Leaving another 100 running overnight.
In cockroachdb#62655 we see that there appears to be something wrong with the Raft closed timestamp. That issue shows an assertion failure about a command trying to write below a timestamp that was promised to have been closed by a previous command. This patch includes a little bit more info in that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This assertion already existed in stageTrivialReplicatedEvalResult(), but that comes after the failure we've seen. The point of copying this assertion up is to ascertain whether we're screwing up by regressing the closed timestamp, or whether a particular request/command is at fault for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica state from a staging batch to the replica.

Release note: None
This patch splits the ReplicatedEvalResult.WriteTimestamp field into two: WriteTimestamp and ClockSignal. This clarifies what the deal with this timestamp is. Before this patch, WriteTimestamp was always coming from ba.WriteTimestamp(), which is either a transaction's write timestamp or, for non-txn requests, the batch's read timestamp or, for non-MVCC requests, some random clock value. Below Raft, the field is used for updating the followers' clocks, and also to check the request against the GC threshold.

This patch makes the WriteTimestamp field only apply to `isIntentWrite` requests. These are the only requests for which the GC threshold check makes sense (if the check makes sense at all). In order to not increase the size of Raft commands, and also to not read the proposer's clock too much, only one of the two fields is ever set - i.e. ClockSignal is set when WriteTimestamp isn't. For backwards compatibility, WriteTimestamp is set just like before for proposals on ranges that haven't "migrated" to 21.1 - which migration is checked by looking at the new ClosedTimestamp field.

This fixes cockroachdb#62569, which was complaining that lease transfers don't properly chain clock updates, allowing the transferee to have a lower timestamp than the lease start time. With this patch, the transfer will have a ClockSignal above the lease start time.

With WriteTimestamp now only carried by regular writes, the patch expands an assertion below Raft about not writing under the closed timestamp. Before this patch, only the leaseholder was able to do this check because only the leaseholder had the original request at its disposal to see if indeed the assertion is valid (i.e. the assertion was checking isIntentWrite to avoid triggering for e.g. EndTxn which is allowed to operate below the closed ts). Now, relying on the fact that ReplicatedEvalResult.WriteTimestamp is only set for the types of requests that are not supposed to write below the closed timestamp, every follower can do the check, making it deterministic.

Fixes cockroachdb#62569
Touches cockroachdb#62655, improving the assertion that triggered there.

Release note: None
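The "only one of the two fields is ever set" rule described in this commit message can be sketched in a few lines of Go. This is a simplified illustration, not the CockroachDB code: `Timestamp`, `EvalResult`, `newEvalResult`, `checkBelowRaft` and `clockUpdate` are hypothetical names, timestamps are plain integers with zero meaning "unset", and the check is assumed to treat a write at exactly the closed timestamp as a violation.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an HLC timestamp; zero means "unset".
type Timestamp int64

// EvalResult mirrors the two fields from the commit message (hypothetical type).
// At most one of the two fields is non-zero.
type EvalResult struct {
	WriteTimestamp Timestamp // set only for intent writes
	ClockSignal    Timestamp // set only when WriteTimestamp is not
}

// newEvalResult populates exactly one field, depending on the request kind.
func newEvalResult(isIntentWrite bool, batchWriteTS, proposerClock Timestamp) EvalResult {
	if isIntentWrite {
		return EvalResult{WriteTimestamp: batchWriteTS}
	}
	return EvalResult{ClockSignal: proposerClock}
}

// checkBelowRaft needs only the command itself, so every replica can run it.
// Assumption: writing at or below the closed timestamp is the violation.
func checkBelowRaft(res EvalResult, closedTS Timestamp) error {
	if res.WriteTimestamp != 0 && res.WriteTimestamp <= closedTS {
		return fmt.Errorf("writing at %d below closed ts %d", res.WriteTimestamp, closedTS)
	}
	return nil
}

// clockUpdate returns the timestamp a follower would forward its clock to.
func clockUpdate(res EvalResult) Timestamp {
	if res.WriteTimestamp != 0 {
		return res.WriteTimestamp
	}
	return res.ClockSignal
}

func main() {
	write := newEvalResult(true, 5, 100)
	transfer := newEvalResult(false, 0, 100)
	fmt.Println(checkBelowRaft(write, 10))    // the write is below the closed ts
	fmt.Println(checkBelowRaft(transfer, 10)) // no WriteTimestamp, so no check
	fmt.Println(clockUpdate(transfer))        // followers still get a clock signal
}
```

Because the check depends only on fields carried by the command, it no longer needs the original request, which is what allows followers to run it too.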
In cockroachdb#62655 we see that there appears to be something wrong with the Raft closed timestamp. That issue shows an assertion failure about a command trying to write below a timestamp that was promised to have been closed by a previous command. This patch includes a little bit more info in that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This assertion already existed in stageTrivialReplicatedEvalResult(), but that comes after the failure we've seen. The point of copying this assertion up is to ascertain whether we're screwing up by regressing the closed timestamp, or whether a particular request/command is at fault for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
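To make the ordering of the two checks concrete, here is a minimal Go sketch, under stated assumptions. It is not the actual patch: `replicaState`, `applyCommand` and the environment variable `EXAMPLE_DISABLE_CLOSED_TS_ASSERTIONS` are hypothetical names, and timestamps are plain integers with zero meaning "not carried by the command".

```go
package main

import (
	"fmt"
	"os"
)

// Timestamp is a simplified stand-in for an HLC timestamp; zero means "unset".
type Timestamp int64

// assertionsEnabled consults a hypothetical env var acting as an escape hatch.
func assertionsEnabled() bool {
	return os.Getenv("EXAMPLE_DISABLE_CLOSED_TS_ASSERTIONS") != "true"
}

// replicaState holds the closed timestamp established by applied commands.
type replicaState struct {
	closedTS Timestamp
}

// applyCommand checks a command against the current state before applying it.
// The regression check runs first, so a failure tells us whether the closed
// timestamp itself went backwards or whether a write ignored it.
func (s *replicaState) applyCommand(cmdClosedTS, writeTS Timestamp) error {
	if assertionsEnabled() {
		if cmdClosedTS != 0 && cmdClosedTS < s.closedTS {
			return fmt.Errorf("closed timestamp regression: %d -> %d", s.closedTS, cmdClosedTS)
		}
		if writeTS != 0 && writeTS <= s.closedTS {
			return fmt.Errorf("writing at %d below closed ts %d", writeTS, s.closedTS)
		}
	}
	if cmdClosedTS > s.closedTS {
		s.closedTS = cmdClosedTS
	}
	return nil
}

func main() {
	s := &replicaState{}
	fmt.Println(s.applyCommand(10, 12)) // fine: closes 10, writes at 12
	fmt.Println(s.applyCommand(8, 15))  // closed timestamp regression
	fmt.Println(s.applyCommand(11, 9))  // write below the closed timestamp
}
```

In this sketch a failed check returns an error instead of crashing the node, which keeps the example easy to run.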
tpccbench also hit another related assertion - a regression in the Raft closed timestamp. That one is even more unexpected because there's no "request write timestamp" moving part involved.
In the original failure here, only one node died (because of an attempt to write below the closed ts). So the deterministic assertion about closed timestamp regression (the one from the link just above) did not fire. So the two failures seem different.
The closed timestamp regression in #61981 (comment) is a real puzzle, more so than the writing-below-the-closed-timestamp one. The code involved in the former is not that complicated, and it looks right to me. So, within a lease, monotonicity of And yet something's got to give, since the assertion fired.
The closed timestamp regression was also encountered in one sentry report.
This patch fixes a bug in our closed timestamp management. This bug was making it possible for a command to close a timestamp even though other requests writing at lower timestamps are currently evaluating. The problem was that we were assuming that, if a replica is proposing a new lease, there can't be any requests in flight and every future write evaluated on the range will wait for the new lease and then evaluate above the lease start time. Based on that reasoning, the proposal buffer was recording the lease start time as its assignedClosedTimestamp. This was matching what it does for every write, where assignedClosedTimestamp corresponds to the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact, possible for writes to be evaluating on the range when the lease acquisition is proposed. And these evaluations might be done at timestamps below the would-be lease's start time. This happens when the replica has already received a lease through a lease transfer. The transfer must have applied after the previous lease expired and the replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message "writing below closed ts" has been fixed.
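The invariant that the bug violated can be illustrated with a small Go sketch. It models only the condition "no request may be evaluating at or below a timestamp that gets closed"; it does not reproduce the actual fix, and `evalTracker`, `begin`, `end` and `canClose` are hypothetical names.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an HLC timestamp.
type Timestamp int64

// evalTracker records the timestamps of requests that are currently evaluating.
type evalTracker struct {
	nextID   int
	inflight map[int]Timestamp
}

func newEvalTracker() *evalTracker {
	return &evalTracker{inflight: map[int]Timestamp{}}
}

// begin registers a request evaluating at ts and returns a handle for it.
func (t *evalTracker) begin(ts Timestamp) int {
	t.nextID++
	t.inflight[t.nextID] = ts
	return t.nextID
}

// end removes a request once its evaluation has finished.
func (t *evalTracker) end(id int) {
	delete(t.inflight, id)
}

// canClose reports whether closing candidate is safe, i.e. no in-flight
// evaluation is writing at or below it.
func (t *evalTracker) canClose(candidate Timestamp) bool {
	for _, ts := range t.inflight {
		if ts <= candidate {
			return false
		}
	}
	return true
}

func main() {
	tr := newEvalTracker()
	// The replica already holds a lease (received via a transfer), so a write
	// is evaluating at timestamp 90.
	id := tr.begin(90)
	// A lease acquisition with start time 100 is proposed concurrently. The
	// faulty assumption was that closing the lease start time is always safe.
	leaseStart := Timestamp(100)
	fmt.Println(tr.canClose(leaseStart)) // not safe while the write is in flight
	tr.end(id)
	fmt.Println(tr.canClose(leaseStart)) // safe once nothing is evaluating below
}
```

Recording the lease start time as closed without a check of this kind is what let a later command write below a timestamp that had already been promised closed.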
This patch adds a new proposal flag, IntentWrite [*]. This flag corresponds to ba.isIntentWrite and identifies proposals that write to the regular key space. This new flag is used in conjunction with the revamped WriteTimestamp - see below.

[*] The patch actually introduces the inverse flag - NonMVCC. This is so that all proposals coming from 20.2 nodes appear as IntentWrites, and deterministic below-Raft behavior is preserved in mixed-version clusters.

This patch also reworks the ReplicatedEvalResult.WriteTimestamp field [**]. Before this patch, WriteTimestamp was always coming from ba.WriteTimestamp(), which is either a transaction's write timestamp or, for non-txn requests, the batch's read timestamp or, for non-MVCC requests, some random clock value. Below Raft, the field is used for updating the followers' clocks, and also to check the request against the GC threshold. This patch sets the WriteTimestamp differently for IntentWrite requests than other requests:
- for regular writes, the field remains ba.WriteTimestamp()
- for other proposals, the field is a clock reading on the proposer

[**] An alternative to split the field into two was considered, but it's hard to do now because of backwards compatibility. It can be done in the next release, though, because now all the uses of the WriteTimestamp field tolerate it being empty.

Some requests (e.g. LeaseTransfers) need a clock signal to travel with their proposal, and now they get it (see cockroachdb#62569).

The GC threshold check now only applies to IntentWrite requests - they're the only ones for which that check ever made sense (if the check makes sense at all, which I don't think it does).

For backwards compatibility, WriteTimestamp is set just like before for proposals on ranges that haven't "migrated" to 21.1 - which migration is checked by looking at the new ClosedTimestamp field.

This fixes cockroachdb#62569, which was complaining that lease transfers don't properly chain clock updates, allowing the transferee to have a lower timestamp than the lease start time. With this patch, the transfer will have a WriteTimestamp above the lease start time.

With the help of the new IntentWrite flag, the patch expands an assertion below Raft about not writing under the closed timestamp. Before this patch, only the leaseholder was able to do this check because only the leaseholder had the original request at its disposal to see if indeed the assertion is valid (i.e. the assertion was checking isIntentWrite to avoid triggering for e.g. EndTxn which is allowed to operate below the closed ts). Now every follower can do the check, making it deterministic.

The expanded assertion is "pretty compatible" with previous 21.1 betas:
- For proposals by new beta: the deterministic assertion might fire on the new beta (in case of bugs). The assertion claims that it's deterministic, but it's not since old betas don't have it (the non-deterministic assertion they do have won't fire because the leaseholder is new beta). So that's not great, but seems very unlikely that anyone will hit it.
- Proposals by prev beta don't have the IntentWrite flag set, so no assertion on new-beta followers. The (old-beta) proposer/leaseholder assertion might fire (in case of bugs), but that assertion was non-deterministic already.

Fixes cockroachdb#62569
Touches cockroachdb#62655, improving the assertion that triggered there.

Release note: None
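The reason for storing the inverse flag can be shown with a short Go sketch, assuming (as is usual for boolean fields) that a flag an older node never set decodes as false. The `proposal` type, `isIntentWrite` and `newProposal` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an HLC timestamp.
type Timestamp int64

// proposal is a hypothetical, stripped-down Raft command. Only the inverse
// flag is stored, so the zero value means "intent write".
type proposal struct {
	NonMVCC        bool
	WriteTimestamp Timestamp
}

// isIntentWrite derives the IntentWrite property from the stored inverse flag.
func (p proposal) isIntentWrite() bool {
	return !p.NonMVCC
}

// newProposal sets WriteTimestamp as the commit message describes: the batch's
// write timestamp for regular writes, a proposer clock reading otherwise.
func newProposal(intentWrite bool, batchWriteTS, proposerClock Timestamp) proposal {
	if intentWrite {
		return proposal{WriteTimestamp: batchWriteTS}
	}
	return proposal{NonMVCC: true, WriteTimestamp: proposerClock}
}

func main() {
	// A proposal from a node that does not know the flag leaves it at its zero
	// value, so it is treated as an intent write.
	var fromOldNode proposal
	fmt.Println(fromOldNode.isIntentWrite())

	leaseTransfer := newProposal(false, 0, 100)
	fmt.Println(leaseTransfer.isIntentWrite(), leaseTransfer.WriteTimestamp)
}
```

Had the flag been stored as IntentWrite directly, proposals from nodes that never set it would have looked like non-MVCC requests instead.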
The "writing below closed timestamp" bug is hopefully being fixed by #63672. We don't yet have an explanation for the "closed timestamp regressing" assertion (some analysis in this comment above). One area of the code I don't have much confidence in (because I don't understand it very well) is around reordering protections for different lease acquisitions, when multiple competing acquisition requests are based on the same base lease. Around here. I wonder if some sort of inversion between lease commands is possible, thus resulting in a closed timestamp regression. Although I don't see it yet.
63589: server, security: Fix one-way connectivity with connect cmd r=knz a=itsbilal

Informs #60632. Previously, non-trust-leader nodes couldn't connect back to the trust leader due to the presence of the wrong `ca-client.crt` on their disk; the main CA cert/key was being written in four places. This change fixes that bug, and also creates a new `client.node.crt` certificate to prevent other subsequent errors from being thrown. Fixes #61624. Release note: None.

63672: kvserver: fix write below closedts bug r=andreimatei a=andreimatei

This patch fixes a bug in our closed timestamp management. This bug was making it possible for a command to close a timestamp even though other requests writing at lower timestamps are currently evaluating. The problem was that we were assuming that, if a replica is proposing a new lease, there can't be any requests in flight and every future write evaluated on the range will wait for the new lease and then evaluate above the lease start time. Based on that reasoning, the proposal buffer was recording the lease start time as its assignedClosedTimestamp. This was matching what it does for every write, where assignedClosedTimestamp corresponds to the closed timestamp carried by the command. It turns out that the replica's reasoning was wrong. It is, in fact, possible for writes to be evaluating on the range when the lease acquisition is proposed. And these evaluations might be done at timestamps below the would-be lease's start time. This happens when the replica has already received a lease through a lease transfer. The transfer must have applied after the previous lease expired and the replica decided to start acquiring a new one. This fixes one of the assertion failures seen in #62655. Release note (bug fix): A bug leading to crashes with the message "writing below closed ts" has been fixed.

63756: backupccl: reset restored jobs during cluster restore r=dt a=pbardea

Previously, jobs were restored without modification during cluster restore. Due to a recently discovered bug where backup may miss non-transactional writes written to offline spans by these jobs, their progress may no longer be accurate on the restored cluster. IMPORT and RESTORE jobs perform non-transactional writes that may be missed. When a cluster RESTORE brings back these OFFLINE tables, it will also bring back its associated job. To ensure the underlying data in these tables is correct, the jobs are now set in a reverting state so that they can clean up after themselves. In-progress schema change jobs that are affected will fail upon validation. Release note (bug fix): Fix a bug where restored jobs may have assumed to have made progress that was not captured in the backup. The restored jobs are now either canceled cluster restore.

63837: build: update the go version requirement for `make` r=otan a=knz

Fixes #63837. The builder image already requires go 1.15.10. This patch modifies the check for a non-builder `make` command to require at least the same version. Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Closing this one, and moving the GA-blocker label to #61981 (comment)
(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on release-21.1@f602e37e31a256980ae897917f45cba9c135b412:
Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
See this test on roachdash
powered by pkg/cmd/internal/issues