
rangefeed: fix premature checkpoint due to intent resolution race #117612

Merged

Conversation

@erikgrinaker (Contributor) commented Jan 10, 2024

This PR depends on the following changes, which should be backported together:


It was possible for rangefeeds to emit a premature checkpoint, before all writes below its timestamp had been emitted. This in turn would cause changefeeds to not emit these write events at all.

This could happen because PushTxn may return a false ABORTED status for a transaction that has in fact been committed, if the transaction record has been GCed (after resolving all intents). The timestamp cache does not retain sufficient information to disambiguate a committed transaction from an aborted one in this case, so it pessimistically assumes an abort (see Replica.CanCreateTxnRecord and batcheval.SynthesizeTxnFromMeta).

However, the rangefeed txn pusher trusted this ABORTED status, ignoring the pending txn intents and allowing the resolved timestamp to advance past them before emitting the committed intents. This can lead to the following scenario:

  • A rangefeed is running on a lagging follower.
  • A txn writes an intent, which is replicated to the follower.
  • The closed timestamp advances past the intent.
  • The txn commits and resolves the intent at the original write timestamp, then GCs its txn record. This is not yet applied on the follower.
  • The rangefeed pushes the txn to advance its resolved timestamp.
  • The txn is GCed, so the push returns ABORTED (it can't know whether the txn was committed or aborted after its record is GCed).
  • The rangefeed disregards the "aborted" txn and advances the resolved timestamp, emitting a checkpoint.
  • The follower applies the resolved intent and emits an event below the checkpoint, violating the checkpoint guarantee.
  • The changefeed sees an event below its frontier and drops it, never emitting these events at all.

This patch fixes the bug by submitting a barrier command to the leaseholder which waits for all past and ongoing writes (including intent resolution) to complete and apply, and then waits for the local replica to apply the barrier as well. This ensures any committed intent resolution will be applied and emitted before the transaction is removed from resolved timestamp tracking.
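
In outline, the fix is a two-step wait. The following Go sketch illustrates the shape; the sendBarrier and waitForAppliedIndex helpers are hypothetical stand-ins, not the actual CockroachDB functions:

```go
package rangefeedsketch

import "context"

var (
	// sendBarrier stands in for issuing a Barrier request to the
	// leaseholder; it returns only once all past and ongoing writes
	// (including intent resolution) have completed and applied, along
	// with the lease applied index (LAI) at which the barrier applied.
	sendBarrier func(ctx context.Context) (lai uint64, _ error)
	// waitForAppliedIndex stands in for waiting until the local replica
	// (possibly a lagging follower) has applied the given index.
	waitForAppliedIndex func(ctx context.Context, lai uint64) error
)

// barrierAndWait ensures any committed intent resolution has been
// applied and emitted locally before the pushed transaction is dropped
// from resolved timestamp tracking.
func barrierAndWait(ctx context.Context) error {
	lai, err := sendBarrier(ctx)
	if err != nil {
		return err
	}
	return waitForAppliedIndex(ctx, lai)
}
```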

Resolves #104309.
Epic: none
Release note (bug fix): fixed a bug where a changefeed could omit events in rare cases, logging the error "cdc ux violation: detected timestamp ... that is less or equal to the local frontier". This can happen if a rangefeed runs on a follower replica that lags significantly behind the leaseholder, a transaction commits and removes its transaction record before its intent resolution is applied on the follower, the follower's closed timestamp has advanced past the transaction commit timestamp, and the rangefeed attempts to push the transaction to a new timestamp (at least 10 seconds after the transaction began). This may cause the rangefeed to prematurely emit a checkpoint before emitting writes at lower timestamps, which in turn may cause the changefeed to drop these events entirely, never emitting them.

@erikgrinaker erikgrinaker self-assigned this Jan 10, 2024
@cockroach-teamcity (Member) commented: This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch 3 times, most recently from 4fc2d95 to e7ac988 Compare January 11, 2024 14:01
@nvanbenschoten (Member) left a comment

Nice! I think I prefer this over #114487 for the reasons discussed in the PR. I'm glad Barrier could be extended for this use case.

What do you think about it?

Reviewed 2 of 2 files at r1, 7 of 7 files at r2, 3 of 3 files at r3, 6 of 6 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)


pkg/kv/kvpb/api.proto line 1323 at r4 (raw file):

but it did exist in the past

I don't even think we know that. The timestamp cache may have been ratcheted above the txn's min timestamp across the board.


pkg/kv/kvserver/replica_rangefeed.go line 189 at r3 (raw file):

	// Execute a Barrier on the leaseholder.
	//
	// TODO(erikgrinaker): handle Barrier requests split across ranges. This can

If a range splits, its rangefeed processor will be torn down and recreated, at which point the resolved timestamp will be recomputed with a lock table scan. So it doesn't feel like we necessarily need to handle this by using the LAIs returned from the Barrier request. But we also shouldn't crash. What happens now?


pkg/kv/kvserver/replica_rangefeed.go line 193 at r3 (raw file):

	// isUnsplittable on BarrierRequest, but we can maybe do something better
	// (e.g. return per-range LAIs).
	desc := tp.r.Desc()

Should we be passing the key span in from the rangefeed processor? We will tear the processor down on splits or merges, but I'm not sure that precludes any races.


pkg/kv/kvserver/replica_rangefeed.go line 209 at r3 (raw file):

	for tp.r.GetLeaseAppliedIndex() < lai {
		if ticker == nil {
			ticker = time.NewTicker(100 * time.Millisecond)

Do you think it's worth adding an exponential backoff here? 100ms for the first iteration feels a bit long.
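
For reference, a standalone sketch of what the suggested backoff could look like (illustrative only; the real code may instead use CockroachDB's retry utilities):

```go
package rangefeedsketch

import (
	"context"
	"time"
)

// waitForLAI polls until the replica's lease applied index reaches lai,
// starting at a short interval and doubling up to a cap, instead of
// ticking at a fixed 100ms from the first iteration.
func waitForLAI(ctx context.Context, getLAI func() uint64, lai uint64) error {
	backoff := 10 * time.Millisecond
	const maxBackoff = 500 * time.Millisecond
	for getLAI() < lai {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
	return nil
}
```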


pkg/kv/kvserver/replica_write.go line 299 at r2 (raw file):

			}
			// Populate LeaseAppliedIndex for Barrier responses.
			if ba.IsSingleBarrierRequest() && propResult.Err == nil {

Reminder to myself and others who get here: Barrier requests do go through Raft, despite having an empty WriteBatch, because of BatchRequest.RequiresConsensus.


pkg/kv/kvserver/intentresolver/intent_resolver.go line 387 at r4 (raw file):

	pushType kvpb.PushTxnType,
	skipIfInFlight bool,
) (_ map[uuid.UUID]*roachpb.Transaction, ambiguousAbort bool, _ *kvpb.Error) {

nit: consider renaming this to anyAmbiguousAbort or something along those lines.


pkg/kv/kvserver/rangefeed/resolved_timestamp.go line 163 at r5 (raw file):

		//
		// TODO(erikgrinaker): check that this won't end up with crash loops.
		rts.assertOpAboveRTS(op, t.Timestamp)

I can't recall why we didn't already have this assertion here. I don't think it was an intentional omission when we added the other assertions.


pkg/kv/kvserver/rangefeed/task.go line 346 at r3 (raw file):

	// intents in the rangefeed processor's queue. These updates may not yet have
	// been applied to the resolved timestamp intent queue, but that's ok -- our
	// MVCCAbortTxnOp will be enqueued and processed after them.

Could you mention that it's safe to publish a committed MVCCAbortTxnOp for a txn once all of that txn's local intents have been committed and all MVCCCommitIntentOp ops have been published? And then add a test for this?


pkg/kv/kvserver/rangefeed/task.go line 352 at r3 (raw file):

	//
	// TODO(erikgrinaker): check how often we hit this, and consider coming up
	// with a way to avoid it in the common case, since it incurs a write.

s/write/raft consensus round/


pkg/kv/kvserver/rangefeed/task.go line 253 at r4 (raw file):

	// prevent a restart for any transaction that has not been written
	// over at a larger timestamp.
	pushedTxns, ambiguousAbort, err := a.pusher.PushTxns(ctx, a.txns, a.ts)

Can we trust this ambiguousAbort value in mixed-version clusters?


pkg/kv/kvserver/batcheval/cmd_barrier_test.go line 38 at r2 (raw file):

	// TODO(erikgrinaker): add some actual testing here.
	ts, lai, err := kvDB.Barrier(ctx, start, end)

Should this test be in batcheval? It's spinning up a full server, which we usually reserve for pkg/kv/kvclient or pkg/kv/kvserver.

Do we have any other testing of Barrier that we could extend?

@erikgrinaker (Contributor, Author) left a comment

What do you think about it?

I think it's better -- it's simpler and more robust. I'm not overly worried about the cost of these writes either, given the conditions under which it triggers and that it's at most one per range per incident, but I'll have a closer look at this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go line 189 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If a range splits, its rangefeed processor will be torn down and recreated, at which point the resolved timestamp will be recomputed with a lock table scan. So it doesn't feel like we necessarily need to handle this by using the LAIs returned from the Barrier request. But we also shouldn't crash. What happens now?

As is, we'll use the LAI of the first range overlapping the keyspan, because there's no handling of this in BarrierResponse.combine():

https://github.com/cockroachdb/cockroach/blob/02678b63662b6fec01a7ad27c686011ebe08ae13/pkg/kv/kvpb/api.go#L659-L668

For splits, that could actually be fine, because that LAI would be on the LHS after applying the split, so waiting for it would also mean waiting for the split to apply locally and tearing down the rangefeed. But this is mostly accidental, and we shouldn't rely on it. It would also break on a merge where the LHS LAI is lower than the RHS LAI -- that case requires us to check that the range ID matches our local replica, which I don't believe is currently exposed in the RPC response.

For our purposes here, it would be sufficient to mark Barrier as isUnsplittable and include the range ID in the response, and then just error out in txnPushAttempt until we tear down the rangefeed processor. But this doesn't generalize particularly well, because other callers would likely want to apply a barrier across a keyspan regardless of the range structure.

We could also return a slice of RangeBarrier containing the range ID, key span, evaluation timestamp, and LAI. However, that seems impractical when used with a large table across thousands of ranges, where we'd have to start using resume spans to avoid large responses.

I'm going to do the straightforward thing for now: make this conditional on a parameter BarrierRequest.WithLeaseAppliedIndex which will make the request unsplittable.


pkg/kv/kvserver/replica_rangefeed.go line 193 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we be passing the key span in from the rangefeed processor? We will tear the processor down on splits or merged, but I'm not sure that precludes any races.

Yeah, that seems prudent.


pkg/kv/kvserver/replica_rangefeed.go line 209 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think it's worth adding an exponential backoff here? 100ms for the first iteration feels a bit long.

Yeah, either that or we add a way to subscribe to LAI updates from the replica -- but that seems a bit overkill perhaps.


pkg/kv/kvserver/rangefeed/resolved_timestamp.go line 163 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I can't recall why we didn't already have this assertion here. I don't think it was an intentional omission when we added the other assertions.

Is this assertion relevant for MVCCUpdateIntentOp as well?


pkg/kv/kvserver/rangefeed/task.go line 346 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you mention that it's safe to publish a committed MVCCAbortTxnOp for a txn once all of that txn's local intents have been committed and all MVCCCommitIntentOp ops have been published? And then add a test for this?

Yep, I'll verify and document this, with a test.


pkg/kv/kvserver/rangefeed/task.go line 253 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we trust this ambiguousAbort value in mixed-version clusters?

We can't, but in that case it degrades to the current behavior. That's worth documenting on the API.


pkg/kv/kvserver/batcheval/cmd_barrier_test.go line 38 at r2 (raw file):

Should this test be in batcheval? It's spinning up a full server, which we usually reserve for pkg/kv/kvclient or pkg/kv/kvserver.

Several other batcheval tests do as well. I don't have an opinion on whether they should (beyond unit tests being better when possible), but it seems natural to test the request here, and we'll need to submit the proposal through Raft to fully do so.

Do we have any other testing of Barrier that we could extend?

Nope. I'll see if I can add some basic coverage in kvnemesis too.

@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch from e7ac988 to fbdd612 Compare January 15, 2024 16:58
@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch 2 times, most recently from c258859 to 94ce567 Compare January 16, 2024 15:35
@erikgrinaker (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go line 189 at r3 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

[…] I'm going to do the straightforward thing for now: make this conditional on a parameter BarrierRequest.WithLeaseAppliedIndex which will make the request unsplittable.

I've updated the PR with this approach. It should be mostly good to go once I add some tests, would appreciate another look in the meanwhile.

@nvanbenschoten (Member) left a comment

Reviewed 18 of 18 files at r6, 6 of 12 files at r7, 4 of 7 files at r8, 6 of 6 files at r9, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)


pkg/kv/kvserver/replica_application_result.go line 268 at r6 (raw file):

	// Populate BarrierResponse if requested.
	if pErr == nil && cmd.proposal.Local != nil && cmd.proposal.Local.PopulateBarrierResponse {
		cmd.proposal.Local.PopulateBarrierResponse = false // mark as handled

nit: want to push this into a DetachPopulateBarrierResponse method on LocalResult, which handles the cmd.proposal.Local == nil case?
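
A sketch of the suggested helper (LocalResult is elided here to the one field visible in the quoted diff; the method body is an assumption based on the comment above):

```go
package rangefeedsketch

// LocalResult is reduced to the single field used in this sketch.
type LocalResult struct {
	PopulateBarrierResponse bool
}

// DetachPopulateBarrierResponse consumes the flag exactly once and
// tolerates a nil receiver, so callers don't need their own nil check:
//
//	if pErr == nil && cmd.proposal.Local.DetachPopulateBarrierResponse() {
//	    // populate the BarrierResponse
//	}
func (lResult *LocalResult) DetachPopulateBarrierResponse() bool {
	if lResult == nil || !lResult.PopulateBarrierResponse {
		return false
	}
	lResult.PopulateBarrierResponse = false // mark as handled
	return true
}
```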


pkg/kv/kvserver/replica_rangefeed.go line 200 at r9 (raw file):

	}
	if lai == 0 {
		if tp.r.store.ClusterSettings().Version.IsActive(ctx, clusterversion.V24_1Start) {

Should we check the active cluster version before issuing the Barrier request? As is, the cluster might have completed its upgrade after the Barrier request was sent.


pkg/kv/kvserver/batcheval/result/result.go line 102 at r6 (raw file):

		"GossipFirstRange:%t MaybeGossipSystemConfig:%t "+
		"MaybeGossipSystemConfigIfHaveFailure:%t MaybeAddToSplitQueue:%t "+
		"MaybeGossipNodeLiveness:%s PopulateBarrierResponse:%t",

nit: let's keep this in the same order as the struct definition and IsZero.


pkg/kv/kvserver/rangefeed/resolved_timestamp.go line 163 at r5 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Is this assertion relevant for MVCCUpdateIntentOp as well?

I don't think so. In theory, an intent that the rangefeed knows will eventually be aborted could get updated from a timestamp below the resolved ts to another timestamp below the resolved ts.


pkg/kv/kvserver/rangefeed/resolved_timestamp.go line 165 at r9 (raw file):

		// This assertion can be violated in mixed-version clusters, so gate it
		// on 24.1. See: https://github.com/cockroachdb/cockroach/issues/104309
		if rts.settings.Version.IsActive(context.Background(), clusterversion.V24_1Start) {

To be extra careful, do we want to add an env var here so that we don't end up crash-looping a cluster without any mechanism to get unstuck?


pkg/kv/kvserver/rangefeed/task.go line 253 at r4 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

We can't, but in that case it degrades to the current behavior. That's worth documenting on the API.

I see. So we may operate on incorrect information, but that will cause this txnPushAttempt to degrade to the current behavior. That seems fine, but +1 on documenting — other uses of the flag may not handle being lied to as gracefully.


pkg/kv/kvserver/rangefeed/task.go line 363 at r9 (raw file):

		// rangefeed processor shuts down. It needs to be, otherwise a stalled
		// barrier can prevent node shutdown. Regardless, use a 1 minute backstop
		// to prevent getting wedged.

Were you planning to address this TODO in this initial PR?


pkg/kv/kvserver/batcheval/cmd_barrier_test.go line 38 at r2 (raw file):

I'll see if I can add some basic coverage in kvnemesis too.

👍 That's a good idea.

@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch from 94ce567 to d105a98 Compare January 16, 2024 19:11
@erikgrinaker (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go line 200 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we check the active cluster version before issuing the Barrier request? As is, the cluster might have completed its upgrade after the Barrier request was sent.

This seems exceedingly unlikely, since the remote server would have to have its binary upgraded and restarted while the response is in flight (a 24.1 binary will return the right response regardless of the cluster version). But I suppose it can't hurt. Done.


pkg/kv/kvserver/rangefeed/resolved_timestamp.go line 165 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To be extra careful, do we want to add an env var here so that we don't even up crash looping a cluster without any mechanism to get unstuck?

Yeah, may as well -- done. We could also consider disabling the assertion in release builds instead.
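
Concretely, the escape hatch could look roughly like this (the variable and env var names are hypothetical; envutil is CockroachDB's pkg/util/envutil):

```go
package rangefeedsketch

import "github.com/cockroachdb/cockroach/pkg/util/envutil"

// enableRTSAssertions lets operators disable the new resolved timestamp
// assertion without a new binary, e.g. on a cluster that would otherwise
// crash-loop on it. The assertion site then becomes double-gated:
//
//	if enableRTSAssertions &&
//	    rts.settings.Version.IsActive(ctx, clusterversion.V24_1Start) {
//	    rts.assertOpAboveRTS(op, t.Timestamp)
//	}
var enableRTSAssertions = envutil.EnvOrDefaultBool(
	"COCKROACH_RANGEFEED_RESOLVED_TS_ASSERTIONS", true)
```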


pkg/kv/kvserver/rangefeed/task.go line 253 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I see. So we may operate on incorrect information, but that will cause this txnPushAttempt to degrade to the current behavior. That seems fine, but +1 on documenting — other uses of the flag may not handle being lied to as gracefully.

Added some more NBs on the IntentResolver and TxnPusher APIs.


pkg/kv/kvserver/rangefeed/task.go line 363 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Were you planning to address this TODO in this initial PR?

I'll submit a separate PR for this tomorrow, and clean up some of the related rangefeed shutdown code.

@nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 2 of 15 files at r10, 6 of 7 files at r11, 5 of 5 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)

craig bot pushed a commit that referenced this pull request Jan 16, 2024
117801: kvserver: acquire `Replica.mu` when returning reproposal error r=erikgrinaker a=erikgrinaker

Discovered while working on #117612. Looks like it's been there since #42939.

This is a rare error, and these fields are unlikely to change while we're holding the raft mutex, so seems very unlikely to have caused any problems -- I could be convinced we shouldn't backport this, on the off-chance I've missed a potential deadlock.

Epic: none
Release note: None

117833: *: un-skip many tests that were skipped over the last month r=rail a=rickystewart

This partially reverts the following commits:

```
e615aec
99884f8
99f9760
75c91c2
5c64a22
c1778b7
381c96b
f9b20ae
8608499
eb71a83
337777f
```

These tests were skipped due to failures under RBE, typically under `deadlock` and `race`. With the addition of the `heavy` pool for these tests, we anticipate the decreased load will cause these tests to stop failing, therefore we un-skip the skipped tests.

Epic: CRDB-8308
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch from d105a98 to 92009d7 Compare January 17, 2024 09:59
@erikgrinaker (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/rangefeed/task.go line 363 at r9 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I'll submit a separate PR for this tomorrow, and clean up some of the related rangefeed shutdown code.

#117859

@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch 2 times, most recently from f37ce33 to 3f4d373 Compare January 19, 2024 10:30

blathers-crl bot commented Jan 19, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 1 of 4 files at r20, 1 of 5 files at r22, 3 of 36 files at r24, 27 of 30 files at r25, 3 of 6 files at r26, 3 of 3 files at r28, 3 of 3 files at r29, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)


pkg/kv/kvserver/rangefeed/task.go line 38 at r27 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I wonder if we should make this a cluster setting instead. While I don't expect this to cause problems, if it does, it's a lot less hassle for operators. Then again, adding cluster settings for every little thing (even if temporary) is going to snowball pretty fast. Wdyt?

A cluster setting seems reasonable here. There may be other reasons why we'd want the configurability of being able to disable this dynamically.
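
For illustration, such a setting could be declared roughly like this (the setting name, wording, and default are hypothetical, not necessarily the setting the PR ultimately added):

```go
package rangefeedsketch

import "github.com/cockroachdb/cockroach/pkg/settings"

// PushTxnsBarrierEnabled would let operators dynamically fall back to
// the old push behavior if the barrier causes problems; it would be
// read at the push site via PushTxnsBarrierEnabled.Get(&st.SV).
var PushTxnsBarrierEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.rangefeed.push_txns.barrier.enabled",
	"when enabled, the rangefeed txn pusher waits for a barrier to apply "+
		"locally before dropping pushed, possibly-GCed txns from resolved "+
		"timestamp tracking",
	true,
)
```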

@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch from 58a1834 to 6c8d455 Compare January 23, 2024 20:39
@erikgrinaker erikgrinaker force-pushed the rangefeed-abort-resolvedts-barrier branch from 6c8d455 to 6d92cef Compare January 23, 2024 21:16
@erikgrinaker (Contributor, Author) commented

TFTRs!

bors r+


craig bot commented Jan 23, 2024

Build succeeded:

@craig craig bot merged commit b6623c5 into cockroachdb:master Jan 23, 2024
9 of 10 checks passed

blathers-crl bot commented Jan 23, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from aa5e353 to blathers/backport-release-23.1-117612: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from aa5e353 to blathers/backport-release-23.2-117612: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.


pand5a commented Jan 25, 2024

👍

Labels: backport-23.1.x (flags PRs that need to be backported to 23.1), backport-23.2.x (flags PRs that need to be backported to 23.2)

Successfully merging this pull request may close these issues: rangefeed: Pushtxn in rangefeed returned abort, but txn may have been committed

5 participants