-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rangefeed: don't delete txn records from the rngfeed pusher #53146
rangefeed: don't delete txn records from the rngfeed pusher #53146
Conversation
Only the last commit. |
beb3118
to
5c62f6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. That was a pretty clean change.
Reviewed 1 of 7 files at r5, 6 of 6 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica_rangefeed.go, line 107 at r6 (raw file):
ctx context.Context, intents []roachpb.LockUpdate, ) error { return tp.ir.ResolveIntents(ctx, intents, intentresolver.ResolveOptions{}).GoError()
Don't we need to poison? That seems pretty important.
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 549 at r6 (raw file):
// actors to GC a txn record, since that can cause ambiguities for the // coordinator: if it had STAGED the txn, it won't be able to tell the // difference between a txn that had been implicitly committed and one that
"implicitly committed, recovered, and GC-ed"
"someone else aborted and GC-ed"
pkg/kv/kvserver/rangefeed/processor_test.go, line 717 at r6 (raw file):
tp.mockResolveIntentsFn(func(ctx context.Context, intents []roachpb.LockUpdate) error { // There's nothing to assert here. We expect the intents to correspond to // transactions that had theit LockSpans populated when we pushed them. This
their
pkg/kv/kvserver/rangefeed/task.go, line 206 at r6 (raw file):
}) // Asynchronously clean up the transaction's intents, which should
Asynchronously
pkg/kv/kvserver/rangefeed/task.go, line 231 at r6 (raw file):
// populated. If, however, we ran into a transaction that its coordinator // tried to rollback but didn't follow up with garbage collection, then // LockSpans will be populated.
Lease some indication that all of this is opportunistic and not necessary for correctness.
pkg/kv/kvserver/rangefeed/task_test.go, line 272 at r6 (raw file):
} func (tp *testTxnPusher) intentsToTxns(intents []roachpb.LockUpdate) []enginepb.TxnMeta {
Does this need to be a method? It doesn't look like it.
pkg/kv/kvserver/rangefeed/task_test.go, line 286 at r6 (raw file):
} func (tp *testTxnPusher) mockResolveIntentsFn(
nit: move this up so that the two mocking functions are next to each other.
5c62f6a
to
bc2f974
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_rangefeed.go, line 107 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Don't we need to poison? That seems pretty important.
good catch; we need to poison on aborted transactions.
Btw, I've prepended a commit commenting on the Poison flag on the proto specifying that it's ignored for intents from non-ABORTED transactions.
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 549 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"implicitly committed, recovered, and GC-ed"
"someone else aborted and GC-ed"
done
pkg/kv/kvserver/rangefeed/processor_test.go, line 717 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
their
done
pkg/kv/kvserver/rangefeed/task.go, line 206 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Asynchronously
done
pkg/kv/kvserver/rangefeed/task.go, line 231 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Lease some indication that all of this is opportunistic and not necessary for correctness.
done
pkg/kv/kvserver/rangefeed/task_test.go, line 272 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does this need to be a method? It doesn't look like it.
it doesn't but I'd rather namespace it here
pkg/kv/kvserver/rangefeed/task_test.go, line 286 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: move this up so that the two mocking functions are next to each other.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 8 files at r7, 6 of 6 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 785 at r8 (raw file):
arent'
aren't
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 786 at r8 (raw file):
// // This field is ignored for intents that arent' resolved for an ABORTED txn; // in other words only intents from ABORTED transactions ever poison the abort
Missing comma after "words"
pkg/roachpb/api.proto, line 1111 at r8 (raw file):
TransactionStatus status = 3; // Optionally poison the abort span for the transaction on all ranges // on which the intents reside. The field is ignored if status != ABORTED (i.e. only intents from
Looks like the wrapping got messed up here.
Specify that this field is ignored when resolving COMMITTED (or otherwise non-ABORTED) intents (which it is). At lease the rangefeedTxnPusher sets it for non-aborted transactions, leading to question about whether that's a good idea. Release note: None
Before this patch, the rangefeed txn pushed was GC'ing the records of the transactions that it found to be aborted or committed when it PUSH_TOUCHED's them. Deleting the txn record is not a nice thing to do if the txn's coordinator might still be running, and so this patch terminates the practice. If the coordinator is still running and it had STAGED the txn record, a PUSH_TOUCH might cause a transition to ABORTED or COMMITTED. If such a transition is followed by the GC of the txn record, this causes an ambigous commit error for the coordinator, which can't tell whether the transaction committed (implicitly) or was rolled back. As the code stands, this patch doesn't actually help eliminate the ambiguity in the ABORTED case (it does in the COMMIT case though), because the coordinator doesn't distinguish between an ABORTED record and a missing record; it treats both the same, as ambiguous. But we're going to improve that, and the coordinator is gonna consider an existing record as non-ambiguous. The GC'ed case is going to remain the ambiguous one, and the idea is to avoid that as much as possible by only doing GC from the gc-queue (after an hour) and from the coordinator. Touches cockroachdb#52566 Release note (bug fix): Some rare AmbiguousCommitErrors happening when CDC was used were eliminated.
bc2f974
to
8d162e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 785 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
arent'
aren't
done
pkg/kv/kvserver/intentresolver/intent_resolver.go, line 786 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Missing comma after "words"
done
pkg/roachpb/api.proto, line 1111 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Looks like the wrapping got messed up here.
done
Build succeeded: |
Before this patch, the rangefeed txn pushed was GC'ing the records of
the transactions that it found to be aborted or committed when it
PUSH_TOUCHED's them. Deleting the txn record is not a nice thing to do
if the txn's coordinator might still be running, and so this patch
terminates the practice. If the coordinator is still running and it had
STAGED the txn record, a PUSH_TOUCH might cause a transition to ABORTED
or COMMITTED. If such a transition is followed by the GC of the txn
record, this causes an ambigous commit error for the coordinator, which
can't tell whether the transaction committed (implicitly) or was rolled
back.
As the code stands, this patch doesn't actually help eliminate the
ambiguity in the ABORTED case (it does in the COMMIT case though),
because the coordinator doesn't distinguish between an ABORTED record
and a missing record; it treats both the same, as ambiguous. But we're
going to improve that, and the coordinator is gonna consider an existing
record as non-ambiguous. The GC'ed case is going to remain the ambiguous
one, and the idea is to avoid that as much as possible by only doing GC
from the gc-queue (after an hour) and from the coordinator.
Touches #52566
Release note (bug fix): Some rare AmbiguousCommitErrors happening when
CDC was used were eliminated.