storage: add "noop" intent resolution poisoning option #18635

tbg · 2017-09-20T20:45:52Z

NB: I'm flushing this out of a slew of related work, and think the migration story and the
changes to when we clear the abort cache deserve a particularly scrutinous review.

Manual testing in #15997 surfaced that one limiting
factor in resolving many intents is contention on the transaction's abort cache entry. In one
extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends
very large batches of intent resolution requests for the same transaction to the intent resolver.

These requests all overlapped on the transaction's abort cache key, causing very slow progress, and
ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this
appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in
#18199 it seemed well worth addressing, by means of

allow intent resolutions to not touch the abort span
correctly declare the keys for ResolveIntent{,Range} to only declare the abort cache key
if it is actually going to be accessed.

With these changes, the gc queue was able to clear out a million intents comfortably on my older
13" MacBook (single node).

Also use this option in the intent resolver, where possible -- most transactions don't receive abort
cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid
adding artificial contention there, though in many situations the same intent is resolved and so a
conflict still exists.

Migration: a new field number was added to the proto and the old one preserved. We continue to
populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the
deprecated field. I believe this is sufficient and we can just remove all traces of the old field in
v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).

cockroach-teamcity · 2017-09-20T20:46:05Z

This change is

bdarnell · 2017-09-21T18:33:37Z

Reviewed 3 of 8 files at r1.
Review status: 3 of 8 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/roachpb/api.proto, line 627 at r1 (raw file):

  // TODO(tschottdorf): this can be removed in CockroachDB v1.3.
  optional bool deprecated_poison = 4 [(gogoproto.nullable) = false];
  optional PoisonType poison = 5 [(gogoproto.nullable) = false];;

It's safe to change bools to enums in protobufs if the values line up (going from old to new, true and false are mapped to 1 and 0. going the other direction, zero is false and any non-zero is true). I'm not sure if that works out in this case or not.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):

				infoMu.Unlock() // intentional
				defer infoMu.Lock()
				// Contention between transactions is not a problem in this path, so use Clear.

The poison mode doesn't matter for a committed intent.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

	if err := ir.resolveIntents(ctx, resolveIntents,
		ResolveOptions{Wait: false, Poison: pushType == roachpb.PUSH_ABORT}); err != nil {

Now that I'm paging all of this in, this (the old code) looks incorrect to me. Suppose two transactions call processWriteIntentError for the same intents, one with PUSH_TIMESTAMP and one with PUSH_ABORT. The PUSH_ABORT's PushTxn completes first, then the PUSH_TIMESTAMP's PushTxn sees that the txn was already aborted. Both will then try to resolve the intent, but only one will try to poison. If the PUSH_TIMESTAMP resolve happens first, there will be a window during which the intent is missing but the abort cache has not been written.

It's not clear to me when it's ever safe to resolve an intent from an aborted transaction without setting the abort cache (Except after EndTransaction, when we can clear it), since it appears to depend on knowledge from the operation responsible for the abort.

I think even in the GC queue we have to set the abort cache for intents we resolve (and then we'll have to clean up those abort cache entries on a future GC queue iteration... ouch!). I might be wrong on this point, though.

pkg/storage/replica_command.go, line 1853 at r1 (raw file):

) (EvalResult, error) {
	args := cArgs.Args.(*roachpb.ResolveIntentRequest)
	args.Poison = fromDeprecatedPoison(args.Poison, args.DeprecatedPoison)

This mutates the input. I can't remember whether that's safe or not but it seems dirty; let's use a local variable instead.

Comments from Reviewable

tbg · 2017-09-25T20:54:52Z

Ready for another look.

Review status: 2 of 8 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

pkg/roachpb/api.proto, line 627 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's safe to change bools to enums in protobufs if the values line up (going from old to new, true and false are mapped to 1 and 0. going the other direction, zero is false and any non-zero is true). I'm not sure if that works out in this case or not.

Do you think it's worth worrying about? I'd hate to shoot myself in the foot here.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The poison mode doesn't matter for a committed intent.

Good call, updated.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Now that I'm paging all of this in, this (the old code) looks incorrect to me. Suppose two transactions call processWriteIntentError for the same intents, one with PUSH_TIMESTAMP and one with PUSH_ABORT. The PUSH_ABORT's PushTxn completes first, then the PUSH_TIMESTAMP's PushTxn sees that the txn was already aborted. Both will then try to resolve the intent, but only one will try to poison. If the PUSH_TIMESTAMP resolve happens first, there will be a window during which the intent is missing but the abort cache has not been written.

It's not clear to me when it's ever safe to resolve an intent from an aborted transaction without setting the abort cache (Except after EndTransaction, when we can clear it), since it appears to depend on knowledge from the operation responsible for the abort.

I think even in the GC queue we have to set the abort cache for intents we resolve (and then we'll have to clean up those abort cache entries on a future GC queue iteration... ouch!). I might be wrong on this point, though.

Yikes, good call on the first one. I expanded the comment.

Re: the GC queue, I think we're in a better position there because we only touch aborted transactions for which we know that their most recent heartbeat is a (large) multiple of the heartbeat timeout. That is, we assume that the client running the transaction is gone (and even if it isn't, it would not reach the KV store on its next request).

pkg/storage/replica_command.go, line 1853 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This mutates the input. I can't remember whether that's safe or not but it seems dirty; let's use a local variable instead.

Yeah, this is pretty dirty. Somehow didn't realize this was a pointer. Changed.

Comments from Reviewable

@nvanbenschoten

Manual testing in cockroachdb#15997 surfaced that one limiting factor in resolving many intents is contention on the transaction's abort cache entry. In one extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends very large batches of intent resolution requests for the same transaction to the intent resolver. These requests all overlapped on the transaction's abort cache key, causing very slow progress, and ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in cockroachdb#18199 it seemed well worth addressing, by means of 1. allow intent resolutions to not touch the abort span 2. correctly declare the keys for `ResolveIntent{,Range}` to only declare the abort cache key if it is actually going to be accessed. With these changes, the gc queue was able to clear out a million intents comfortably on my older 13" MacBook (single node). Also use this option in the intent resolver, where possible -- most transactions don't receive abort cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid adding artificial contention there, though in many situations the same intent is resolved and so a conflict still exists. Migration: a new field number was added to the proto and the old one preserved. We continue to populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the deprecated field. I believe this is sufficient and we can just remove all traces of the old field in v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).

tbg · 2017-09-30T13:40:33Z

@bdarnell friendly ping -- looks like this fell off your radar. I'd like to include this in a 1.1 release, though at this point it would probably be 1.1.1 instead of 1.1.

bdarnell · 2017-10-01T17:44:36Z

Reviewed 7 of 7 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/roachpb/api.proto, line 627 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Do you think it's worth worrying about? I'd hate to shoot myself in the foot here.

Well, it would be a change we could make all at once instead of going through a deprecation cycle. But probably not worth figuring out the subtlety and making sure it's safe.

Nit: this line has two semicolons.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Good call, updated.

I think it would be more clear to use PoisonType_Noop, since nothing will happen regardless.

pkg/storage/gc_queue.go, line 411 at r2 (raw file):

				infoMu.Unlock() // intentional
				defer infoMu.Lock()
				// This transaction has committed, so poisoning is not relevant.

I think you've gotten your comments mixed up. This is the ABORTED branch.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

Yikes, good call on the first one. I expanded the comment.

But the new code is still incorrect. The old version was incorrect in either ordering for a short time if the TIMESTAMP resolve came first (leaving the abort cache temporarily cleared until the ABORT resolve set it), or for a long time if the ABORT resolve came first and then had its abort cache entry cleared by the TIMESTAMP resolve. The new version only has the former bug, but it's still a problem. The transaction could sneak in in between the TIMESTAMP resolve and the ABORT resolve and silently fail to see the results of its own writes.

Re: the GC queue, I think we're in a better position there because we only touch aborted transactions for which we know that their most recent heartbeat is a (large) multiple of the heartbeat timeout.

That's for cleaning up transaction records, not resolving intents. The GC queue will resolve intents based on the separate intentAgeThreshold. That threshold is currently higher than the transaction threshold (!) but it doesn't have to be (because in any case it will validate the transaction's status before resolving the intent).

I think we can almost get rid of the poison flag in ResolveIntentRequest completely: in most cases the correct poison behavior is determined from the TxnMeta contained in the request (even the GC queue's "very old transaction" optimization can be obtained from looking at TxnMeta.Timestamp) instead of from extra knowledge that the caller supplies.

The exception is for EndTransaction: here, as an optimization, we avoid setting the abort cache for intents we resolve because we know that the coordinator is responsible for this abort (and no future requests will make it back to the client).

So here's my proposal for fixing the existing bug here and optimizing command queue contention as much as we safely can:

Keep the existing poison boolean; we don't need three states
All ResolveIntent callers except the EndTransaction branch of processIntents pass poison=true
declareKeysResolveIntent gets smarter, and only declares a write to the abort cache if one will occur
evalResolveIntent gains a "very old transaction" optimization and skips the write to the abort cache if TxnMeta.Timestamp is too old

pkg/storage/replica_command.go, line 1830 at r2 (raw file):

func writesAbortCache(intentStatus roachpb.TransactionStatus, poisonType roachpb.PoisonType) bool {
	return (intentStatus == roachpb.ABORTED && poisonType == roachpb.PoisonType_Do) ||

This doesn't match the implementation, which writes to the abort cache if intentStatus == roachpb.ABORTED && poisonType != PoisonType_Noop.

pkg/storage/replica_command.go, line 1923 at r2 (raw file):

		return EvalResult{}, err
	}
	if intent.Status == roachpb.ABORTED {

Making this if wrtiesAbortCache(...) would help centralize the knowledge of this condition. (and then maybe we would remove the Noop case from setAbortCache)

pkg/storage/replica_command_test.go, line 48 at r2 (raw file):

			status:   roachpb.ABORTED,
			poison:   roachpb.PoisonType_Clear,
			expSpans: []string{"1 0: {b-c}", `1 1: /{Local/RangeID/99/r/AbortCache/"00000000-0000-0000-0000-000000000000"-Min}`},

Set a non-zero transaction id in the request just so we can make sure it's getting plumbed through correctly.

Comments from Reviewable

tbg · 2017-10-04T14:49:35Z

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

The new version only has the former bug, but it's still a problem. The transaction could sneak in in between the TIMESTAMP resolve and the ABORT resolve and silently fail to see the results of its own writes.

What am I missing? If a timestamp resolve comes first it won't poison but that's fine since the intent is still there, and then the abort would remove the intent and poison. The other way around you poison+remove first, and then leave it alone after. Your comment makes it sound like a timestamp resolve would remove the intent, too.

Your suggestions below sound interesting, I'll take a look but it sounds like it should work out.

Comments from Reviewable

bdarnell · 2017-10-04T16:29:19Z

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The new version only has the former bug, but it's still a problem. The transaction could sneak in in between the TIMESTAMP resolve and the ABORT resolve and silently fail to see the results of its own writes.

What am I missing? If a timestamp resolve comes first it won't poison but that's fine since the intent is still there, and then the abort would remove the intent and poison. The other way around you poison+remove first, and then leave it alone after. Your comment makes it sound like a timestamp resolve would remove the intent, too.

Your suggestions below sound interesting, I'll take a look but it sounds like it should work out.

Our terminology is getting confusing here. There's not actually any such thing as a timestamp resolve. There is a timestamp push, and if that push discovers that the transaction is aborted anyway, then we resolve the intent. So the buggy sequence is:

Transaction A (in snapshot isolation) writes an intent
Transaction B attempts a write which hits this intent. It pushes (aborts) transaction A
Transaction C attempts a read which hits txn A's intent. It tries a push (timestamp), but the push response tells it that txn A is aborted
Txn C's intent resolver resolves the intent. The original pushType no longer matters here; what matters is that the txn was aborted and the intent will be deleted.

Comments from Reviewable

tbg · 2017-10-05T16:19:16Z

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):

So here's my proposal for fixing the existing bug here and optimizing command queue contention as much as we safely can:

in ResolveIntent, we don't have enough of the proto to decide this right off the bat. I think we want something like

clientKnowsOfAbort := txnProtoAsSeenDuringPush.LastActive() + 2*HeartbeatInterval < ResolveBatchTimestamp

but we only have txn.Timestamp which is largely irrelevant to computing LastActive().

What do you think about the following:

replace Poison with LastActive hlc.Timestamp (or nullable, we'll see)
LastActive is forwarded to ResolveBatchTimestamp if LastActive is zero/nil (for the migration)
ResolveIntent uses WillRemoveIntent && (LastActive + 2*HeartbeatInterval < ResolveBatchTimestamp) to decide whether to populate the abort cache

Note that then, we never explicitly clear the abort cache (except during GC, where these entries are thrown away based on a similar criterion), but that should be OK because the only situation in which we can do that is when an EndTransaction(commit=false) happens, and there we only meaningfully clear out an abort cache on external ranges for which the intents were supplied. So if, say, we had 10 intents on range 2 (and the txn is on range 1), we currently end up sending a batch of 10 ResolveIntents that all overlap on the abort cache key because they all want to clear the span, potentially blocking other activity on these keys for much longer than needed. Seems better to just let the GC queue deal with it instead.

Comments from Reviewable

bdarnell · 2017-10-05T17:44:04Z

SGTM

@bdarnell

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

@bdarnell

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

@bdarnell

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

tbg requested a review from a team September 20, 2017 20:45

tbg force-pushed the noop-poison branch 2 times, most recently from c283254 to a3864c9 Compare September 20, 2017 21:00

tbg requested a review from bdarnell September 21, 2017 17:20

tbg force-pushed the noop-poison branch from a3864c9 to abbe3c4 Compare September 25, 2017 20:53

tbg force-pushed the noop-poison branch from abbe3c4 to 66507ae Compare September 30, 2017 13:40

tbg added a commit to tbg/cockroach that referenced this pull request Oct 5, 2017

storage: prevent accidentally unpoisoning aborted txn

b8bd559

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

tbg mentioned this pull request Oct 6, 2017

storage: prevent accidentally unpoisoning aborted txn #19093

Merged

tbg added a commit to tbg/cockroach that referenced this pull request Oct 15, 2017

storage: prevent accidentally unpoisoning aborted txn

a7d5311

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

tbg closed this in #19093 Oct 16, 2017

tbg added a commit to tbg/cockroach that referenced this pull request Oct 17, 2017

storage: prevent accidentally unpoisoning aborted txn

ecb89ea

Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.

tbg deleted the noop-poison branch May 8, 2018 17:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: add "noop" intent resolution poisoning option #18635

storage: add "noop" intent resolution poisoning option #18635

tbg commented Sep 20, 2017

cockroach-teamcity commented Sep 20, 2017

bdarnell commented Sep 21, 2017

tbg commented Sep 25, 2017

tbg commented Sep 30, 2017

bdarnell commented Oct 1, 2017

tbg commented Oct 4, 2017

bdarnell commented Oct 4, 2017

tbg commented Oct 5, 2017

bdarnell commented Oct 5, 2017

storage: add "noop" intent resolution poisoning option #18635

storage: add "noop" intent resolution poisoning option #18635

Conversation

tbg commented Sep 20, 2017

cockroach-teamcity commented Sep 20, 2017

bdarnell commented Sep 21, 2017

tbg commented Sep 25, 2017

tbg commented Sep 30, 2017

bdarnell commented Oct 1, 2017

tbg commented Oct 4, 2017

bdarnell commented Oct 4, 2017

tbg commented Oct 5, 2017

bdarnell commented Oct 5, 2017