storage: add "noop" intent resolution poisoning option #18635
Reviewed 3 of 8 files at r1.

pkg/roachpb/api.proto, line 627 at r1 (raw file):
It's safe to change bools to enums in protobufs if the values line up (going from old to new, true and false are mapped to 1 and 0. going the other direction, zero is false and any non-zero is true). I'm not sure if that works out in this case or not.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):
The poison mode doesn't matter for a committed intent.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
Now that I'm paging all of this in, this (the old code) looks incorrect to me. Suppose two transactions call processWriteIntentError for the same intents, one with PUSH_TIMESTAMP and one with PUSH_ABORT. The PUSH_ABORT's PushTxn completes first, then the PUSH_TIMESTAMP's PushTxn sees that the txn was already aborted. Both will then try to resolve the intent, but only one will try to poison. If the PUSH_TIMESTAMP resolve happens first, there will be a window during which the intent is missing but the abort cache has not been written.
It's not clear to me when it's ever safe to resolve an intent from an aborted transaction without setting the abort cache (Except after EndTransaction, when we can clear it), since it appears to depend on knowledge from the operation responsible for the abort. I think even in the GC queue we have to set the abort cache for intents we resolve (and then we'll have to clean up those abort cache entries on a future GC queue iteration... ouch!). I might be wrong on this point, though.

pkg/storage/replica_command.go, line 1853 at r1 (raw file):
This mutates the input. I can't remember whether that's safe or not but it seems dirty; let's use a local variable instead.

Comments from Reviewable
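For readers following along, the interleaving described above for intent_resolver.go can be modeled with a toy sketch. This is not the storage package's code: the `store` type and its methods are hypothetical and only track whether an intent and an abort cache entry exist.

```go
package main

import "fmt"

// store is a toy stand-in for a range (hypothetical, for illustration only).
// It holds intents keyed by row key and an abort cache keyed by txn ID.
type store struct {
	intents    map[string]string // key -> ID of the txn that wrote the intent
	abortCache map[string]bool   // txn ID -> poisoned
}

// resolveAborted removes an aborted txn's intent on key and, if poison is
// set, also writes the abort cache entry for that txn.
func (s *store) resolveAborted(key, txnID string, poison bool) {
	delete(s.intents, key)
	if poison {
		s.abortCache[txnID] = true
	}
}

// readOwnWrite models the aborted txn coming back to read its own write.
// It returns "aborted" if the abort cache catches it, "found" if the intent
// is still present, and "missing" if the txn silently fails to see its write.
func (s *store) readOwnWrite(key, txnID string) string {
	if s.abortCache[txnID] {
		return "aborted"
	}
	if s.intents[key] == txnID {
		return "found"
	}
	return "missing"
}

func main() {
	s := &store{
		intents:    map[string]string{"k": "txn1"},
		abortCache: map[string]bool{},
	}
	// The PUSH_ABORT pusher has already aborted txn1. The PUSH_TIMESTAMP
	// pusher notices the abort and resolves first, without poisoning.
	s.resolveAborted("k", "txn1", false)
	fmt.Println(s.readOwnWrite("k", "txn1")) // prints "missing": the window

	// The PUSH_ABORT pusher's resolve arrives afterwards and poisons.
	s.resolveAborted("k", "txn1", true)
	fmt.Println(s.readOwnWrite("k", "txn1")) // prints "aborted"
}
```

In this model the window closes only once the poisoning resolve lands, which is the ordering concern raised in the comment.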
Ready for another look.

Review status: 2 of 8 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

pkg/roachpb/api.proto, line 627 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Do you think it's worth worrying about? I'd hate to shoot myself in the foot here.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Good call, updated.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yikes, good call on the first one. I expanded the comment.
Re: the GC queue, I think we're in a better position there because we only touch aborted transactions for which we know that their most recent heartbeat is a (large) multiple of the heartbeat timeout. That is, we assume that the client running the transaction is gone (and even if it isn't, it would not reach the KV store on its next request).

pkg/storage/replica_command.go, line 1853 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yeah, this is pretty dirty. Somehow didn't realize this was a pointer. Changed.
Manual testing in cockroachdb#15997 surfaced that one limiting factor in resolving many intents is contention on the transaction's abort cache entry. In one extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends very large batches of intent resolution requests for the same transaction to the intent resolver. These requests all overlapped on the transaction's abort cache key, causing very slow progress, and ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in cockroachdb#18199 it seemed well worth addressing, by means of

1. allow intent resolutions to not touch the abort span
2. correctly declare the keys for `ResolveIntent{,Range}` to only declare the abort cache key if it is actually going to be accessed.

With these changes, the gc queue was able to clear out a million intents comfortably on my older 13" MacBook (single node).

Also use this option in the intent resolver, where possible -- most transactions don't receive abort cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid adding artificial contention there, though in many situations the same intent is resolved and so a conflict still exists.

Migration: a new field number was added to the proto and the old one preserved. We continue to populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the deprecated field. I believe this is sufficient and we can just remove all traces of the old field in v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).
@bdarnell friendly ping -- looks like this fell off your radar. I'd like to include this in a 1.1 release, though at this point it would probably be 1.1.1 instead of 1.1.
Reviewed 7 of 7 files at r2.

pkg/roachpb/api.proto, line 627 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Well, it would be a change we could make all at once instead of going through a deprecation cycle. But probably not worth figuring out the subtlety and making sure it's safe.
Nit: this line has two semicolons.

pkg/storage/gc_queue.go, line 423 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
I think it would be more clear to use PoisonType_Noop, since nothing will happen regardless.

pkg/storage/gc_queue.go, line 411 at r2 (raw file):
I think you've gotten your comments mixed up. This is the

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
But the new code is still incorrect. The old version was incorrect in either ordering for a short time if the TIMESTAMP resolve came first (leaving the abort cache temporarily cleared until the ABORT resolve set it), or for a long time if the ABORT resolve came first and then had its abort cache entry cleared by the TIMESTAMP resolve. The new version only has the former bug, but it's still a problem. The transaction could sneak in in between the TIMESTAMP resolve and the ABORT resolve and silently fail to see the results of its own writes.
That's for cleaning up transaction records, not resolving intents. The GC queue will resolve intents based on the separate intentAgeThreshold. That threshold is currently higher than the transaction threshold (!) but it doesn't have to be (because in any case it will validate the transaction's status before resolving the intent).
I think we can almost get rid of the poison flag in ResolveIntentRequest completely: in most cases the correct poison behavior is determined from the TxnMeta contained in the request (even the GC queue's "very old transaction" optimization can be obtained from looking at TxnMeta.Timestamp) instead of from extra knowledge that the caller supplies. The exception is for EndTransaction: here, as an optimization, we avoid setting the abort cache for intents we resolve because we know that the coordinator is responsible for this abort (and no future requests will make it back to the client).
So here's my proposal for fixing the existing bug here and optimizing command queue contention as much as we safely can:

pkg/storage/replica_command.go, line 1830 at r2 (raw file):
This doesn't match the implementation, which writes to the abort cache if

pkg/storage/replica_command.go, line 1923 at r2 (raw file):
Making this

pkg/storage/replica_command_test.go, line 48 at r2 (raw file):
Set a non-zero transaction id in the request just so we can make sure it's getting plumbed through correctly.
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
What am I missing? If a timestamp resolve comes first it won't poison but that's fine since the intent is still there, and then the abort would remove the intent and poison. The other way around you poison+remove first, and then leave it alone after. Your comment makes it sound like a timestamp resolve would remove the intent, too.
Your suggestions below sound interesting, I'll take a look but it sounds like it should work out.

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Our terminology is getting confusing here. There's not actually any such thing as a timestamp resolve. There is a timestamp push, and if that push discovers that the transaction is aborted anyway, then we resolve the intent. So the buggy sequence is:

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.

pkg/storage/intent_resolver.go, line 109 at r1 (raw file):
in
but we only have
What do you think about the following:
Note that then, we never explicitly clear the abort cache (except during GC, where these entries are thrown away based on a similar criterion), but that should be OK because the only situation in which we can do that is when an

SGTM
Discovered by @bdarnell in cockroachdb#18635 (comment). Making poisoning happen less often (to reduce contention) is planned but requires more care.
NB: I'm flushing this out of a slew of related work, and think the migration story and the
changes to when we clear the abort cache deserve a particularly scrutinous review.
Manual testing in #15997 surfaced that one limiting
factor in resolving many intents is contention on the transaction's abort cache entry. In one
extreme test, I wrote 10E6 abortable intents into a single range, in which case the GC queue sends
very large batches of intent resolution requests for the same transaction to the intent resolver.
These requests all overlapped on the transaction's abort cache key, causing very slow progress, and
ultimately preventing the GC queue from making a dent in the minute allotted to it. Generally this
appears to be a somewhat atypical case, but since @nvanbenschoten observed something similar in
#18199 it seemed well worth addressing, by means of
1. allow intent resolutions to not touch the abort span
2. correctly declare the keys for `ResolveIntent{,Range}`
   to only declare the abort cache key if it is actually going to be accessed.
With these changes, the gc queue was able to clear out a million intents comfortably on my older
13" MacBook (single node).
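As a rough illustration of declaring the abort cache key only when it will actually be accessed, the sketch below shows why no-op resolutions for the same transaction stop overlapping. The `PoisonType` values other than the no-op, the `abortCacheKey` helper, and `declareKeys` are hypothetical names for this example, not the actual `pkg/storage` API, and the key encoding is made up.

```go
package main

import "fmt"

// PoisonType is a hypothetical stand-in for the poison enum in this PR.
type PoisonType int

const (
	PoisonNoop  PoisonType = iota // leave the abort cache untouched
	PoisonDo                      // write an abort cache entry
	PoisonClear                   // remove the abort cache entry
)

// abortCacheKey builds an illustrative key; the real encoding differs.
func abortCacheKey(txnID string) string {
	return "abort-cache/" + txnID
}

// declareKeys returns the keys a resolve request would declare. The abort
// cache key is included only if the request is going to touch it, so no-op
// resolutions for one transaction overlap only on their own intent keys.
func declareKeys(intentKey, txnID string, p PoisonType) []string {
	keys := []string{intentKey}
	if p != PoisonNoop {
		keys = append(keys, abortCacheKey(txnID))
	}
	return keys
}

func main() {
	fmt.Println(declareKeys("a", "txn1", PoisonNoop)) // prints [a]
	fmt.Println(declareKeys("b", "txn1", PoisonDo))   // prints [b abort-cache/txn1]
}
```

With the no-op mode, two resolutions on keys "a" and "b" for the same transaction declare disjoint key sets and no longer serialize behind each other.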
Also use this option in the intent resolver, where possible -- most transactions don't receive abort
cache entries, and intents are often "found" by multiple conflicting writers. We want to avoid
adding artificial contention there, though in many situations the same intent is resolved and so a
conflict still exists.
Migration: a new field number was added to the proto and the old one preserved. We continue to
populate it. Downstream of Raft, we use the new field but if it's unset, synthesize from the
deprecated field. I believe this is sufficient and we can just remove all traces of the old field in
v1.3. (v1.1 uses the old, v1.2 uses the new with compatibility for the old, v1.3 only the new field).
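The fallback described in the migration paragraph might look roughly like the following. This is a sketch under assumptions: the struct, field names, and enum values are hypothetical, and mapping the old `false` to "clear" is inferred from the review discussion rather than taken from the diff.

```go
package main

import "fmt"

// PoisonType is a hypothetical stand-in for the new enum field. The zero
// value means the sender did not set the new field (e.g. a v1.1 node).
type PoisonType int

const (
	PoisonUnset PoisonType = iota
	PoisonNoop
	PoisonDo
	PoisonClear
)

// ResolveIntentRequest mirrors only the two fields relevant to the migration.
type ResolveIntentRequest struct {
	DeprecatedPoison bool       // old field, still populated for old nodes
	Poison           PoisonType // new field, preferred when set
}

// effectivePoison is what code downstream of Raft would consult: the new
// field if set, otherwise a value synthesized from the deprecated bool.
func effectivePoison(r ResolveIntentRequest) PoisonType {
	if r.Poison != PoisonUnset {
		return r.Poison
	}
	if r.DeprecatedPoison {
		return PoisonDo
	}
	// Assumption: the old poison=false cleared the abort cache entry.
	return PoisonClear
}

func main() {
	fmt.Println(effectivePoison(ResolveIntentRequest{DeprecatedPoison: true}) == PoisonDo)
	fmt.Println(effectivePoison(ResolveIntentRequest{Poison: PoisonNoop}) == PoisonNoop)
}
```

Because the decision is made from the request contents alone, replicas on either version reach the same result when applying the same command.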