kvcoord: merge budgets for in-flight writes and lock spans #66915
Conversation
Does it mean roughly that we will check the limit in txn_interceptor_committer.canCommitInParallel and stop it if a certain limit is exceeded? That makes sense to me, as it will only touch large transactions, which get less benefit from parallel commits.
Just some nits from my side regarding readability, otherwise looks good.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 80 at r3 (raw file):
// maxBytes may be zero, or even negative. In that case, each range will be
// maximally condensed.
Since there's no difference between 0 and a negative size, maybe we can say that maxBytes less than 1 will force maximum possible condensing? Otherwise it reads as if 0 and negative have some difference.
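For illustration, the semantics under discussion boil down to a guard like the following (a simplified sketch; the real method also takes a range iterator for the condensing itself):

```go
// maybeCondense condenses s until its size is below maxBytes. With
// maxBytes < 1, the early return below can never trigger for a
// non-empty set, so every range's spans get merged as far as possible.
func (s *condensableSpanSet) maybeCondense(maxBytes int64) {
	if s.bytes <= maxBytes {
		return // within budget; leave the spans as they are
	}
	// ... merge the spans within each range into a single covering span ...
}
```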
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):
"cannot perform async consensus because memory budget limit exceeded"
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 1424 at r3 (raw file):
// along range boundaries when they exceed the maximum intent bytes threshold.
//
// TODO(andrei): Merge this test into TestTxnPipelinerCondenseLockSpans2, which
Is it a long-term TODO? I think it would be helpful to at least highlight the difference between the checks the two tests make, just in case one fails and someone needs to debug it.
One thing I've been thinking is that we might want to take this occasion to introduce a limit on the number of in-flight keys that we attach to a parallel-commit EndTxn, for stability purposes - an abandoned STAGING txn record forces conflicting txns to perform recovery, which consists of work on the order of the record's in-flight writes. This seems like potentially a major unexpected burden on that other txn, externalized by the committing txn. Thoughts?
I thought that's what kv.transaction.write_pipelining_max_batch_size did, but it looks like you're right that it's not. The limits in the pipeline do help bound the effective maximum because we'll only let the inFlightWriteSet get so large, but that doesn't help in the case of a single large batch with an EndTxn. So I agree that we should limit parallel commits to some maximum number of in-flight writes to avoid an unbounded fanout factor during txn recovery. I'm thinking somewhere around 1024 keys.
Does it mean roughly that we will check the limit in txn_interceptor_committer.canCommitInParallel and stop it if a certain limit is exceeded?
Yes, I think that's right. We'll compare len(et.InFlightWrites) against a new cluster setting.
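Concretely, the check might look something like this in txn_interceptor_committer.go (the setting name, default, and receiver fields are assumptions on my part; the actual change was left for a follow-up PR):

```go
// Hypothetical setting capping the in-flight writes attached to a
// parallel-commit EndTxn; the name and default (1024 keys) are illustrative.
var maxParallelCommitWrites = settings.RegisterIntSetting(
	"kv.transaction.parallel_commit_max_in_flight_writes",
	"maximum number of in-flight writes attached to a parallel commit",
	1024,
)

// Inside canCommitInParallel, before permitting the STAGING fast path:
if int64(len(et.InFlightWrites)) > maxParallelCommitWrites.Get(&tc.st.SV) {
	// Too many in-flight writes: recovery of an abandoned STAGING record
	// would cost work proportional to this count, so fall back to a
	// regular (non-parallel) commit.
	return false
}
```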
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @andreimatei)
pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 170 at r3 (raw file):
}

func (s *condensableSpanSet) asSortedSlice() []roachpb.Span {
Consider moving this to condensable_span_set_test.go, with a comment that we'll want to make this more efficient if it ever needs to be used in production code.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 514 at r1 (raw file):
	ctx context.Context, ba roachpb.BatchRequest, br *roachpb.BatchResponse,
) {
	// After adding new writes to the lock footprint, check whether we need to
Did you mean to remove this?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 251 at r2 (raw file):
}
ba.AsyncConsensus = tp.canUseAsyncConsensus(ba)
Want to rebase this second commit away?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 35 at r3 (raw file):
	true,
)

var _ = settings.RegisterByteSizeSetting(
Can't we just delete this and add it to retiredSettings?
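For reference, retiring a setting is a one-line addition to the retiredSettings map in pkg/settings/registry.go, along these lines (the version comment is a guess):

```go
// retiredSettings (sketch): names listed here are reserved so that a
// deleted setting's key can never be accidentally redefined or reused.
var retiredSettings = map[string]struct{}{
	// ...
	// removed as of 21.2.
	"kv.transaction.write_pipelining_max_outstanding_size": {},
}
```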
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 340 at r3 (raw file):
// canUseAsyncConsensus checks the conditions necessary for this batch to be
// allowed to set the AsyncConsensus flag.
func (tp *txnPipeliner) canUseAsyncConsensus(ctx context.Context, ba roachpb.BatchRequest) bool {
While we're here, should we short-circuit this method on if _, hasET := ba.GetArg(roachpb.EndTxn); hasET { return false }? Now that this is factored out, it seems like a valuable fast-path.
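A sketch of the suggested fast path (the author later applied this in a separate commit; the comment wording is mine):

```go
func (tp *txnPipeliner) canUseAsyncConsensus(ctx context.Context, ba roachpb.BatchRequest) bool {
	// Short-circuit: a batch with an EndTxn can never use async consensus,
	// since parallel commits attach the in-flight writes to the EndTxn itself.
	if _, hasET := ba.GetArg(roachpb.EndTxn); hasET {
		return false
	}
	// ... the existing per-request checks continue below ...
	return true
}
```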
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):
Previously, aliher1911 (Oleg) wrote…
"cannot perform async consensus because memory budget limit exceeded"
Also, does v4 seem too low? I was thinking v2.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 81 at r3 (raw file):
//
// iter is the iterator to use for condensing the lock spans. It can be nil, in
// which case the pipeliner will panic if it ever needs to condense.
"needs to condense intent spans"
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 84 at r3 (raw file):
func makeMockTxnPipeliner(iter condensableSpanSetRangeIterator) (txnPipeliner, *mockLockedSender) {
	mockSender := &mockLockedSender{}
	metrics := MakeTxnMetrics(time.Hour)
nit: just inline these.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 583 at r3 (raw file):
tp, mockSender := makeMockTxnPipeliner(nil /* iter */)

// Disable write_pipelining_max_outstanding_size, and max_intents_bytes limits.
nit: no comma
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 970 at r3 (raw file):
// TestTxnPipelinerMaxInFlightSize tests that batches are not pipelined if doing
// so would push the memory used to track locks and in-flight writes over the
// limit allowed by the kv.transaction.max_intents_bytes.
nit: remove the period and re-wrap.
Lift a defer() into a wrapper function.

Release note: None

This patch short-circuits canUseAsyncConsensus for EndTxn batches.

Release note: None
Force-pushed from 7ab7d96 to 4dc342e.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 80 at r3 (raw file):
Previously, aliher1911 (Oleg) wrote…
Since there's no difference between 0 and a negative size, maybe we can say that maxBytes less than 1 will force maximum possible condensing? Otherwise it reads as if 0 and negative have some difference.
done
pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 170 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Consider moving this to condensable_span_set_test.go, with a comment that we'll want to make this more efficient if it ever needs to be used in production code.
moved to a helpers_test.go
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 514 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you mean to remove this?
yes. I had done it in the wrong commit. Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 251 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Want to rebase this second commit away?
done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 35 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can't we just delete this and add it to retiredSettings?
done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 340 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
While we're here, should we short-circuit this method on if _, hasET := ba.GetArg(roachpb.EndTxn); hasET { return false }? Now that this is factored out, it seems like a valuable fast-path.
done in a separate commit
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Also, does v4 seem too low? I was thinking v2.
all done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 81 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"needs to condense intent spans"
done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 84 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: just inline these.
I can't because I need to take their address.
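(Go background for the nit above: composite literals are addressable, but a function call's return value is not, so a call like MakeTxnMetrics(time.Hour) must be bound to a variable before its address can be taken. A toy illustration:)

```go
package main

import "fmt"

type metrics struct{ count int }

func makeMetrics() metrics { return metrics{} }

func main() {
	m := makeMetrics()
	fmt.Println(&m)         // fine: a variable is addressable
	fmt.Println(&metrics{}) // fine: composite literals are addressable
	// fmt.Println(&makeMetrics()) // compile error: call results are not addressable
}
```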
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 583 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: no comma
done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 970 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: remove the period and re-wrap.
done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 1424 at r3 (raw file):
Is it a long-term TODO?
More of a "never TODO" kind of thing, as most are :). Its only purpose is to discourage extensions to this one and point to the other one.
I think it would be helpful to at least highlight the difference between the checks the two tests make, just in case one fails and someone needs to debug it.
I don't know what to write exactly, because I don't want to put the other one in a little box by describing what it currently has.
Reviewed 6 of 6 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911)
Force-pushed from 4dc342e to 0149495.
TFTRs!
bors r+
One thing I've been thinking is that we might want to take this occasion to introduce a limit on the number of in-flight keys that we attach to a parallel-commit EndTxn, for stability purposes - an abandoned STAGING txn record forces conflicting txns to perform recovery, which consists of work on the order of the record's in-flight writes. This seems like potentially a major unexpected burden on that other txn, externalized by the committing txn. Thoughts?
I thought that's what kv.transaction.write_pipelining_max_batch_size did, but it looks like you're right that it's not. The limits in the pipeline do help bound the effective maximum because we'll only let the inFlightWriteSet get so large, but that doesn't help in the case of a single large batch with an EndTxn. So I agree that we should limit parallel commits to some maximum number of in-flight writes to avoid an unbounded fanout factor during txn recovery. I'm thinking somewhere around 1024 keys.
Does it mean roughly that we will check the limit in txn_interceptor_committer.canCommitInParallel and stop it if a certain limit is exceeded?
Yes, I think that's right. We'll compare len(et.InFlightWrites) against a new cluster setting.
I'll send a separate PR for this.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aliher1911)
Build succeeded.
Before this patch, the txnPipeliner maintained two different memory
budgets:
1) kv.transaction.max_intents_bytes - a limit on a txn's lock spans
2) kv.transaction.write_pipelining_max_outstanding_size - a limit on a
txn's in-flight writes
Besides protecting memory usage, these budgets also prevent the commit's
Raft command from becoming too big.
Having two budgets for very related things is unnecessary. In-flight
writes frequently turn into lock spans, and so thinking about how one
budget feeds into the other is confusing. The exhaustion of the
in-flight budget also had a hand in this, by turning in-flight writes
into locks immediately.
This patch makes write_pipelining_max_outstanding_bytes a no-op.
max_intent_bytes also takes on tracking in-flight writes. A request
whose async consensus writes would push this budget over the limit is
not allowed to perform async consensus, on the argument that performing
sync writes is better because the locks from those writes can be
condensed.
This patch is also done with an eye towards offering an option to reject
transactions that are about to go over budget. Having a single budget to
think about makes that conceptually simpler.
Release note (general change): The setting
kv.transaction.write_pipelining_max_outstanding_size becomes a no-op.
Its function is folded into the kv.transaction.max_intents_bytes
setting.
cc @cockroachdb/kv
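To make the mechanism concrete, here is a rough sketch of the merged-budget check inside canUseAsyncConsensus (variable and field names approximate those in txn_interceptor_pipeliner.go; the byte accounting is simplified):

```go
// Inside canUseAsyncConsensus: allow async consensus only if tracking
// this batch's writes in-flight keeps the combined footprint (lock
// spans + in-flight writes) under the single max_intents_bytes budget.
maxTrackingBytes := trackedWritesMaxSize.Get(&tp.st.SV)
addedIFBytes := int64(0)
for _, ru := range ba.Requests {
	req := ru.GetInner()
	// Charge each would-be in-flight write's key against the budget.
	addedIFBytes += int64(len(req.Header().Key)) // simplified accounting
	if tp.ifWrites.byteSize()+addedIFBytes+tp.lockFootprint.bytes > maxTrackingBytes {
		// Over budget: perform sync writes instead, whose resulting
		// lock spans can at least be condensed.
		return false
	}
}
return true
```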