kvserver: add setting to reject overly large transactions #135945
Conversation
Force-pushed from cd7512e to 210cebe.
Force-pushed from 210cebe to 342ead0.
cc @nvanbenschoten in case you want to give this a second look as well, given we'll be backporting this to all release branches
Reviewed 4 of 4 files at r1, 3 of 3 files at r2, 6 of 6 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2595 at r4 (raw file):
for _, tc := range testCases {
	t.Run(tc.name, func(t *testing.T) {
		testutils.RunTrueAndFalse(t, "reject-threshold", func(t *testing.T, reject bool) {
This test and variable name aren't quite clear -- should we call these reject-using-txn-max-bytes instead?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2602 at r4 (raw file):
tp, mockSender := makeMockTxnPipeliner(nil /* iter */)
if reject {
	rejectTxnMaxBytes.Override(ctx, &tp.st.SV, tc.maxSize)
We should also add variants where the TrackedWritesMaxSize << rejectTxnMaxBytes. Or have all variants run as such -- this then tests the case where we condense but don't reject, which is the goal of this newly introduced cluster setting.
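A rough sketch of what such a variant could look like, reusing the helpers already visible in this test; the subtest name and byte sizes here are illustrative, not the values ultimately used:

```go
// Sketch only: set the condense budget far below the reject budget so that
// spans get condensed well before the rejection threshold is reached.
testutils.RunTrueAndFalse(t, "condense-before-reject", func(t *testing.T, condenseFirst bool) {
	ctx := context.Background()
	tp, _ := makeMockTxnPipeliner(nil /* iter */)
	if condenseFirst {
		TrackedWritesMaxSize.Override(ctx, &tp.st.SV, 1<<10) // 1 KiB condense budget
		rejectTxnMaxBytes.Override(ctx, &tp.st.SV, 1<<20)    // 1 MiB reject budget
	}
	// ... issue enough writes to exceed the condense budget and assert that the
	// lock footprint condenses while no request is rejected.
})
```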
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 48 at r4 (raw file):
// whether to reject transactions based on the total size of the transaction
// spans.
condensedBytes int64
condensedBytes is a confusing name -- naively, I'd read it as "number of bytes condensed". Should this instead be totalUncondensedBytes?
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 81 at r4 (raw file):
	s.bytes += spanSize(sp)
}
s.condensedBytes += prevSize - s.bytes
Do we need this logic? Or is this tracking easier if we just track it every time lockFootprint.insert() is called instead?
Separately, should we be tracking this for unreplicated locks?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 114 at r4 (raw file):
	settings.WithPublic)

// rejectTxnMaxBytes will reject transactions if the maximum number of intent
Note that as is, this is more than just intent bytes -- we're also including unreplicated and replicated locking reads in this calculation. Should we be? I think we should be including replicated locks, but not unreplicated locks. Thoughts?
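For context, one way to restrict this accounting to replicated locks is to filter on the durability reported by LockSpanIterate; a minimal sketch (the accumulator variable is illustrative):

```go
// Sketch: only replicated (durable) lock spans count toward the server-side
// budget; unreplicated locks are held only in the leaseholder's in-memory
// lock table.
var replicatedLockBytes int64
_ = ba.LockSpanIterate(nil /* br */, func(sp roachpb.Span, dur lock.Durability) {
	if dur == lock.Replicated {
		replicatedLockBytes += spanSize(sp)
	}
})
```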
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 321 at r4 (raw file):
rejectOverBudget := rejectTxnOverTrackedWritesBudget.Get(&tp.st.SV)
condenseThreshold := TrackedWritesMaxSize.Get(&tp.st.SV)
if err := tp.maybeRejectOverBudget(ba, condenseThreshold, rejectOverBudget, rejectTxnMaxBytes.Get(&tp.st.SV)); err != nil {
nit: let's pull up a variable for rejectTxnMaxBytes above for uniformity.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 407 at r4 (raw file):
// full intent size on the server.
if rejectTxnMaxBytes > 0 && estimate+tp.lockFootprint.condensedBytes > rejectTxnMaxBytes {
	tp.txnMetrics.TxnsRejectedByLockSpanBudget.Inc(1)
Let's add a new metric for this instead of clubbing it together with the same one.
Force-pushed from 342ead0 to 65c88f0.
Thanks - I originally missed the implications of tracking the unreplicated locks. I added the splitting based on durability to the condensableSpanSet, which should help. I'm going to add another test to the last commit but wanted to make sure you thought this approach looked good first.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 48 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
condensedBytes is a confusing name -- naively, I'd read it as "number of bytes condensed". Should this instead be totalUncondensedBytes?
Good idea - changed, and changed the meaning a little bit to be all bytes.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 81 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Do we need this logic? Or is this tracking easier if we just track it every time lockFootprint.insert() is called instead? Separately, should we be tracking this for unreplicated locks?
I moved it to insert - this is easier to reason about now. I was originally trying to use this variable to not include the current "live" bytes, but I realized it is better to just track all inserts. This simplifies the maybeCondense function also.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 114 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Note that as is, this is more than just intent bytes -- we're also including unreplicated and replicated locking reads in this calculation. Should we be? I think we should be including replicated locks, but not unreplicated locks. Thoughts?
This is a good point. I changed the text. I added another commit which will only add them to the "server side" tracking if they are durable. I'm not sure if the change I made is all correct though, so please review the true/false parameter passed in now.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 321 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: let's pull up a variable for rejectTxnMaxBytes above for uniformity.
Done
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 407 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Let's add a new metric for this instead of clubbing it together with the same one.
Added the TxnsRejectedByLockSizeLimit metric.
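For reference, counters like this are typically declared via the util/metric package; a hedged sketch, with metadata strings that are illustrative rather than the exact ones added in this PR:

```go
// Sketch of a dedicated rejection counter; the metric name and help text are
// assumptions for illustration.
var metaTxnsRejectedByLockSizeLimit = metric.Metadata{
	Name:        "txn.rejected_by_lock_size_limit",
	Help:        "Number of transactions rejected for exceeding the transaction lock size limit",
	Measurement: "KV Transactions",
	Unit:        metric.Unit_COUNT,
}

// Wired into TxnMetrics as a *metric.Counter and incremented at the rejection
// site, e.g. tp.txnMetrics.TxnsRejectedByLockSizeLimit.Inc(1).
var txnsRejectedByLockSizeLimit = metric.NewCounter(metaTxnsRejectedByLockSizeLimit)
```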
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2595 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
This test and variable name aren't quite clear -- should we call these reject-using-txn-max-bytes instead?
Changed both to help clarify.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2602 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
We should also add variants where the TrackedWritesMaxSize << rejectTxnMaxBytes. Or have all variants run as such -- this then tests the case where we condense but don't reject, which is the goal of this newly introduced cluster setting.
Good point - I will add this, but can you first review the other changes to make sure I'm testing the "right thing"? Thanks!
Force-pushed from 65c88f0 to 3279071.
Converting to draft. I realized that the bytes here are only the key bytes, so for large values this is fairly inaccurate. I'm looking at alternatives.
Force-pushed from 9996bd8 to 4a5a99c.
There are some lower-level comments inline, but my main comment here is that we should do this tracking on the txnInterceptorPipeliner instead of pushing it into the more general condensableSpanSet data structure. Maybe we can combine the first two commits to do this instead. Wdyt?
Reviewed 10 of 10 files at r8, 5 of 5 files at r9, 6 of 6 files at r10, 4 of 4 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
-- commits
line 15 at r9:
nit: is this commit message stale? In the previous commit, we started tracking insertCount. Separately, s/tracing/tracking?
-- commits
line 48 at r11:
Do we need an analog metric to TxnsInFlightLocksOverTrackingBudget as well? In particular, a metric + warning that accounts for replicated locks acquired by Scan/ReverseScan requests on the response path.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 44 at r8 (raw file):
condensed bool

// insertCount is the number of bytes that are inserted into this span set.
Is the comment here stale? In particular, the "number of bytes" portion.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 46 at r9 (raw file):
// insertCount is the number of bytes that are inserted into this span set.
// This may be higher if there are duplicate spans inserted. It can be used
// to track the impact of the intents on the server side.
nit: let's expand this comment (in r2) to talk about durable locks (intents + other lock strengths) instead of just intents.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 55 at r9 (raw file):
// insert adds new spans to the condensable span set. No attempt to condense the
// set or deduplicate the new span with existing spans is made.
func (s *condensableSpanSet) insert(durable bool, spans ...roachpb.Span) {
nit: s/durable/isDurable/g
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go
line 658 at r9 (raw file):
	log.VEventf(ctx, 3, "recording span to refresh: %s", span.String())
}
sr.refreshFootprint.insert(true, span)
It feels a bit off that we're passing in true /* isDurable */ in refreshFootprint.insert. I think we've got the layering here incorrect -- maybe we should be tracking durable locks on the txnInterceptorPipeliner instead of this more general condensableSpanSet data structure.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 818 at r9 (raw file):
	str, durability = readOnlyReq.KeyLocking()
}
trackLocks := func(span roachpb.Span, _ lock.Durability) {
would it be easier to grab the durability from here instead?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 841 at r9 (raw file):
func (tp *txnPipeliner) trackLocks(s roachpb.Span, _ lock.Durability) {
	tp.lockFootprint.insert(true, s)
Should this be passing in the correct durability? Seems like it's plumbed through till here, we're just leaving it unused in this method.
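A minimal sketch of the suggested change, assuming insert's first parameter is the isDurable flag discussed above:

```go
// Sketch: pass the plumbed-through durability instead of hard-coding true;
// whether unreplicated locks should count at all is the open question here.
func (tp *txnPipeliner) trackLocks(s roachpb.Span, dur lock.Durability) {
	tp.lockFootprint.insert(dur == lock.Replicated /* isDurable */, s)
}
```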
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 925 at r9 (raw file):
if tp.ifWrites.len() > 0 {
	tp.ifWrites.ascend(func(w *inFlightWrite) {
		tp.lockFootprint.insert(true, roachpb.Span{Key: w.Key})
nit: true /* isDurable */
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 945 at r9 (raw file):
tp.ifWrites.ascend(func(w *inFlightWrite) {
	if w.Sequence >= s.seqNum {
		tp.lockFootprint.insert(true, roachpb.Span{Key: w.Key})
nit: true /* isDurable */
We know that these are durable because they're part of the inflight write set.
pkg/kv/kvclient/kvcoord/condensable_span_set_test.go
line 15 at r9 (raw file):
"github.com/cockroachdb/cockroach/pkg/util/log" "github.com/stretchr/testify/require" )
Should we add/expand a test to cover both s.insert(true/false...) calls?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 397 at r10 (raw file):
}

totalCount := tp.lockFootprint.insertCount + int64(tp.ifWrites.len())
Just like we estimate the size of the spans in the supplied batch request above (lockFootprint.estimateSize), does this also need to account for the supplied batch request?
Similar to the estimation above, I don't think we can account for replicated {,Reverse}ScanRequests. However, we should be able to consider replicated Get requests and write requests here.
Force-pushed from 4a5a99c to 837c1e4.
I agree - I stripped out the top two commits and moved the insert count to the txnInterceptor.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
Previously, arulajmani (Arul Ajmani) wrote…
nit: is this commit message stale? In the previous commit, we started tracking insertCount. Separately, s/tracing/tracking?
Commit removed / merged.
Previously, arulajmani (Arul Ajmani) wrote…
Do we need an analog metric to TxnsInFlightLocksOverTrackingBudget as well? In particular, a metric + warning that accounts for replicated locks acquired by Scan/ReverseScan requests on the response path.
I added the log warning but not the metric. I'm not sure how useful the metric would be in practice since either we will stop using this txn, in which case we didn't reject anything, or we will keep using it and the TxnsRejectedByLockCountLimit limit will be exceeded. That said, it wouldn't be hard to add the metric if you think it would be useful.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 818 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
would it be easier to grab the durability from here instead?
Yep - I missed that. Changed
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 841 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should this be passing in the correct durability? Seems like it's plumbed through till here, we're just leaving it unused in this method.
Yes - changed
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 925 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: true /* isDurable */
Removed parameter
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 945 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: true /* isDurable */
We know that these are durable because they're part of the inflight write set.
Removed parameter
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 397 at r10 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Just like we estimate the size of the spans in the supplied batch request above (lockFootprint.estimateSize), does this also need to account for the supplied batch request?
Similar to the estimation above, I don't think we can account for replicated {,Reverse}ScanRequests. However, we should be able to consider replicated Get requests and write requests here.
After the change I didn't need to separate out the inflight from the locked spans, so this became simpler.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 44 at r8 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Is the comment here stale? In particular, the "number of bytes" portion.
Fixed/removed
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 46 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: let's expand this comment (in r2) to talk about durable locks (intents + other lock strengths) instead of just intents.
Fixed/removed.
pkg/kv/kvclient/kvcoord/condensable_span_set.go
line 55 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: s/durable/isDurable/g
fixed/removed
pkg/kv/kvclient/kvcoord/condensable_span_set_test.go
line 15 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we add/expand a test to cover both s.insert(true/false...) calls?
This is now stale since this was reverted.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go
line 658 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
It feels a bit off that we're passing in true /* isDurable */ in refreshFootprint.insert. I think we've got the layering here incorrect -- maybe we should be tracking durable locks on the txnInterceptorPipeliner instead of this more general condensableSpanSet data structure.
I agree - this made more sense when we were tracking bytes. With the simpler count design this doesn't need to touch the span at all.
Force-pushed from 837c1e4 to 9eba1a6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
Previously, andrewbaptist (Andrew Baptist) wrote…
I added the log warning but not the metric. I'm not sure how useful the metric would be in practice since either we will stop using this txn, in which case we didn't reject anything, or we will keep using it and the TxnsRejectedByLockCountLimit limit will be exceeded. That said, it wouldn't be hard to add the metric if you think it would be useful.
I added the metric also. Let me know if you think it makes sense to leave in.
Force-pushed from 9eba1a6 to 0e5b0e9.
The patch looks good. This ended up in a really nice place, so thanks for iterating on it.
Just a reminder to spruce up the testing with what you had in mind before merging. I can have another look if you want me to before we bors, but I'll stamp this.
Reviewed 5 of 8 files at r12, 2 of 2 files at r13, 1 of 4 files at r14, 3 of 3 files at r15, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2602 at r4 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Good point - I will add this, but can you review to make sure the other changes look OK first to make sure I'm testing the "right thing". Thanks!
Reminder to come back to this comment as this PR looks close.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 393 at r13 (raw file):
if err := ba.LockSpanIterate(nil /* br */, func(sp roachpb.Span, durability lock.Durability) {
	spans = append(spans, sp)
	if durability == lock.Replicated {
From slack:
I didn’t do anything special for durable scan requests as for the estimate it is easiest to just count it as “1”.
Could you leave a comment indicating that we're possibly underestimating scan/reverse scan requests (which intend to acquire durable locks) here by counting those as 1. We'll know for sure on the response path when we update tp.writeCount.
EDIT: I see you've got the comment in a subsequent commit.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 266 at r12 (raw file):
lockFootprint condensableSpanSet

// writeCount counts the number of replicated locks and intents to this
nit: "number of replicated lock acquisitions and intents written by this"
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 733 at r15 (raw file):
	tp.txnMetrics.TxnsInFlightLocksOverTrackingBudget.Inc(1)
}
if tp.writeCount > rejectTxnMaxCount {
nit: consider adding a comment such as "similar to the in-flight writes case above, we may have gone over the rejectTxnMaxCount threshold because we don't accurately estimate the number of ranged locking reads before sending the request".
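For concreteness, a hedged sketch of the kind of response-path handling being discussed; the metric field name and log message are illustrative, not the exact code in the PR:

```go
// Sketch: on the response path we may only discover after the fact that
// ranged locking reads pushed the transaction over the limit, so record and
// warn here rather than reject; the reject check runs on the next request.
if rejectTxnMaxCount > 0 && tp.writeCount > rejectTxnMaxCount {
	tp.txnMetrics.TxnsResponseOverCountLimit.Inc(1) // hypothetical field name
	log.Warningf(ctx, "transaction exceeded the intent/lock count limit: %d > %d",
		tp.writeCount, rejectTxnMaxCount)
}
```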
Previously we had an ability to reject transactions that required too much memory, however this was tied together with the condense limit. The two limits have different purposes and should be set independently. The condense size limit (kv.transaction.max_intents_bytes) is used to protect memory on the client side by using less precise tracking once it passes a certain size. The new limit `kv.transaction.max_intents_and_locks` is intended to protect the server side by preventing a client from creating a transaction with too many operations in it. Large transactions with many intents can have a negative impact, especially in a multi-tenant system.

Epic: none
Fixes: cockroachdb#135841

Release note (ops change): Adds a new configurable parameter kv.transaction.max_intents_and_locks which will prevent transactions from creating too many intents.
This commit adds a new metric txn.count_limit_rejected which tracks the number of transactions that are rejected based on the number of intents or replicated locks in the transaction.

Epic: none

Release note (ops change): Add metric txn.count_limit_rejected.
This commit adds a new metric txn.count_limit_on_response which tracks the number of transactions that exceeded the max transaction count on their response.

Epic: none

Release note (ops change): Add metric txn.count_limit_on_response.
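For reference, a cluster setting like this is typically registered in kvcoord roughly as follows; the variable name matches the code quoted above, while the default value and help text are assumptions:

```go
// Sketch of the setting registration; a default of 0 (disabled) is assumed.
var rejectTxnMaxCount = settings.RegisterIntSetting(
	settings.ApplicationLevel,
	"kv.transaction.max_intents_and_locks",
	"maximum number of intents and replicated locks a transaction may create; "+
		"requests that would exceed this are rejected (0 disables the limit)",
	0,
	settings.WithPublic,
)
```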
Force-pushed from 0e5b0e9 to 7ef9f24.
TFTR! I added test cases to TestTxnPipelinerRejectAboveBudget to cover all the additional scenarios. I don't have a test that combines both types of rejections, but I don't think that is too important. I'm going to run one more full test on a cluster to make sure this works, but otherwise I think it should be ready if you want to take one more look, or just stamp it.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 266 at r12 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: "number of replicated lock acquisitions and intents written by this"
Changed
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 393 at r13 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
From slack:
I didn’t do anything special for durable scan requests as for the estimate it is easiest to just count it as “1”.
Could you leave a comment indicating that we're possibly underestimating scan/reverse scan requests (which intend to acquire durable locks) here by counting those as 1. We'll know for sure on the response path when we update tp.writeCount.
EDIT: I see you've got the comment in a subsequent commit.
Ack
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 733 at r15 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: consider adding a comment such as "similar to the in-flight writes case above, we may have gone over the rejectTxnMaxCount threshold because we don't accurately estimate the number of ranged locking reads before sending the request".
Added - good point.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 2602 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Reminder to come back to this comment as this PR looks close.
I now added all the relevant test cases to TestTxnPipelinerRejectAboveBudget. Let me know if you see anything else that seems worth testing. I don't explicitly test logging, but I did verify they appear in a manual test.
I ran a test on a cluster with the following: Create a KV database with > 100K rows inserted.
So everything looks good other than the message not being very accurate (mentioning bytes rather than rows).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
TFTR! bors r=arulajmani
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.
Issue #135841: branch-release-24.3.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Thanks for getting this over the line! @andrewbaptist, re: backport, I forget: Does this logic live in kvserver or kvclient or a mix? Reminder that Serverless will have many SQL pods on 24.2 for many months, but the KV pods will be on 24.3 by end of this week.
@joshimhoff - This logic lives in kvclient, so it requires the pods to get to 24.3. Also unfortunately this missed the 24.3.1 cut so it will be in 24.3.2. I will add a backport to 24.2.x so that the next release will pick that up. Is there something that pushes SQL pods to the latest point release?
blathers backport 24.2 |
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.
Issue #135841: branch-release-24.2.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.