
kvserver: add setting to reject overly large transactions #135945

Merged

Conversation

@andrewbaptist (Collaborator) commented Nov 21, 2024

Previously, we had the ability to reject transactions that required too much memory, but this was tied to the condense limit. The two limits have different purposes and should be set independently. The condense size limit (kv.transaction.max_intents_bytes) protects memory on the client side by switching to less precise tracking once a transaction passes a certain size. The new limit, kv.transaction.max_intents_and_locks, is intended to protect the server side by preventing a client from creating a transaction with too many operations in it. Large transactions with many intents can have a negative impact, especially in a multi-tenant system.

Epic: none

Fixes: #135841

Release note (ops change): Adds a new configurable parameter kv.transaction.max_intents_and_locks which will prevent transactions from creating too many intents.
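As a usage sketch (the values here are illustrative, not recommendations), an operator can cap the number of intents and locks per transaction while the client-side condense budget stays an independent knob:

	SET CLUSTER SETTING kv.transaction.max_intents_and_locks = 1000000;
	-- The client-side condense/memory budget remains a separate setting:
	SET CLUSTER SETTING kv.transaction.max_intents_bytes = 4194304;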

@cockroach-teamcity (Member): This change is Reviewable

@arulajmani arulajmani self-requested a review November 22, 2024 01:20
@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch 6 times, most recently from cd7512e to 210cebe on December 3, 2024 21:05
@andrewbaptist andrewbaptist marked this pull request as ready for review December 3, 2024 22:05
@andrewbaptist andrewbaptist requested a review from a team as a code owner December 3, 2024 22:05
@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 210cebe to 342ead0 on December 4, 2024 15:34
@arulajmani (Collaborator) left a comment

cc @nvanbenschoten in case you want to give this a second look as well, given we'll be backporting this to all release branches

Reviewed 4 of 4 files at r1, 3 of 3 files at r2, 6 of 6 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2595 at r4 (raw file):

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			testutils.RunTrueAndFalse(t, "reject-threshold", func(t *testing.T, reject bool) {

This test and variable name isn't quite clear -- should we call these reject-using-txn-max-bytes instead?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2602 at r4 (raw file):

				tp, mockSender := makeMockTxnPipeliner(nil /* iter */)
				if reject {
					rejectTxnMaxBytes.Override(ctx, &tp.st.SV, tc.maxSize)

We should also add variants where the TrackedWritesMaxSize << rejectTxnMaxBytes. Or have all variants run as such -- this then tests the case where we condense but don't reject, which is the goal of this newly introduced cluster setting.
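A minimal sketch of such a variant, reusing the Override pattern from the quoted test (the byte values are illustrative):

	tp, mockSender := makeMockTxnPipeliner(nil /* iter */)
	// Condense the lock footprint well before the rejection threshold...
	TrackedWritesMaxSize.Override(ctx, &tp.st.SV, 1<<10)
	// ...so writes landing between the two limits condense but aren't rejected.
	rejectTxnMaxBytes.Override(ctx, &tp.st.SV, 1<<20)
	_ = mockSender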


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 48 at r4 (raw file):

	// whether to reject transactions based on the total size of the transaction
	// spans.
	condensedBytes int64

condensedBytes is a confusing name -- naively, I'd read it as "number of bytes condensed".

Should this instead be totalUncondensedBytes?


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 81 at r4 (raw file):

			s.bytes += spanSize(sp)
		}
		s.condensedBytes += prevSize - s.bytes

Do we need this logic? Or is this tracking easier if we just track it every time lockFootprint.insert() is called instead?

Separately, should we be tracking this for unreplicated locks?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 114 at r4 (raw file):

	settings.WithPublic)

// rejectTxnMaxBytes will reject transactions if the maximum number of intent

Note that as is, this is more than just intent bytes -- we're also including unreplicated and replicated locking reads in this calculation. Should we be? I think we should be including replicated locks, but not unreplicated locks. Thoughts?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 321 at r4 (raw file):

	rejectOverBudget := rejectTxnOverTrackedWritesBudget.Get(&tp.st.SV)
	condenseThreshold := TrackedWritesMaxSize.Get(&tp.st.SV)
	if err := tp.maybeRejectOverBudget(ba, condenseThreshold, rejectOverBudget, rejectTxnMaxBytes.Get(&tp.st.SV)); err != nil {

nit: let's pull up a variable for rejectTxnMaxBytes above for uniformity.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 407 at r4 (raw file):

	// full intent size on the server.
	if rejectTxnMaxBytes > 0 && estimate+tp.lockFootprint.condensedBytes > rejectTxnMaxBytes {
		tp.txnMetrics.TxnsRejectedByLockSpanBudget.Inc(1)

Let's add a new metric for this instead of clubbing it together with the same one.
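A sketch of what the split could look like, keeping the quoted check and swapping in the dedicated metric the author later reports adding (TxnsRejectedByLockSizeLimit):

	if rejectTxnMaxBytes > 0 && estimate+tp.lockFootprint.condensedBytes > rejectTxnMaxBytes {
		// Dedicated metric instead of reusing TxnsRejectedByLockSpanBudget.
		tp.txnMetrics.TxnsRejectedByLockSizeLimit.Inc(1)
	}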

@arulajmani arulajmani requested review from nvanbenschoten and a team December 6, 2024 17:35
@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 342ead0 to 65c88f0 on December 6, 2024 21:29
@andrewbaptist (Collaborator, Author) left a comment

Thanks - I originally missed the implications of tracking the unreplicated locks. I added the splitting based on durability to the condensableSpanSet, which should help. I'm going to add another test to the last commit, but I wanted to make sure you thought this approach looked good first.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 48 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

condensedBytes is a confusing name -- naively, I'd read it as "number of bytes condensed".

Should this instead be totalUncondensedBytes?

Good idea - changed, and changed the meaning a little bit to be all bytes.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 81 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Do we need this logic? Or is this tracking easier if we just track it every time lockFootprint.insert() is called instead?

Separately, should we be tracking this for unreplicated locks?

I moved it to insert - this is easier to reason about now. I was originally trying to use this variable to not include the current "live" bytes, but I realized it is better to just track all inserts. This simplifies the maybeCondense function also.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 114 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Note that as is, this is more than just intent bytes -- we're also including unreplicated and replicated locking reads in this calculation. Should we be? I think we should be including replicated locks, but not unreplicated locks. Thoughts?

This is a good point. I changed the text. I added another commit which will only add them to the "server side" tracking if they are durable. I'm not sure if the change I made is all correct though, so please review the true/false parameter passed in now.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 321 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: let's pull up a variable for rejectTxnMaxBytes above for uniformity.

Done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 407 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Let's add a new metric for this instead of clubbing it together with the same one.

Added TxnsRejectedByLockSizeLimit metric.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2595 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

This test and variable name isn't quite clear -- should we call these reject-using-txn-max-bytes instead?

Changed both to help clarify.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2602 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

We should also add variants where the TrackedWritesMaxSize << rejectTxnMaxBytes. Or have all variants run as such -- this then tests the case where we condense but don't reject, which is the goal of this newly introduced cluster setting.

Good point - I will add this, but can you review to make sure the other changes look OK first to make sure I'm testing the "right thing". Thanks!

@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 65c88f0 to 3279071 on December 9, 2024 19:14
@andrewbaptist (Collaborator, Author) commented:

Converting to draft. I realized that the bytes here are only the key bytes, so for large values, this is fairly inaccurate. I'm looking at alternatives.

@andrewbaptist andrewbaptist marked this pull request as draft December 9, 2024 20:30
@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch 2 times, most recently from 9996bd8 to 4a5a99c on December 11, 2024 16:58
@andrewbaptist andrewbaptist marked this pull request as ready for review December 11, 2024 17:42
@arulajmani (Collaborator) left a comment

There are some lower-level comments inline, but my main comment here is that we should do this tracking on the txnInterceptorPipeliner instead of pushing it into the more general condensableSpanSet data structure. Maybe we can combine the first two commits to do this instead. Wdyt?

Reviewed 10 of 10 files at r8, 5 of 5 files at r9, 6 of 6 files at r10, 4 of 4 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


-- commits line 15 at r9:
nit: is this commit message stale? In the previous commit, we started tracking insertCount. Separately, s/tracing/tracking?


-- commits line 48 at r11:
Do we need an analogous metric to TxnsInFlightLocksOverTrackingBudget as well? In particular, a metric + warning that accounts for replicated locks acquired by Scan/ReverseScan requests on the response path.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 44 at r8 (raw file):

	condensed bool

	// insertCount is the number of bytes that are inserted into this span set.

Is the comment here stale? In particular, the "number of bytes" portion.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 46 at r9 (raw file):

	// insertCount is the number of bytes that are inserted into this span set.
	// This may be higher if there are duplicate spans inserted. It can be used
	// to track the impact of the intents on the server side.

nit: let's expand this comment (in r2) to talk about durable locks (intents + other lock strengths) instead of just intents.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 55 at r9 (raw file):

// insert adds new spans to the condensable span set. No attempt to condense the
// set or deduplicate the new span with existing spans is made.
func (s *condensableSpanSet) insert(durable bool, spans ...roachpb.Span) {

nit: s/durable/isDurable/g


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 658 at r9 (raw file):

			log.VEventf(ctx, 3, "recording span to refresh: %s", span.String())
		}
		sr.refreshFootprint.insert(true, span)

It feels a bit off that we're passing in true /* isDurable */ in refreshFootprint.insert. I think we've got the layering here incorrect -- maybe we should be tracking durable locks on the txnInterceptorPipeliner instead of this more general condensableSpanSet datastructure.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 818 at r9 (raw file):

				str, durability = readOnlyReq.KeyLocking()
			}
			trackLocks := func(span roachpb.Span, _ lock.Durability) {

would it be easier to grab the durability from here instead?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 841 at r9 (raw file):

func (tp *txnPipeliner) trackLocks(s roachpb.Span, _ lock.Durability) {
	tp.lockFootprint.insert(true, s)

Should this be passing in the correct durability? Seems like it's plumbed through till here, we're just leaving it unused in this method.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 925 at r9 (raw file):

	if tp.ifWrites.len() > 0 {
		tp.ifWrites.ascend(func(w *inFlightWrite) {
			tp.lockFootprint.insert(true, roachpb.Span{Key: w.Key})

nit: true /* isDurable */


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 945 at r9 (raw file):

	tp.ifWrites.ascend(func(w *inFlightWrite) {
		if w.Sequence >= s.seqNum {
			tp.lockFootprint.insert(true, roachpb.Span{Key: w.Key})

nit: true /* isDurable */

We know that these are durable because they're part of the inflight write set.


pkg/kv/kvclient/kvcoord/condensable_span_set_test.go line 15 at r9 (raw file):

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/stretchr/testify/require"
)

Should we add/expand a test to cover both s.insert(true/false...) calls?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 397 at r10 (raw file):

	}

	totalCount := tp.lockFootprint.insertCount + int64(tp.ifWrites.len())

Just like we estimate the size of the spans in the supplied batch request above (lockFootprint.estimateSize), does this also need to account for the supplied batch request?

Similar to the estimation above, I don't think we can account for replicated {,Reverse}ScanRequests. However, we should be able to consider replicated Get requests and write requests here.

@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 4a5a99c to 837c1e4 on December 12, 2024 19:45
@andrewbaptist (Collaborator, Author) left a comment

I agree - I stripped out the top two commits and moved the insert count to the txnInterceptor.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


-- commits line 15 at r9:

Previously, arulajmani (Arul Ajmani) wrote…

nit: is this commit message stale? In the previous commit, we started tracking insertCount. Separately, s/tracing/tracking?

Commit removed / merged.


-- commits line 48 at r11:

Previously, arulajmani (Arul Ajmani) wrote…

Do we need an analog metric to TxnsInFlightLocksOverTrackingBudget as well? In particular, a metric + warning that accounts for replicated locks acquired by Scan/ReverseScan requests on the response path.

I added the log warning but not the metric. I'm not sure how useful the metric would be in practice: either we will stop using this txn, in which case we didn't reject anything, or we will keep using it and the TxnsRejectedByLockCountLimit limit will be exceeded. That said, it wouldn't be hard to add the metric if you think it would be useful.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 818 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

would it be easier to grab the durability from here instead?

Yep - I missed that. Changed


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 841 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Should this be passing in the correct durability? Seems like it's plumbed through till here, we're just leaving it unused in this method.

Yes - changed


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 925 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: true /* isDurable */

Removed parameter


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 945 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: true /* isDurable */

We know that these are durable because they're part of the inflight write set.

Removed parameter


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 397 at r10 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Just like we estimate the size of the spans in the supplied batch request above (lockFootprint.estimateSize), does this also need to account for the supplied batch request?

Similar to the estimation above, I don't think we can account for replicated {,Reverse}ScanRequests. However, we should be able to consider replicated Get requests and write requests here.

After the change I didn't need to separate out the inflight from the locked spans, so this became simpler.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 44 at r8 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is the comment here stale? In particular, the "number of bytes" portion.

Fixed/removed


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 46 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: let's expand this comment (in r2) to talk about durable locks (intents + other lock strengths) instead of just intents.

Fixed/removed.


pkg/kv/kvclient/kvcoord/condensable_span_set.go line 55 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: s/durable/isDurable/g

fixed/removed


pkg/kv/kvclient/kvcoord/condensable_span_set_test.go line 15 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Should we add/expand a test to cover both s.insert(true/false...) calls?

This is now stale since this was reverted.


pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go line 658 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

It feels a bit off that we're passing in true /* isDurable */ in refreshFootprint.insert. I think we've got the layering here incorrect -- maybe we should be tracking durable locks on the txnInterceptorPipeliner instead of this more general condensableSpanSet datastructure.

I agree - this made more sense when we were tracking bytes. With the simpler count design this doesn't need to touch the span at all.

@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 837c1e4 to 9eba1a6 on December 12, 2024 20:02
@andrewbaptist (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


-- commits line 48 at r11:

Previously, andrewbaptist (Andrew Baptist) wrote…

I added the log warning but not the metric. I'm not sure how useful the metric would be in practice since either we will stop using this txn in which case we didn't reject anything, or we will keep using it and the TxnsRejectedByLockCountLimit limit will be exceeded. That said, it wouldn't be hard to add the metric if you think it would be useful.

I added the metric also. Let me know if you think it makes sense to leave in.

@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 9eba1a6 to 0e5b0e9 on December 17, 2024 19:40
@arulajmani (Collaborator) left a comment

The patch :lgtm:. This ended up in a really nice place, so thanks for iterating on it.

Just a reminder to spruce up the testing with what you had in mind before merging. I can have another look if you want me to before we bors, but I'll stamp this.

Reviewed 5 of 8 files at r12, 2 of 2 files at r13, 1 of 4 files at r14, 3 of 3 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2602 at r4 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Good point - I will add this, but can you review to make sure the other changes look OK first to make sure I'm testing the "right thing". Thanks!

Reminder to come back to this comment as this PR looks close.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 393 at r13 (raw file):

	if err := ba.LockSpanIterate(nil /* br */, func(sp roachpb.Span, durability lock.Durability) {
		spans = append(spans, sp)
		if durability == lock.Replicated {

From slack:

I didn’t do anything special for durable scan requests as for the estimate it is easiest to just count it as “1”. 

Could you leave a comment indicating that we're possibly underestimating scan/reverse scan requests (which intend to acquire durable locks) here by counting those as 1. We'll know for sure on the response path when we update tp.writeCount.

EDIT: I see you've got the comment in a subsequent commit.
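A sketch of the estimation being described, shaped after the snippet quoted above (the estimate variable and error handling are illustrative):

	var estimate int64
	if err := ba.LockSpanIterate(nil /* br */, func(sp roachpb.Span, durability lock.Durability) {
		if durability == lock.Replicated {
			// Point requests contribute exactly one lock. Ranged locking
			// reads (Scan/ReverseScan) are also counted as one here, which
			// may undercount; the response path updates tp.writeCount with
			// the actual number of keys locked.
			estimate++
		}
	}); err != nil {
		return err
	}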


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 266 at r12 (raw file):

	lockFootprint condensableSpanSet

	// writeCount counts the number of replicated locks and intents to this

nit: "number of replicated lock acquisitions and intents written by this"


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 733 at r15 (raw file):

		tp.txnMetrics.TxnsInFlightLocksOverTrackingBudget.Inc(1)
	}
	if tp.writeCount > rejectTxnMaxCount {

nit: consider adding a comment such as "similar to the in-flight writes case above, we may have gone over the rejectTxnMaxCount threshold because we don't accurately estimate the number of ranged locking reads before sending the request".

This commit adds a new metric txn.count_limit_rejected which tracks the number of transactions that are rejected based on the number of intents or replicated locks in the transaction.

Epic: none

Release note (ops change): Add metric txn.count_limit_rejected.
This commit adds a new metric txn.count_limit_on_response which tracks the number of transactions that exceeded the max transaction count on their response.

Epic: none

Release note (ops change): Add metric txn.count_limit_on_response.
@andrewbaptist andrewbaptist force-pushed the 2024-11-21-intent-size-rejection branch from 0e5b0e9 to 7ef9f24 on December 18, 2024 13:45
@andrewbaptist (Collaborator, Author) left a comment

TFTR! I added test cases to TestTxnPipelinerRejectAboveBudget to cover all the additional scenarios. I don't have a test that combines both types of rejections, but I don't think that is too important. I'm going to run one more full test on a cluster to make sure this works, but otherwise I think it should be ready if you want to take one more look, or just stamp it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 266 at r12 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: "number of replicated lock acquisitions and intents written by this"

Changed


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 393 at r13 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

From slack:

I didn’t do anything special for durable scan requests as for the estimate it is easiest to just count it as “1”. 

Could you leave a comment indicating that we're possibly underestimating scan/reverse scan requests (which intend to acquire durable locks) here by counting those as 1. We'll know for sure on the response path when we update tp.writeCount.

EDIT: I see you've got the comment in a subsequent commit.

Ack


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 733 at r15 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: consider adding a comment such as "similar to the in-flight writes case above, we may have gone over the rejectTxnMaxCount threshold because we don't accurately estimate the number of ranged locking reads before sending the request".

Added - good point.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 2602 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Reminder to come back to this comment as this PR looks close.

I now added all the relevant test cases to TestTxnPipelinerRejectAboveBudget. Let me know if you see anything else that seems worth testing. I don't explicitly test logging, but I did verify they appear in a manual test.

@andrewbaptist (Collaborator, Author) commented:

I ran a test on a cluster with the following:

Create a KV database with > 100K rows inserted.

> set cluster setting kv.transaction.max_intents_and_locks=100000;
> alter table kv add column x int;
> update kv set x = k+1 where 1=1 limit 90000;
UPDATE 90000

Time: 4.420s total (execution 4.420s / network 0.000s)

root@localhost:26257/kv> update kv set x = k+1 where 1=1 limit 110000;
ERROR: the transaction is locking too many rows and exceeded its lock-tracking memory budget; lock spans: 100963 bytes > budget: 100000 bytes. Request pushing transaction over the edge: 2026 Put. Transaction details: "sql txn" meta={id=bf1cc72f key=/Table/106/1 iso=Serializable pri=0.01623211 epo=0 ts=1734530872.390513763,2 min=1734530870.795431161,0 seq=98937} lock=true stat=PENDING rts=1734530870.795431161,0 wto=false gul=1734530871.295431161,0.
SQLSTATE: 53400

So everything looks good other than the message not being very accurate (mentioning bytes rather than rows).

@andrewbaptist andrewbaptist added the backport-24.3.x Flags PRs that need to be backported to 24.3 label Dec 18, 2024
@arulajmani (Collaborator) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@andrewbaptist (Collaborator, Author) commented:

TFTR!

bors r=arulajmani

@craig craig bot merged commit f61084f into cockroachdb:master Dec 18, 2024
21 of 22 checks passed
blathers-crl bot commented Dec 18, 2024

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #135841: branch-release-24.3.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@joshimhoff (Collaborator) commented Dec 18, 2024

Thanks for getting this over the line! @andrewbaptist, re: backport, I forget: Does this logic live in kvserver or kvclient or a mix? Reminder that Serverless will have many SQL pods on 24.2 for many months, but the KV pods will be on 24.3 by end of this week.

@andrewbaptist (Collaborator, Author) commented:

@joshimhoff - This logic lives in kvclient, so it requires the pods to get to 24.3. Also, unfortunately, this missed the 24.3.1 cut, so it will be in 24.3.2. I will add a backport to 24.2.x so that the next release will pick that up. Is there something that pushes SQL pods to the latest point release?

@andrewbaptist (Collaborator, Author) commented:

blathers backport 24.2

blathers-crl bot commented Dec 18, 2024

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #135841: branch-release-24.2.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
