
kv: pipeline replicated lock acquisition #121088

Merged

Conversation

Member

@nvanbenschoten nvanbenschoten commented Mar 26, 2024

Fixes #117978.

Builds upon the foundation laid in #119975, #119933, #121065, and #121086.

This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan requests now qualifies to be pipelined. The txnPipeliner is updated to track the strength associated with each in-flight write and pass that along to the corresponding QueryIntentRequest.
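For illustration, here is a minimal, self-contained sketch of the tracking described above: each pipelined operation is remembered along with its lock strength, and that strength is later echoed in the per-key proof request. The type and field names below are simplified stand-ins, not the actual `txnPipeliner` or `QueryIntentRequest` definitions.

```go
package main

import "fmt"

// Strength is a simplified stand-in for the lock strength enum.
type Strength string

const (
	Intent    Strength = "Intent"
	Exclusive Strength = "Exclusive"
	Shared    Strength = "Shared"
)

// inFlightWrite records a key whose replication (an intent write or, with
// this change, a replicated lock acquisition) was issued under async
// consensus and has not yet been proven durable.
type inFlightWrite struct {
	Key      string
	Seq      int32
	Strength Strength
}

// proofRequest models the per-key proof the pipeliner later sends; it must
// carry the strength so the server checks for the right kind of lock.
type proofRequest struct {
	Key      string
	Seq      int32
	Strength Strength
}

func proveInFlight(writes []inFlightWrite) []proofRequest {
	reqs := make([]proofRequest, 0, len(writes))
	for _, w := range writes {
		reqs = append(reqs, proofRequest{Key: w.Key, Seq: w.Seq, Strength: w.Strength})
	}
	return reqs
}

func main() {
	writes := []inFlightWrite{
		{Key: "a", Seq: 1, Strength: Intent},    // pipelined intent write (e.g. Put)
		{Key: "b", Seq: 2, Strength: Exclusive}, // pipelined lock from a locking Get/Scan
	}
	fmt.Println(proveInFlight(writes))
}
```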

See benchmark with TPC-C results here.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch 6 times, most recently from f32e201 to 3fb68c8 on March 28, 2024 at 03:46
@nvanbenschoten nvanbenschoten marked this pull request as ready for review March 28, 2024 04:26
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner March 28, 2024 04:26
@nvanbenschoten nvanbenschoten requested a review from a team March 28, 2024 04:26
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner March 28, 2024 04:26
@nvanbenschoten nvanbenschoten requested review from yuzefovich and removed request for a team and yuzefovich March 28, 2024 04:26
Member

@yuzefovich yuzefovich left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


pkg/sql/row/kv_batch_fetcher.go line 636 at r2 (raw file):

	// alive.
	// TODO(nvanbenschoten): explain why this was needed.
	if f.lockDurability != lock.Replicated {

nit: it might be a good idea to apply a similar change to txnKVStreamer.SetupNextFetch and buildResumeSingleRangeBatch in streamer.go even if this is not needed at the moment (is it needed? we currently don't enable the streamer when non-default key locking is used - is this related?)

Collaborator

@arulajmani arulajmani left a comment


Looks good. I can have another look once we've added tests for this PR before stamping.

Reviewed 1 of 1 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 45 at r2 (raw file):

)

var pipelinedRangedWritesEnabled = settings.RegisterBoolSetting(

Should we make these settings public so that there's documentation for them?
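For context, a hedged sketch of what a public registration could look like, assuming the pkg/settings import already present in this file; the setting key, class, and description below are illustrative rather than the exact values in the PR:

```go
// Illustrative only: marking the setting public includes it in the
// generated cluster settings documentation.
var pipelinedRangedWritesEnabled = settings.RegisterBoolSetting(
	settings.ApplicationLevel,
	"kv.transaction.write_pipelining.ranged_writes.enabled", // assumed key
	"if enabled, transactional ranged writes are pipelined through Raft consensus",
	true,
	settings.WithPublic,
)
```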


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 549 at r2 (raw file):

		// writes that a ranged request will add. To our in-flight writes set. And
		// once we perform async consensus, we can't merge them away until we prove
		// that they have succeeded. What should we do?

Nice find! Do you think it's okay to let our lock tracking potentially exceed maxTrackingBytes in such cases, as long as we have added observability to react?

Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for DeleteRange/locking {,Reverse}Scan requests.

Separately, I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at TargetBytes which are set on {,Reverse}Scan requests, but those include both the key and value portion.


pkg/kv/kvserver/txnrecovery/manager.go line 210 at r2 (raw file):

			},
			Txn: meta,
			// TODO(nvanbenschoten): pass in the correct lock strength here.

This didn't last too long! 🔥

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch from 3fb68c8 to 1e53c75 on March 29, 2024 at 20:34
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Mar 29, 2024
@nvanbenschoten
Member Author

Summary

For a multi-region cluster running TPC-C under Read Committed, this change provides a 75% speedup for the delivery transaction and a 38% speedup for the newOrder transaction.

For a single-region cluster running TPC-C under Read Committed, this change provides a 32% speedup for the delivery transaction, a 21% speedup for the newOrder transaction, and a 13% speedup for the payment transaction.


Impact on multi-region TPC-C

This change is most impactful for clusters with cross-region synchronous replication, where each round of Raft consensus takes on the order of 10ms. We see a significant improvement in multi-region clusters on Read Committed workloads that use SELECT FOR UPDATE and/or perform foreign key checks.

The following performance summary comes from a modified version of the tpcc/headroom/isolation-level=read-committed/n4cpu16 roachtest which replicates across three regions, uses region survivability, and pins all leases to the region with the workload. Of note are the p50 latencies for the delivery, newOrder, and payment transactions.

Before

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131480           18.3    381.5    385.9    402.7    419.4   1208.0  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1315851          182.8    172.6    176.2    184.5    226.5    738.2  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131514           18.3      4.1      4.1      5.2      6.6    226.5  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1315369          182.7     71.7     71.3     79.7    142.6    671.1  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131600           18.3      9.9      8.9     13.1     41.9    218.1  stockLevel

After

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131705           18.3     92.1     96.5    100.7    113.2    369.1  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1318307          183.1    109.4    109.1    113.2    159.4    486.5  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131764           18.3      4.5      4.7      6.0      7.1    184.5  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1318058          183.1     72.0     71.3     75.5    134.2    469.8  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         131696           18.3     10.1      9.4     13.1     37.7    234.9  stockLevel

delivery performs 10 sequential single-row SELECT FOR UPDATE statements (selectNewOrder), so pipelining this replicated lock acquisition is a big deal. With this change, its p50 latency drops from 385.9ms to 96.5ms, a 75% improvement.

newOrder performs 1 multi-row SELECT FOR UPDATE statement (selectStock). It also performs 4 foreign key checks (1x in insertOrder, 1x in insertNewOrder, 2x in insertOrderLine). With this change, its p50 latency drops from 176.2ms to 109.1ms, a 38% improvement.

payment performs 2 foreign key checks (2x in insertHistory). We don't see a meaningful improvement for this transaction here, but we do see one in the single-region results below.

Impact on single-region TPC-C

This change is most impactful for clusters with cross-region replication. However, we also see an improvement in single-region clusters on Read Committed workloads that use SELECT FOR UPDATE and/or perform foreign key checks, because pipelining allows the disk writes for replicated locks to be parallelized.

The following performance summary comes from the tpcc/headroom/isolation-level=read-committed/n4cpu16 roachtest. Of note are the p50 latencies for the delivery, newOrder, and payment transactions.

Before

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132250           18.4     57.9     58.7     71.3     83.9    436.2  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1322298          183.7     25.5     25.2     31.5     41.9    318.8  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132210           18.4      5.4      5.5      7.1      8.4    192.9  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1321391          183.5     12.5     12.6     15.7     22.0    251.7  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132120           18.3     10.9     11.0     15.2     19.9     75.5  stockLevel

After

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132228           18.4     38.9     39.8     48.2     54.5    251.7  delivery

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1322691          183.7     20.0     19.9     24.1     29.4    218.1  newOrder

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132230           18.4      4.8      5.0      6.3      7.3    104.9  orderStatus

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0        1321782          183.6     11.0     11.0     13.6     16.8    209.7  payment

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 7200.0s        0         132061           18.3     10.0     10.0     14.2     17.8     54.5  stockLevel

With this change, the delivery transaction's p50 latency drops from 58.7ms to 39.8ms, a 32% speedup.

With this change, the newOrder transaction's p50 latency drops from 25.2ms to 19.9ms, a 21% speedup.

With this change, the payment transaction's p50 latency drops from 12.6ms to 11.0ms, a 13% speedup.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 1, 2024
This avoids the problem described in the removed TODO. That hypothesized
problem is real. Without this change, cockroachdb#121088 runs into trouble with the
following sequence of operations:
```sql
create table kv (k int primary key, v int);
insert into kv values (1, 2);

begin isolation level read committed;
insert into kv values (2, 2);
savepoint s1;
insert into kv values (3, 2);
rollback to s1;
select * from kv where k = 1 for update;
commit;

ERROR: internal error: QueryIntent request for lock at sequence number 2 but sequence number is ignored [{2 2}]
```

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch from 1e53c75 to ef4c69b on April 1, 2024 at 17:57

blathers-crl bot commented Apr 1, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Member Author

@nvanbenschoten nvanbenschoten left a comment


I can have another look once we've added tests for this PR before stamping.

The testing should all be here now. I think we will want a few end-to-end tests around parallel commits as sanity checks, but these can come in a later PR. So this should be good for a full review.

I also found that our hypothesis about savepoint rollbacks not playing well with non-intent locks was correct — we'll want to land #121458 as well.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @yuzefovich)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 45 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Should we make these settings public so that there's documentation for them?

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 549 at r2 (raw file):

Do you think it's okay to let our lock tracking potentially exceed maxTrackingBytes in such cases, as long as we have added observability to react?

Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for DeleteRange/locking {,Reverse}Scan requests.

This all sounds reasonable to me. I opened #121471 to track this.

I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at TargetBytes which are set on {,Reverse}Scan requests, but those include both the key and value portion.

I don't think we can do anything better without server-side changes to dynamically decide whether to respect AsyncConsensus or not. And then to communicate this back to the client. Given the conversations we've had about the key and byte limits already in place, the escape hatches we have with the cluster setting, and some manual testing I ran on Friday, I'm ok with deferring this until we see it become a problem.


pkg/sql/row/kv_batch_fetcher.go line 636 at r2 (raw file):

is it needed? we currently don't enable the streamer when non-default key locking is used - is this related

If we don't enable the streamer when non-default key locking is used, then this is not needed. It's only needed when lockDurability == lock.Replicated. I'll add an assertion in Streamer so that we catch this if/when we decide to use it for locking reads.
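A minimal sketch of the kind of guard meant here; the function name and message are illustrative, not the actual Streamer change:

```go
// Illustrative guard only (not the actual Streamer code), using
// github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock and
// github.com/cockroachdb/errors. It rejects replicated-durability locking
// reads until the streamer explicitly supports them.
func assertUnreplicatedLocking(dur lock.Durability) error {
	if dur == lock.Replicated {
		return errors.AssertionFailedf(
			"streamer does not support replicated-durability locking reads")
	}
	return nil
}
```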

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch from ef4c69b to 322a613 on April 1, 2024 at 20:03
@nvanbenschoten nvanbenschoten added the branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 label Apr 1, 2024
craig bot pushed a commit that referenced this pull request Apr 1, 2024
121458: kv: give savepoints distinct start and end sequence numbers r=miraradeva,arulajmani a=nvanbenschoten

This commit increments a transaction's write sequence number on savepoint creation and rollback. This ensures that savepoints have distinct start and end sequence numbers, which is necessary to distinguish between all operations (writes and locking reads) that happened before the savepoint creation, those that happened within the savepoint, and those that happened after the savepoint rollback.

This avoids the problem described in the removed TODO. That hypothesized problem is real. Without this change, #121088 runs into trouble with the following sequence of operations:
```sql
create table kv (k int primary key, v int);
insert into kv values (1, 2);

begin isolation level read committed;
insert into kv values (2, 2);
savepoint s1;
insert into kv values (3, 2);
rollback to s1;
select * from kv where k = 1 for update;
commit;

ERROR: internal error: QueryIntent request for lock at sequence number 2 but sequence number is ignored [{2 2}]
```

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
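A minimal, self-contained sketch of the sequence-number bookkeeping that commit describes; the types and ranges below are illustrative, not the actual TxnCoordSender savepoint code:

```go
package main

import "fmt"

// txnSeq is a toy model of a transaction's write sequence number and the
// ignored sequence number ranges produced by savepoint rollbacks.
type txnSeq struct {
	writeSeq int32
	ignored  [][2]int32 // inclusive [start, end] ranges to ignore
}

// createSavepoint steps the write sequence so the savepoint gets a distinct
// start sequence number, and returns that number as the savepoint token.
func (t *txnSeq) createSavepoint() int32 {
	t.writeSeq++
	return t.writeSeq
}

// rollbackToSavepoint marks everything written inside the savepoint as
// ignored, then steps the sequence again so operations issued after the
// rollback (e.g. the SELECT FOR UPDATE in the repro above) fall outside
// the ignored range.
func (t *txnSeq) rollbackToSavepoint(start int32) {
	if t.writeSeq > start {
		t.ignored = append(t.ignored, [2]int32{start + 1, t.writeSeq})
	}
	t.writeSeq++
}

func main() {
	var t txnSeq
	t.writeSeq = 1 // INSERT INTO kv VALUES (2, 2)
	sp := t.createSavepoint()
	t.writeSeq++ // INSERT INTO kv VALUES (3, 2)
	t.rollbackToSavepoint(sp)
	fmt.Printf("writeSeq=%d ignored=%v\n", t.writeSeq, t.ignored)
}
```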
blathers-crl bot pushed a commit that referenced this pull request Apr 1, 2024
This commit increments a transaction's write sequence number on
savepoint creation and rollback. This ensures that savepoints have
distinct start and end sequence numbers, which is necessary to distinguish
between all operations (writes and locking reads) that happened before
the savepoint creation, those that happened within the savepoint, and
those that happened after the savepoint rollback.

This avoids the problem described in the removed TODO. That hypothesized
problem is real. Without this change, #121088 runs into trouble with the
following sequence of operations:
```sql
create table kv (k int primary key, v int);
insert into kv values (1, 2);

begin isolation level read committed;
insert into kv values (2, 2);
savepoint s1;
insert into kv values (3, 2);
rollback to s1;
select * from kv where k = 1 for update;
commit;

ERROR: internal error: QueryIntent request for lock at sequence number 2 but sequence number is ignored [{2 2}]
```

Epic: None
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch 2 times, most recently from f346a0f to dec4830 on April 2, 2024 at 19:30
Collaborator

@arulajmani arulajmani left a comment


:lgtm:

It's nice to see these benchmark numbers 🔥

Reviewed 13 of 13 files at r7, 1 of 1 files at r8, 11 of 13 files at r9, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 549 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think it's okay to let our lock tracking potentially exceed maxTrackingBytes in such cases, as long as we have added observability to react?

Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for DeleteRange/locking {,Reverse}Scan requests.

This all sounds reasonable to me. I opened #121471 to track this.

I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at TargetBytes which are set on {,Reverse}Scan requests, but those include both the key and value portion.

I don't think we can do anything better without server-side changes to dynamically decide whether to respect AsyncConsensus or not. And then to communicate this back to the client. Given the conversations we've had about the key and byte limits already in place, the escape hatches we have with the cluster setting, and some manual testing I ran on Friday, I'm ok with deferring this until we see it become a problem.

All this sounds good to me. Thanks for opening the issue!


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 614 at r9 (raw file):

						Txn:      meta,
						Strength: w.Strength,
						// TODO: test, maybe extend TestTxnPipelinerSavepoints

Did you want to add your name to the TODO before merging? or were you meaning to address this before the PR merges?


pkg/sql/row/kv_batch_fetcher.go line 635 at r9 (raw file):

	// the underlying Get and Scan requests which could keep large byte slices
	// alive.
	// However, we do not re-use the requests slice if we're using the replicated

nit: new line before the new paragraph? 😂


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 679 at r9 (raw file):

	require.Equal(t, expIfWrites, ifWrites)

	// Replicated locking read before write. Some existing in-flight replicated

Should we also add a ranged request here as well? Concretely, I was thinking we could lock more keys above and add a ScanRequest that partially overlaps with some of the newly locked keys.

EDIT: I see you already have a ScanRequest below; feel free to disregard.

This mirrors the ShallowCopy method on Request.

Epic: None
Release note: None
This commit steps a read committed transaction's read sequence after
each statement retry. This ensures that the read sequence leads the
ignored sequence number range established when the read committed
statement savepoint was rolled back.

Epic: None
Release note: None
Fixes cockroachdb#117978.

This commit completes the client-side handling of replicated lock acquisition
pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan
requests now qualifies to be pipelined. The `txnPipeliner` is updated to track
the strength associated with each in-flight write and pass that along to the
corresponding QueryIntentRequest.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/replLockPipelining2 branch from dec4830 to af394f7 on April 3, 2024 at 14:27
Member Author

@nvanbenschoten nvanbenschoten left a comment


TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @yuzefovich)


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 614 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Did you want to add your name to the TODO before merging? or were you meaning to address this before the PR merges?

Done.


pkg/sql/row/kv_batch_fetcher.go line 635 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: new line before the new paragraph? 😂

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 679 at r9 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Should we also add a ranged request here as well? Concretely, I was thinking we could lock more keys above and add a ScanRequest that partially overlaps with some of the newly locked keys.

EDIT: I see you already have a ScanRequest below; feel free to disregard.

I'll disregard then 😃

@craig craig bot merged commit 6143a7a into cockroachdb:master Apr 3, 2024
21 of 22 checks passed
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 3, 2024
@nvanbenschoten
Member Author

blathers backport release-24.1

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/replLockPipelining2 branch April 4, 2024 16:35
miraradeva added a commit to miraradeva/cockroach that referenced this pull request May 1, 2024
…tent budget

Since cockroachdb#121088, in-flight writes can include locking reads; because we
don't estimate the size of the locks accurately for ranged locking
reads, it is possible that in-flight writes exceed the max intent
tracking budget (`kv.transaction.max_intents_bytes`). That's fine for
now, but in this patch we add some observability to be aware of this
happening.

Fixes: cockroachdb#121471

Release note: None
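A self-contained sketch of the response-path check that commit describes; the struct, counter, and message below are stand-ins, not the actual metric or log added in the patch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// overBudgetObserver counts and logs whenever pipelined locking reads push
// the in-flight byte tracking over kv.transaction.max_intents_bytes.
// Names here are assumptions for illustration only.
type overBudgetObserver struct {
	overBudgetEvents atomic.Int64 // stand-in for a real metrics counter
	maxIntentBytes   int64        // value of kv.transaction.max_intents_bytes
}

func (o *overBudgetObserver) observe(inflightBytes int64) {
	if inflightBytes <= o.maxIntentBytes {
		return
	}
	o.overBudgetEvents.Add(1)
	fmt.Printf("warning: in-flight writes and locking reads (%d bytes) exceed "+
		"the intent tracking limit (%d bytes)\n", inflightBytes, o.maxIntentBytes)
}

func main() {
	o := &overBudgetObserver{maxIntentBytes: 4 << 20} // 4 MiB default
	o.observe(6 << 20)                                // 6 MiB of locking reads, as in the repro below
}
```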
craig bot pushed a commit that referenced this pull request May 1, 2024
122899: roachtest: add node-kill operation r=renatolabs a=itsbilal

This change adds a node-kill operation with 4 variants: one that drains and one that doesn't x {SIGKILL, SIGTERM}.

Epic: none

Release note: None

122918: backupccl: prevent OR from linking in virtual file pointing to empty key space r=dt a=msbutler

See individual commits.

123340: kvcoord: observability for in-flight writes and locking reads over intent budget r=miraradeva a=miraradeva

Since #121088, in-flight writes can include locking reads; because we don't estimate the size of the locks accurately for ranged locking reads, it is possible that in-flight writes exceed the max intent tracking budget (`kv.transaction.max_intents_bytes`). That's fine for now, but in this patch we add some observability to be aware of this happening.

I validated the new metric and log message by running:

```
CREATE TABLE t (k STRING PRIMARY KEY);
INSERT INTO t VALUES (RPAD('a', pow(2, 21), 'a')); // 2MB
INSERT INTO t VALUES (RPAD('b', pow(2, 21), 'b')); // 2MB
INSERT INTO t VALUES (RPAD('c', pow(2, 21), 'c')); // 2MB
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM t FOR UPDATE LIMIT 5; // 6MB of locking reads, exceeding the limit of 4MB
COMMIT;
```

Screenshot (new metric going over budget): https://github.com/cockroachdb/cockroach/assets/127151398/5a62d9be-5c04-42c5-9708-4a867adc7135

```
W240501 19:46:39.838255 5527 kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:696 ⋮ [T2,Vdemoapp,n1,client=127.0.0.1:52087,hostssl,user=‹demo›] 731  a transaction's in-flight writes and locking reads have exceeded the intent tracking limit (kv.transaction.max_intents_bytes). in-flight writes and locking reads size: 6291483 bytes, txn: "sql txn" meta={id=a6c0b23a key=/Tenant/2/Table/112/1 iso=ReadCommitted pri=0.00869562 epo=0 ts=1714592799.819310000,0 min=1714592791.913332000,0 seq=0} lock=true stat=PENDING rts=1714592799.819310000,0 wto=false gul=1714592792.413332000,0, ba: ‹1 Scan›
```

Fixes: #121471

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request May 1, 2024
…tent budget

Since #121088, in-flight writes can include locking reads; because we
don't estimate the size of the locks accurately for ranged locking
reads, it is possible that in-flight writes exceed the max intent
tracking budget (`kv.transaction.max_intents_bytes`). That's fine for
now, but in this patch we add some observability to be aware of this
happening.

Fixes: #121471

Release note: None