kv: pipeline replicated lock acquisition #121088
Conversation
Force-pushed from f32e201 to 3fb68c8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/sql/row/kv_batch_fetcher.go
line 636 at r2 (raw file):
    // alive.
    // TODO(nvanbenschoten): explain why this was needed.
    if f.lockDurability != lock.Replicated {
nit: it might be a good idea to apply a similar change to `txnKVStreamer.SetupNextFetch` and `buildResumeSingleRangeBatch` in `streamer.go`, even if this is not needed at the moment (is it needed? we currently don't enable the streamer when non-default key locking is used - is this related?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I can have another look once we've added tests for this PR before stamping.
Reviewed 1 of 1 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 45 at r2 (raw file):
    )

    var pipelinedRangedWritesEnabled = settings.RegisterBoolSetting(
Should we make these settings public so that there's documentation for them?
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 549 at r2 (raw file):
    // writes that a ranged request will add to our in-flight writes set. And
    // once we perform async consensus, we can't merge them away until we prove
    // that they have succeeded. What should we do?
Nice find! Do you think it's okay to let our lock tracking potentially exceed `maxTrackingBytes` in such cases, as long as we have added observability to react?

Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for `DeleteRange`/locking `{,Reverse}Scan` requests.

Separately, I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at `TargetBytes`, which are set on `{,Reverse}Scan` requests, but those include both the key and value portion.
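To make the observability idea concrete, here is a minimal sketch of what an over-budget check on the response path could look like. All names here (`tracker`, `inFlightBytes`, `maxTrackingBytes`, `overBudgetEvents`) are invented for illustration and are not the actual pipeliner fields or metrics:

```go
package main

import "log"

// inFlightLock is a simplified stand-in for a pipelined write or replicated
// lock that has not yet been proven to have replicated.
type inFlightLock struct {
	key      []byte
	strength string
}

// tracker accumulates the byte size of tracked in-flight writes and locking
// reads. The field names are hypothetical, not the real interceptor's.
type tracker struct {
	inFlightBytes    int64
	maxTrackingBytes int64
	overBudgetEvents int64 // would be a real metric counter in practice
}

// trackRangedResponse records the locks a ranged request reported acquiring.
// Because the client could not predict this size before sending the request,
// it may push tracking over budget; rather than failing, we count and log it.
func (t *tracker) trackRangedResponse(locks []inFlightLock) {
	for _, l := range locks {
		t.inFlightBytes += int64(len(l.key)) + int64(len(l.strength))
	}
	if t.inFlightBytes > t.maxTrackingBytes {
		t.overBudgetEvents++
		log.Printf("in-flight lock tracking over budget: %d > %d bytes",
			t.inFlightBytes, t.maxTrackingBytes)
	}
}

func main() {
	tr := &tracker{maxTrackingBytes: 1 << 10}
	tr.trackRangedResponse([]inFlightLock{{key: make([]byte, 2048), strength: "Exclusive"}})
}
```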
pkg/kv/kvserver/txnrecovery/manager.go
line 210 at r2 (raw file):
    },
    Txn: meta,
    // TODO(nvanbenschoten): pass in the correct lock strength here.
This didn't last too long! 🔥
Force-pushed from 3fb68c8 to 1e53c75.
Summary

For a multi-region cluster running TPC-C under Read Committed, this change provides a 75% speedup. For a single-region cluster running TPC-C under Read Committed, this change provides a 32% speedup.

Impact on multi-region TPC-C

This change will be most impactful for clusters with cross-region synchronous replication, where each round of raft takes O(10ms). We can see a significant improvement in multi-region clusters on Read Committed workloads that use SELECT FOR UPDATE and/or perform foreign key checks. The following performance summary comes from a modified version of the benchmark.

Before

After

Impact on single-region TPC-C

This change will be most impactful for clusters with cross-region replication. However, we do see an improvement in single-region clusters on Read Committed workloads that use SELECT FOR UPDATE and/or perform foreign key checks. This is because pipelining allows the disk writes for replicated locks to be parallelized. The following performance summary comes from the benchmark.

Before

After

With this change, the …
This avoids the problem described in the removed TODO. That hypothesized problem is real. Without this change, cockroachdb#121088 runs into trouble with the following sequence of operations:

```sql
create table kv (k int primary key, v int);
insert into kv values (1, 2);

begin isolation level read committed;
insert into kv values (2, 2);
savepoint s1;
insert into kv values (3, 2);
rollback to s1;
select * from kv where k = 1 for update;
commit;
ERROR: internal error: QueryIntent request for lock at sequence number 2 but sequence number is ignored [{2 2}]
```

Epic: None

Release note: None
Force-pushed from 1e53c75 to ef4c69b.
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
I can have another look once we've added tests for this PR before stamping.
The testing should all be here now. I think we will want a few end-to-end tests around parallel commits as sanity checks, but these can come in a later PR. So this should be good for a full review.
I also found that our hypothesis about savepoint rollbacks not playing well with non-intent locks was correct — we'll want to land #121458 as well.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @yuzefovich)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 45 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we make these settings public so that there's documentation for them?
Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 549 at r2 (raw file):
Do you think it's okay to let our lock tracking potentially exceed `maxTrackingBytes` in such cases, as long as we have added observability to react? Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for `DeleteRange`/locking `{,Reverse}Scan` requests.

This all sounds reasonable to me. I opened #121471 to track this.

I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at `TargetBytes`, which are set on `{,Reverse}Scan` requests, but those include both the key and value portion.

I don't think we can do anything better without server-side changes to dynamically decide whether to respect `AsyncConsensus` or not. And then to communicate this back to the client. Given the conversations we've had about the key and byte limits already in place, the escape hatches we have with the cluster setting, and some manual testing I ran on Friday, I'm ok with deferring this until we see it become a problem.
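Purely as a thought experiment on that server-side direction (nothing like this exists today; every name below is invented), the shape of it might be: the replica decides after evaluation whether honoring async consensus was safe for a ranged locking request, and reports that decision back so the client knows whether it needs to track the locks as in-flight writes.

```go
package main

import "fmt"

// rangedEvalSummary is an invented summary of what evaluating a ranged
// locking request produced on the server.
type rangedEvalSummary struct {
	asyncConsensusRequested bool
	lockBytesAcquired       int64
}

// responseHeader stands in for a hypothetical response field that would tell
// the client whether its request for async consensus was actually honored.
type responseHeader struct {
	asyncConsensusHonored bool
}

// decideAsyncConsensus sketches the hypothetical policy: honor async
// consensus only if the locks acquired fit under a server-side budget;
// otherwise fall back to synchronous consensus and say so.
func decideAsyncConsensus(s rangedEvalSummary, maxLockBytes int64) responseHeader {
	return responseHeader{
		asyncConsensusHonored: s.asyncConsensusRequested && s.lockBytesAcquired <= maxLockBytes,
	}
}

func main() {
	h := decideAsyncConsensus(rangedEvalSummary{asyncConsensusRequested: true, lockBytesAcquired: 1 << 22}, 1<<20)
	fmt.Println("async consensus honored:", h.asyncConsensusHonored)
}
```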
pkg/sql/row/kv_batch_fetcher.go
line 636 at r2 (raw file):
is it needed? we currently don't enable the streamer when non-default key locking is used - is this related
If we don't enable the streamer when non-default key locking is used, then this is not needed. It's only needed when `lockDurability == lock.Replicated`. I'll add an assertion in `Streamer` so that we catch this if/when we decide to use it for locking reads.
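As an illustration of the shape of that conditional reuse (with invented type and field names; the real code lives in `kv_batch_fetcher.go` and differs in detail), the requests slice is only recycled when the lock durability is not replicated:

```go
package main

import "fmt"

// durability is a placeholder for the lock durability enum discussed above.
type durability int

const (
	unreplicated durability = iota
	replicated
)

// request stands in for a Get/Scan request whose key can reference large
// byte slices.
type request struct {
	key []byte
}

// fetcher is a toy model of the batch fetcher with a scratch requests slice.
type fetcher struct {
	lockDurability durability
	reqsScratch    []request
}

// recycleRequests keeps the requests slice around for reuse only when it is
// safe to do so. With replicated lock durability the slice is dropped instead
// (the precise rationale is what the TODO in the review comment asks to
// document), so it cannot keep large byte slices alive.
func (f *fetcher) recycleRequests(reqs []request) {
	if f.lockDurability == replicated {
		f.reqsScratch = nil
		return
	}
	for i := range reqs {
		reqs[i] = request{} // drop references so backing buffers can be freed
	}
	f.reqsScratch = reqs[:0]
}

func main() {
	f := &fetcher{lockDurability: replicated}
	f.recycleRequests([]request{{key: make([]byte, 1<<20)}})
	fmt.Println("scratch retained:", f.reqsScratch != nil)
}
```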
Force-pushed from ef4c69b to 322a613.
121458: kv: give savepoints distinct start and end sequence numbers r=miraradeva,arulajmani a=nvanbenschoten

This commit increments a transaction's write sequence number on savepoint creation and rollback. This ensures that savepoints have distinct start and end sequence numbers, which is necessary to distinguish between all operations (writes and locking reads) that happened before the savepoint creation, those that happened within the savepoint, and those that happened after the savepoint rollback.

This avoids the problem described in the removed TODO. That hypothesized problem is real. Without this change, #121088 runs into trouble with the following sequence of operations:

```sql
create table kv (k int primary key, v int);
insert into kv values (1, 2);

begin isolation level read committed;
insert into kv values (2, 2);
savepoint s1;
insert into kv values (3, 2);
rollback to s1;
select * from kv where k = 1 for update;
commit;
ERROR: internal error: QueryIntent request for lock at sequence number 2 but sequence number is ignored [{2 2}]
```

Epic: None

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
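As a toy illustration of the bookkeeping this describes (the types below are invented; the real logic lives in the kv client's savepoint and sequence-number code), stepping the write sequence on both creation and rollback gives each savepoint distinct start and end sequence numbers, so pre-savepoint, in-savepoint, and post-rollback operations occupy disjoint sequence ranges:

```go
package main

import "fmt"

// savepoint remembers the sequence number at which it was created.
type savepoint struct {
	seqNum int32
}

// txn is a toy model of a transaction's write sequence and the sequence
// ranges it has marked as ignored (rolled back).
type txn struct {
	writeSeq int32
	ignored  [][2]int32
}

// createSavepoint steps the write sequence, so operations inside the
// savepoint use sequence numbers strictly greater than those before it.
func (t *txn) createSavepoint() savepoint {
	t.writeSeq++
	return savepoint{seqNum: t.writeSeq}
}

// rollbackToSavepoint marks everything written since the savepoint as ignored
// and steps the sequence again, so operations after the rollback (e.g. a
// later locking read) use sequence numbers outside the ignored range.
func (t *txn) rollbackToSavepoint(s savepoint) {
	t.ignored = append(t.ignored, [2]int32{s.seqNum, t.writeSeq})
	t.writeSeq++
}

func main() {
	t := &txn{}
	t.writeSeq++ // write before the savepoint: seq 1
	sp := t.createSavepoint()
	t.writeSeq++ // write inside the savepoint: seq 3
	t.rollbackToSavepoint(sp)
	t.writeSeq++ // write after the rollback: seq 5
	fmt.Println("ignored ranges:", t.ignored, "next write seq:", t.writeSeq)
}
```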
Force-pushed from f346a0f to dec4830.
It's nice to see these benchmark numbers 🔥
Reviewed 13 of 13 files at r7, 1 of 1 files at r8, 11 of 13 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 549 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think it's okay to let our lock tracking potentially exceed `maxTrackingBytes` in such cases, as long as we have added observability to react? Concretely, I was thinking we could add metrics/logs on the response path whenever tracking pipelined locks for a ranged request causes us to go over budget. If imprecisely predicting the memory usage for ranged requests ever becomes a problem, we can always reach for the cluster settings that disable lock pipelining for `DeleteRange`/locking `{,Reverse}Scan` requests.

This all sounds reasonable to me. I opened #121471 to track this.

I'm not sure what (if anything) we can do to better estimate how much our in-flight write set will grow by. I was looking at `TargetBytes`, which are set on `{,Reverse}Scan` requests, but those include both the key and value portion.

I don't think we can do anything better without server-side changes to dynamically decide whether to respect `AsyncConsensus` or not. And then to communicate this back to the client. Given the conversations we've had about the key and byte limits already in place, the escape hatches we have with the cluster setting, and some manual testing I ran on Friday, I'm ok with deferring this until we see it become a problem.
All this sounds good to me. Thanks for opening the issue!
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 614 at r9 (raw file):
    Txn:      meta,
    Strength: w.Strength,
    // TODO: test, maybe extend TestTxnPipelinerSavepoints
Did you want to add your name to the TODO before merging, or were you meaning to address this before the PR merges?
pkg/sql/row/kv_batch_fetcher.go
line 635 at r9 (raw file):
    // the underlying Get and Scan requests which could keep large byte slices
    // alive.
    // However, we do not re-use the requests slice if we're using the replicated
nit: new line before the new paragraph? 😂
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 679 at r9 (raw file):
    require.Equal(t, expIfWrites, ifWrites)

    // Replicated locking read before write. Some existing in-flight replicated
Should we also add a ranged request here? Concretely, I was thinking we could lock more keys above and add a `ScanRequest` that partially overlaps with some of the newly locked keys.

EDIT: I see you already have a `ScanRequest` below; feel free to disregard.
This mirrors the `ShallowCopy` method on `Request`.

Epic: None

Release note: None
This commit steps a read committed transaction's read sequence after each statement retry. This ensures that the read sequence leads the ignored sequence number range established when the read committed statement savepoint was rolled back.

Epic: None

Release note: None
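A rough sketch of the idea, using invented names (the real change lives in the SQL-side transaction state machine and kv client): on a statement retry under Read Committed, the rollback of the per-statement savepoint produces a new ignored sequence range, and the read sequence is then stepped so it leads that range.

```go
package main

import "fmt"

// stmtTxn is a toy model of a Read Committed transaction's sequence state.
type stmtTxn struct {
	writeSeq int32
	readSeq  int32
	ignored  [][2]int32
}

// retryStatement models a statement retry: the per-statement savepoint
// rollback marks the statement's writes as ignored, then the sequence is
// stepped and the read sequence advanced so it leads the ignored range.
func (t *stmtTxn) retryStatement(stmtStartSeq int32) {
	t.ignored = append(t.ignored, [2]int32{stmtStartSeq, t.writeSeq})
	t.writeSeq++
	t.readSeq = t.writeSeq
}

func main() {
	t := &stmtTxn{writeSeq: 3, readSeq: 1}
	t.retryStatement(2) // the retried statement started at seq 2
	fmt.Println("ignored:", t.ignored, "readSeq now leads it:", t.readSeq)
}
```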
Fixes cockroachdb#117978.

This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding QueryIntentRequest.

Release note: None
Force-pushed from dec4830 to af394f7.
TFTRs!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @yuzefovich)
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 614 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Did you want to add your name to the TODO before merging? or were you meaning to address this before the PR merges?
Done.
pkg/sql/row/kv_batch_fetcher.go
line 635 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: new line before the new paragraph? 😂
Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go
line 679 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we also add a ranged request here? Concretely, I was thinking we could lock more keys above and add a `ScanRequest` that partially overlaps with some of the newly locked keys.

EDIT: I see you already have a `ScanRequest` below; feel free to disregard.
I'll disregard then 😃
blathers backport release-24.1
kvcoord: observability for in-flight writes and locking reads over intent budget

Since cockroachdb#121088, in-flight writes can include locking reads; because we don't estimate the size of the locks accurately for ranged locking reads, it is possible that in-flight writes exceed the max intent tracking budget (`kv.transaction.max_intents_bytes`). That's fine for now, but in this patch we add some observability to be aware of this happening.

Fixes: cockroachdb#121471

Release note: None
122899: roachtest: add node-kill operation r=renatolabs a=itsbilal

This change adds a node-kill operation with 4 variants: one that drains and one that doesn't x {SIGKILL, SIGTERM}.

Epic: none

Release note: None

122918: backupccl: prevent OR from linking in virtual file pointing to empty key space r=dt a=msbutler

See individual commits.

123340: kvcoord: observability for in-flight writes and locking reads over intent budget r=miraradeva a=miraradeva

Since #121088, in-flight writes can include locking reads; because we don't estimate the size of the locks accurately for ranged locking reads, it is possible that in-flight writes exceed the max intent tracking budget (`kv.transaction.max_intents_bytes`). That's fine for now, but in this patch we add some observability to be aware of this happening.

I validated the new metric and log message by running:

```
CREATE TABLE t (k STRING PRIMARY KEY);
INSERT INTO t VALUES (RPAD('a', pow(2, 21), 'a')); // 2MB
INSERT INTO t VALUES (RPAD('b', pow(2, 21), 'b')); // 2MB
INSERT INTO t VALUES (RPAD('c', pow(2, 21), 'c')); // 2MB

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM t FOR UPDATE LIMIT 5; // 6MB of locking reads, exceeding the limit of 4MB
COMMIT;
```

<img width="964" alt="Screenshot 2024-04-30 at 8 50 09 PM" src="https://github.com/cockroachdb/cockroach/assets/127151398/5a62d9be-5c04-42c5-9708-4a867adc7135">

```
W240501 19:46:39.838255 5527 kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:696 ⋮ [T2,Vdemoapp,n1,client=127.0.0.1:52087,hostssl,user=‹demo›] 731 a transaction's in-flight writes and locking reads have exceeded the intent tracking limit (kv.transaction.max_intents_bytes). in-flight writes and locking reads size: 6291483 bytes, txn: "sql txn" meta={id=a6c0b23a key=/Tenant/2/Table/112/1 iso=ReadCommitted pri=0.00869562 epo=0 ts=1714592799.819310000,0 min=1714592791.913332000,0 seq=0} lock=true stat=PENDING rts=1714592799.819310000,0 wto=false gul=1714592792.413332000,0, ba: ‹1 Scan›
```

Fixes: #121471

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Fixes #117978.
Builds upon the foundation laid in #119975, #119933, #121065, and #121086.
This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through `Get`, `Scan`, and `ReverseScan` requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding `QueryIntentRequest`.

See benchmark with TPC-C results here.
Release note: None
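To ground the description, here is a minimal sketch of the client-side bookkeeping it outlines. The types (`inFlightWrite`, `queryIntentReq`, and the strength enum) are simplified stand-ins for the real `lock`/`kvpb`/`txnPipeliner` definitions, not the actual implementation: each pipelined write or replicated lock is tracked together with its strength, and that strength is carried on the QueryIntent request that later proves it.

```go
package main

import "fmt"

// strength is a simplified lock strength; the real enum lives in the
// concurrency/lock package.
type strength int

const (
	shared strength = iota
	exclusive
	intent
)

// inFlightWrite is a write or replicated lock acquisition proposed with
// async consensus but not yet proven to have replicated.
type inFlightWrite struct {
	key      string
	seq      int32
	strength strength // new in this change: locking reads record Shared/Exclusive
}

// queryIntentReq is a stand-in for kvpb.QueryIntentRequest, carrying the
// strength to check for, so the server verifies the right kind of replicated
// lock rather than only an intent.
type queryIntentReq struct {
	key      string
	seq      int32
	strength strength
}

// chainToInFlightWrites builds the QueryIntent requests that a later,
// overlapping request (or the commit) must prepend in order to prove the
// earlier pipelined writes and locks.
func chainToInFlightWrites(writes []inFlightWrite) []queryIntentReq {
	reqs := make([]queryIntentReq, 0, len(writes))
	for _, w := range writes {
		reqs = append(reqs, queryIntentReq{key: w.key, seq: w.seq, strength: w.strength})
	}
	return reqs
}

func main() {
	writes := []inFlightWrite{
		{key: "a", seq: 1, strength: intent},    // pipelined Put
		{key: "b", seq: 2, strength: exclusive}, // pipelined SELECT FOR UPDATE lock
	}
	for _, r := range chainToInFlightWrites(writes) {
		fmt.Printf("QueryIntent key=%s seq=%d strength=%d\n", r.key, r.seq, r.strength)
	}
}
```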