sql,admission: ERROR: liveness session expired XXXms before transaction #78691
Comments
I think there's two questions here:
I feel like it's possible that kv server rate limiting alone is starving session renewal, given the shape of the test. As for the error, we should probably treat it as a restart internally, like we do when a transaction fails to meet its deadline.
I hit this in another, simpler test with just 1 node in the storage tier. In this case I wasn't getting any table statistics, so I looked at the error from:
Then I noticed my kv node log was full of:
So I added two more storage nodes and now it's happy. So the current hypothesis is that lease errors occur due to an under-provisioned storage tier that's in an unhappy state. Should I close this, or do we care about the ability to function in this situation?
If you pause in your debugger for too long, I'd expect this.
I think we close this, I'm not sure there's much to do here. The second one is #87452.
Re-enabled these tests which started tripping again in #97448. Re-opening to track:
And also maybe using higher AC priority for session renewal goroutines, which would help with:
Actually the higher pri won’t help, since session renewal is bypassing AC altogether:
That uses cockroach/pkg/kv/kvserver/kvadmission/kvadmission.go Lines 210 to 212 in 736a67e
Actually, I’m being daft. This is a multi-tenant test, so we’re actually not bypassing AC. We’re just using regular priority then for SQL session extensions.
cockroach/pkg/kv/kvserver/kvadmission/kvadmission.go Lines 203 to 210 in 736a67e
I'll go back to explicitly using high pri first.
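To illustrate why regular priority leaves session extensions exposed under overload, here is a minimal sketch of a priority-ordered work queue. It is not the kvadmission implementation: `workItem`, `workQueue`, `admissionOrder`, and the `normalPri`/`highPri` constants are hypothetical names made up for this example, and real admission control also accounts for tenants and tokens, which this ignores.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Hypothetical priority levels for illustration only; the real ones are
// defined in admissionpb.
const (
	normalPri = 0
	highPri   = 1
)

// workItem is a hypothetical unit of work waiting to be admitted.
type workItem struct {
	name string
	pri  int
	seq  int // arrival order, used to break ties FIFO
}

// workQueue orders items by priority (higher first), then by arrival order.
type workQueue []workItem

func (q workQueue) Len() int { return len(q) }
func (q workQueue) Less(i, j int) bool {
	if q[i].pri != q[j].pri {
		return q[i].pri > q[j].pri
	}
	return q[i].seq < q[j].seq
}
func (q workQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *workQueue) Push(x any)   { *q = append(*q, x.(workItem)) }
func (q *workQueue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// admissionOrder returns item names in the order the queue would admit them.
func admissionOrder(items []workItem) []string {
	q := &workQueue{}
	for i, it := range items {
		it.seq = i
		heap.Push(q, it)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(workItem).name)
	}
	return order
}

func main() {
	// The renewal arrives last, behind a backlog of inserts.
	backlog := []workItem{
		{name: "insert-1", pri: normalPri},
		{name: "insert-2", pri: normalPri},
		{name: "session-renewal", pri: highPri},
	}
	fmt.Println(admissionOrder(backlog))
}
```

With the renewal tagged `highPri` it is admitted ahead of the queued inserts. Tagged `normalPri`, it would wait behind every insert that arrived before it, which is the starvation shape discussed above.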
Fixes cockroachdb#97448 (possibly).
Fixes cockroachdb#78691.

These tests run under severe CPU overload, and we see the workload observing the following errors:

ERROR: liveness session expired 571.043163ms before transaction

The SQL liveness lease extension work ends up getting severely starved, despite extending leases by 40s every 5s. It turns out that for tenant SQL liveness work we were using admissionpb.NormalPri, so such starvation was possible. This wasn't true for the system tenant, where we bypassed AC altogether.

Release note: None
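The arithmetic behind that error can be sketched as follows. This is only an illustration under stated assumptions, not the sqlliveness code: the `session` type, its methods, and the constant names are hypothetical, time is modelled as an offset from an arbitrary start, and `observedLag` is the logged 571.043163ms rounded to 571ms. With a 40s extension attempted every 5s, roughly eight consecutive attempts have to be delayed before a transaction can observe an expired session.

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in the commit message; the names are hypothetical.
const (
	sessionTTL        = 40 * time.Second       // each extension pushes expiry out this far
	heartbeatInterval = 5 * time.Second        // how often an extension is attempted
	observedLag       = 571 * time.Millisecond // rounded from the logged error
)

// session tracks only the expiration, as an offset from an arbitrary start.
type session struct {
	expiration time.Duration
}

// extend models a heartbeat that was admitted and completed at time now.
func (s *session) extend(now time.Duration) {
	s.expiration = now + sessionTTL
}

// checkTxn returns an error if the session expired before txnStart.
func (s *session) checkTxn(txnStart time.Duration) error {
	if txnStart > s.expiration {
		return fmt.Errorf("liveness session expired %s before transaction",
			txnStart-s.expiration)
	}
	return nil
}

func main() {
	s := &session{}
	s.extend(0) // the last extension that got through

	// A transaction 30s later is still covered by the 40s extension.
	fmt.Println(s.checkTxn(6 * heartbeatInterval))

	// If no extension is admitted for a full TTL, the next transaction fails.
	fmt.Println(s.checkTxn(sessionTTL + observedLag))
}
```

The first check returns nil and the second returns an error of the same shape as the one in the test logs.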
98207: sql: add REPLICATION and MANAGETENANT system privileges r=msbutler a=stevendanna

This adds two new system privileges:

- `REPLICATION`: Allows the user to call the internal functions that produce a cross-cluster replication stream.
- `MANAGETENANT`: Allows the user to create and manage tenants.

A user with the MANAGETENANT privilege is now able to execute the following statements:

- SHOW TENANT
- SHOW TENANTS
- CREATE TENANT
- CREATE TENANT FROM REPLICATION STREAM
- DROP TENANT (if it is part of an active stream)
- ALTER TENANT

A user with the REPLICATION privilege is able to call the following functions:

- crdb_internal.start_replication_stream
- crdb_internal.replication_stream_progress
- crdb_internal.stream_partition
- crdb_internal.replication_stream_spec
- crdb_internal.complete_replication_stream

Fixes #95425

Release note: None

98785: roachtest: de-flake multitenant-fairness/read-heavy/skewed r=irfansharif a=irfansharif

Fixes #97448 (possibly).
Fixes #78691.

These tests run under severe CPU overload, and we see the workload observing the following errors:

ERROR: liveness session expired 571.043163ms before transaction

The SQL liveness lease extension work ends up getting severely starved, despite extending leases by 40s every 5s. It turns out that for tenant SQL liveness work we were using admissionpb.NormalPri, so such starvation was possible. This wasn't true for the system tenant, where we bypassed AC altogether.

Release note: None

98821: sql: TestRandomSyntaxGeneration fixes r=cucaroach a=cucaroach

### sql: fix some TestRandomSyntaxGeneration bugs

The RSG works by calling format on the ASTs it generates, so it's good at finding Format bugs.

Fix a missing separator in ShowBackupOptions. Example:

```
SHOW BACKUP 'family' IN ('string', 'placeholder', 'placeholder', 'placeholder', 'string', 'placeholder', 'string', 'placeholder') WITH incremental_location = 'nullif', privilegesdebug_dump_metadata_sst
```

Fix bad construction in ShowTenant. Example:

```
SHOW TENANT [B'10010'] WITH REPLICATION STATUS WITH CAPABILITIES
```

Epic: none
Release note: None

### copy: fix copy grammar to match PG

Previously COPY would allow a wide range of syntax in the COPY TO substatement. Now, like PG, we limit it to a few things. The PG grammar is:

```
PreparableStmt: SelectStmt | InsertStmt | UpdateStmt | DeleteStmt | MergeStmt
```

And now we do something similar. This prevents the wheels from coming off when, for instance, RSG generates EXPLAINs in the substatement.

Release note: none
Epic: none

99109: kvserver: fortify TestReplicaClosedTimestamp r=erikgrinaker a=tbg

This test was flaky until ~Feb 2nd. This has since resolved, likely as a result of some other change; however, there's an easy way to make the test a bit more resilient by widening a critical section.

Closes #93864

Epic: none
Release note: None

99122: ccl/multiregionccl: skip flaky secondary_region test r=matthewtodd a=matthewtodd

Part of #92235.
Part of #98020.

It [flaked][1] this morning, after last night's [other skip][2] landed.

[1]: #98020 (comment)
[2]: #99031

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Multiple tenants hitting a kv server with a large batched insert load can cause this error. For example see:
#77481
Changing warmup batch size from 100 to 1000 reliably demonstrates the issue.
Jira issue: CRDB-14219