sql/catalog/lease: high priority reads in lease acquisition of DROP descs may prevent RENAME/DROP #86324

ajwerner · 2022-08-17T17:33:04Z

Describe the problem

This is effectively the same problem as #61798. If an application drops a table and then continually attempts to access that table, the drop will not be able to make progress. This is a classic livelock caused by mismatched priorities: each access attempts a lease acquisition, and that lease acquisition pushes the writer attempting to drop the table. The lease acquisition ultimately fails, but that does not matter, because the writer has already been pushed. In a cluster without high throughput, many nodes, and a long RTT between those nodes, the drop will probably find a moment to commit eventually. Geo-distributed clusters will find this issue most problematic. As soon as the attempts to lease the descriptor stop, the drop is able to proceed.

To Reproduce

Create a geo-distributed cluster with high latency. Run an application which attempts to access a table at high frequency. Drop that table. The drop will hang.

Expected behavior

The drop should proceed.

Proposed solution

One possibility is to avoid using high priority initially when leasing and instead set a timeout on locks (or something similar), then switch to high priority after an initial period that is longer than some multiple of the latency diameter of the cluster. Maybe 5s.

Another would be to stop doing this high priority dance in the first place.

Additional Context

There's some related discussion in #54633 and #46414.

Jira issue: CRDB-18697

gz#16237

ajwerner · 2022-08-23T14:34:46Z

The lease manager does, I believe, have enough information to avoid leasing this descriptor.

ajwerner · 2023-02-25T00:11:14Z

It turns out this is realistically exactly the same problem as #96840. I'm closing that issue as a duplicate but I encourage a reader to read that issue. It has good details.

ajwerner · 2023-02-25T00:14:45Z

The problem here ends up being the namespace lookups. These are bad in part because they don't use a singleflight; they should. Even if they did, we'd still have a tight loop resolving a name that does not exist with a high priority transaction, and that will starve a writer. This comment repeats #96840 (comment).
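
To make the singleflight point concrete, here is a minimal, stdlib-only sketch of request coalescing. It is a simplified stand-in for the singleflight pattern, not the code used in the lease manager; `group`, `call`, and the key and ID values are hypothetical. Concurrent lookups of the same name share one execution, but sequential lookups in a tight loop each run again, which is why coalescing alone would not stop the writer from being starved.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight lookup and its result.
type call struct {
	wg  sync.WaitGroup
	id  int64
	err error
}

// group coalesces concurrent lookups of the same key into one execution.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// do runs fn once per key among concurrent callers; late arrivals wait for
// the in-flight call and share its result. Nothing is cached afterwards.
func (g *group) do(key string, fn func() (int64, error)) (int64, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.id, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.id, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.id, c.err
}

func main() {
	var g group
	var executions int64
	lookup := func() (int64, error) {
		atomic.AddInt64(&executions, 1)
		time.Sleep(20 * time.Millisecond) // simulated round trip
		return 104, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.do("db.public.t", lookup)
		}()
	}
	wg.Wait()
	fmt.Println("callers: 8, executions:", atomic.LoadInt64(&executions))
}
```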

The solution to starving the writer is the combination of #95225 and #95227. The proposed band-aid is to keep a cache of missing names and do some backoff when resolving them, gated on a cluster setting.

ajwerner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 17, 2022

blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Aug 17, 2022

stevendanna mentioned this issue Oct 13, 2022

backupccl: TestDataDriven/restore-grants is occasionally hanging #87129

Closed

postamar added the A-schema-catalog Related to the schema descriptors collection and the catalog API in general. label Nov 10, 2022

ajwerner changed the title ~~sql/catalog/lease: high priority reads in lease acquisition of DROP descs may prevent dropping~~ sql/catalog/lease: high priority reads in lease acquisition of DROP descs may prevent RENAME/DROP Feb 8, 2023

ajwerner mentioned this issue Feb 9, 2023

sql/catalog: cannot set a descriptor name to a name being queried #96840

Closed

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
