sql/catalog/lease: high priority reads in lease acquisition of DROP descs may prevent RENAME/DROP #86324
Labels
A-schema-catalog
Related to the schema descriptors collection and the catalog API in general.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Describe the problem
This is effectively the same problem as #61798. If an application drops a table and then continually attempts to access that table, the drop will not be able to make progress. This is a classic livelock due to mismatched priorities: each access attempts a lease acquisition on the descriptor, and that lease acquisition uses a high-priority read which pushes the writer attempting to drop the table. The lease acquisition will ultimately fail, but that's not relevant, as the writer will already have been pushed.

Unless a cluster has high throughput, many nodes, and a long RTT between those nodes, the drop will probably find a moment to commit eventually. Geo-distributed clusters will find this issue most problematic. As soon as the attempts to lease the descriptor stop, the drop will be able to proceed.
To Reproduce
Create a geo-distributed cluster with high latency. Run an application which attempts to access a table at high frequency. Drop that table. The drop will hang.
Expected behavior
The drop should proceed.
Proposed solution
One possibility is to avoid using high priority initially when leasing. Instead, the first read would run at normal priority with a lock timeout (or something similar) and only switch to high priority after an initial period that is longer than some multiple of the latency diameter of the cluster. Maybe 5s.
Another would be to stop doing this high priority dance in the first place.
Additional Context
There's some related discussion in #54633 and #46414.
Jira issue: CRDB-18697
gz#16237