sql: closed session registry causes deadlock in connexecutor #83078
Labels
A-sql-execution
Relating to SQL execution.
A-sql-executor
SQL txn logic
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
regression
Regression from a release.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
T-sql-queries
SQL Queries Team
While working on an unrelated PR, I observed the following behavior: a query running a new statement (under development) did not complete, but there was also no error. In fact, the conn executor had deadlocked (stack trace below).
I troubleshot this, and I can say with certainty that the closed session registry will always cause a deadlock if a panic is encountered during execution at a txn transition (savepoint, restart, commit, etc.).
Here is an example of what can happen internally:
- say there's some bug in a statement, whereby some memory does not get released properly at the end of an (implicit) txn
- this causes the `Stop()` method from the memory monitor at the end of the execution (`finishSQLTxn()`) to report "unexpected leftover bytes" as a panic
- `finishSQLTxn()` is recursively called, through the FSM, from `txnStateTransitionsApplyWrapper()`:
- however, meanwhile `(*connexecutor) run()` contains a `defer` with the call to `.serialize()`, which also contains:

So if a panic is encountered under `ApplyWithPayload()` (which is about every possible state change in the FSM), the `.serialize()` call at the end will deadlock.

What we want to do about this
Obviously a deadlock is not great!
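
For illustration, the failure mode described above can be reproduced in isolation. This is a minimal sketch, not the actual connExecutor code: the `executor` type, `applyUnsafe`, `applySafe`, `serializeWouldBlock` and `run` are hypothetical names, and it assumes the problem is a lock that is taken around the state transition and released without `defer`. It uses `TryRLock` (Go 1.18+) so that the demo reports the stuck lock instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// executor is a hypothetical stand-in for the conn executor.
// Only the locking pattern is modeled here.
type executor struct {
	mu sync.RWMutex
}

// applyUnsafe locks, runs the transition, then unlocks.
// If the transition panics, Unlock is never reached.
func (ex *executor) applyUnsafe(transition func()) {
	ex.mu.Lock()
	transition()
	ex.mu.Unlock()
}

// applySafe releases the lock via defer, so the lock is
// released even while a panic is unwinding the stack.
func (ex *executor) applySafe(transition func()) {
	ex.mu.Lock()
	defer ex.mu.Unlock()
	transition()
}

// serializeWouldBlock reports whether a reader that needs the lock
// (standing in for a serialize-style call) would block.
func (ex *executor) serializeWouldBlock() bool {
	if ex.mu.TryRLock() {
		ex.mu.RUnlock()
		return false
	}
	return true
}

// run models a run loop whose deferred cleanup needs the lock.
// The transition always panics, like the leftover-bytes check.
func run(apply func(*executor, func())) (blocked bool) {
	ex := &executor{}
	defer func() {
		blocked = ex.serializeWouldBlock()
		_ = recover() // swallow the panic so the demo can report the result
	}()
	apply(ex, func() { panic("unexpected leftover bytes") })
	return false
}

func main() {
	fmt.Println("plain unlock, cleanup would block:", run((*executor).applyUnsafe))
	fmt.Println("deferred unlock, cleanup would block:", run((*executor).applySafe))
}
```

With the plain unlock, the deferred cleanup finds the lock still held; a blocking `RLock` at that point would never return. With the deferred unlock, the lock is released during unwinding and the cleanup proceeds.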
We can either switch over the serialization to a non-locking operation, or maybe use a `defer` unlock around `ApplyWithPayload`.

Example stack trace
Here is the stack trace:
cc @jordanlewis @maryliag for triage.
Jira issue: CRDB-16848