Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: closed session registry causes deadlock in connexecutor #83078

Closed
knz opened this issue Jun 19, 2022 · 0 comments · Fixed by #83083
Closed

sql: closed session registry causes deadlock in connexecutor #83078

knz opened this issue Jun 19, 2022 · 0 comments · Fixed by #83083
Labels
A-sql-execution Relating to SQL execution. A-sql-executor SQL txn logic C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. regression Regression from a release. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team

Comments

@knz
Copy link
Contributor

knz commented Jun 19, 2022

While working on an unrelated PR, I observed the following behavior: a query running a new statement (under development) did not complete, but there was also no error. In fact, the conn executor had deadlocked (stack trace below).

I troubleshooted this, and I can say for certainty that the closed session registry will always cause a deadlock if a panic is encountered during execution at a txn transition (savepoint, restart, commit etc).

Here is an example of what can happen internally:

  • say there's some bug in a statement, whereby some memory does not get released properly at the end of an (implicit) txn

  • this causes the Stop() method from the memory monitor at the end of the execution (finishSQLTxn()) to report "unexpected leftover bytes" as a panic

  • finishSQLTxn() is recursively called, through the FSM, from txnStateTransitionsApplyWrapper():

    ex.mu.Lock()
    errex.machine.ApplyWithPayload(withStatement(ex.Ctx(), ex.curStmtAST), ev, payload)
    ex.mu.Unlock()
  • however, meanwhile (*connexecutor) run() contains a defer with the call to .serialize(), which also contains:

    ex.mu.RLock()
    defer ex.mu.RUnlock()

So if a panic is encountered under ApplyWithPayload() (which is about every possible state change in the FSM), the .serialize() call at the end will deadlock.

What we want to do about this

Obviously a deadlock is not great!

We can either switch over the serialization to a non-locking operation,

or maybe use a defer unlock around ApplyWithPayload.

Example stack trace

Here is the stack trace:

goroutine 1097 [semacquire]:
sync.runtime_SemacquireMutex(0xc00297eae4, 0x0, 0x0)
        GOROOT/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).RLock(...)
        GOROOT/src/sync/rwmutex.go:63
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).serialize(0xc00297d900)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:3028 +0xb3
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run.func1(0xc00297d900, {0x24c6da0, 0xc00193f940}, 0xc003dbbd80)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1788 +0x85
panic({0x9987e0, 0xc0005a7248})
        GOROOT/src/runtime/panic.go:1038 +0x25b
github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic({0x24c6da0, 0xc002c14740}, 0xc00143c000, {0xdcabda, 0x20}, {0xc002e9b2d0, 0x2, 0x2})
        github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:378 +0x225
github.com/cockroachdb/cockroach/pkg/util/mon.(*BytesMonitor).doStop(0xc0007f6cc0, {0x24c6da0, 0xc002c14740}, 0x1)
        github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:426 +0x294
github.com/cockroachdb/cockroach/pkg/util/mon.(*BytesMonitor).Stop(...)
        github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:406
github.com/cockroachdb/cockroach/pkg/sql.(*txnState).finishSQLTxn(0xc00297d990)
        github.com/cockroachdb/cockroach/pkg/sql/txn_state.go:249 +0x5d
github.com/cockroachdb/cockroach/pkg/sql.cleanupAndFinishOnError({{0x24c6e48, 0xc001b67dd0}, {0x242d7e0, 0xc000bdb4e0}, {0xb07f40, 0xc00297d990}, {0x242d600, 0xc000c8daa0}, {0x79ca60, 0xc000c8dab0}})
        github.com/cockroachdb/cockroach/pkg/sql/conn_fsm.go:510 +0xf3
github.com/cockroachdb/cockroach/pkg/util/fsm.Transitions.apply({0xc000155ad0}, {{0x24c6e48, 0xc001b67dd0}, {0x242d7e0, 0xc000bdb4e0}, {0xb07f40, 0xc00297d990}, {0x242d600, 0xc000c8daa0}, {0x79ca60, ...}})
        github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:106 +0x142
github.com/cockroachdb/cockroach/pkg/util/fsm.(*Machine).ApplyWithPayload(...)
        github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:135
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper(0xc00297d900, {0x242d600, 0xc000c8daa0}, {0x79ca60, 0xc000c8dab0}, {0x83171f0a0, 0xc00107bbc0}, 0xc)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2825 +0x298
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd(0xc00297d900)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2048 +0x17e5
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc00297d900, {0x24c6da0, 0xc00193f940}, 0xc002112900, {0x5400, 0x15000, 0xc002113680, 0x0, 0x0}, 0xc000bffb60)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1801 +0x27a
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc000ceb100, {0x24c6da0, 0xc00193f940}, {0xc00297d900}, {0x5400, 0x15000, 0xc002113680, 0x0, 0x0}, 0xc000bffb60)
        github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:750 +0xe5
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1(0xc003302a48, 0xc002c41530, {0x24c6da0, 0xc00193f940}, 0xc000bffb60, 0xc000ceb100, 0xc00107b500, {0x25613a8, 0xc00211a6c0}, 0xc0026a7920, ...)

cc @jordanlewis @maryliag for triage.

Jira issue: CRDB-16848

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-execution Relating to SQL execution. regression Regression from a release. A-sql-executor SQL txn logic T-sql-execution labels Jun 19, 2022
@blathers-crl blathers-crl bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team labels Jun 19, 2022
@craig craig bot closed this as completed in 76cbd88 Jun 22, 2022
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. A-sql-executor SQL txn logic C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. regression Regression from a release. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant