Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
sql,persistedsqlstats: prevent a deadlock during shutdown
Prior to this change, the coordination between the stats flusher task (an async stopper task) and the activity flusher job was performed using a two-step process: - the stats persistence task offered to call a callback _function_ every time a flush would complete. - the job would _reconfigure the callback function_ on each iteration. - the function was writing to a channel that was subsequently read by the job iteration body. This approach was defective in 3 ways: 1. if the job iteration body would exit (e.g. due to a server drain) *after* it installed the callback fn, but *before* the stats flusher would read and call the callback fn, a window of time existed where a deadlock could occur: - the stats flusher retrieves the pointer to the caller fn but doesn't call it yet. - the job loop exits. From then on it will not read from the channel any more. - the stats flusher attempts to write to the channel. A deadlock occurs. (This was seen during testing. See cockroachdb#102574) The fix here is to always jointly `select` the write to the channel and also a read from the drain/stopper signals, to abort the channel operation if a shutdown is requested. 2. the stats flusher task was holding the mutex locked while performing the channel write. This is generally bad code hygiene as it forces the code maintainer to double-check whether the lock and channel operations don't mutually interlock. The fix is to use the mutex to retrieve the channel reference, and then write to the channel while the mutex is not held any more. 3. the API between the two was defining a *callback function* where really just a notification channel was needed. The fix here is to simplify the API. Release note: None
- Loading branch information