Prevent VStreamer engine deadlocks during state transitions #11268
Description
The VStreamer engine is somewhat unusual in two ways:
Because of this, when a tablet has open vstreams (direct binary log streams) performing work that requires RPCs (such as handling VSchema updates) and a state transition starts, it can deadlock between the VStreamerEngine mutex, the UVStreamer mutex, and the TabletManager mutex when checking whether the engine is open as part of the TabletManager's `ChangeType` RPC call.

More specifically, the deadlock seems to be the following (still trying to figure this out), as seen using the repro test here:

- The `ChangeType` RPC call blocks on the VStreamerEngine mutex when opening (or closing) the engine.
- `ApplyVSchema` RPC calls are made because of VSchema changes. While these changes are being broadcast, the UVStreamer lock is held. The broadcast then appears to block on the TabletManager's (RPC) mutex, although this is not yet confirmed.

Another key factor here is that we can block on the vschema channel when handling the vschema changes. Another workaround was to increase the message buffering for that channel well beyond 1.

The blocking factors involved are:
Related Issue(s)
Checklist