distsql: FlowScheduler.ScheduleFlow is source of significant mutex contention #50022
Labels
A-sql-execution
Relating to SQL execution.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
In an instance of TPC-E, I'm seeing that the mutex locking in `FlowScheduler.ScheduleFlow` here is the single largest source of mutex contention in the system, at a little over 15% of total mutex contention delay. This makes some sense, as TPC-E makes heavy use of DistSQL and this appears to be a serialization point between all DistSQL flows scheduled on a machine.

I don't know this code, so I'm hoping to bring this to the attention of those that do (@asubiotto, @yuzefovich). Are there any easy wins here? Do we need to serialize the call to `Flow.Start` across all flows on a node? Does this call need to be protected by the mutex at all? If not, is the mutex only protecting `fs.mu.numRunning`? Can we manipulate this counter using atomics to avoid blocking in the happy path where `fs.canRunFlow(f) == true`?