Exponential nature of ExecutionPlan
s output_partitioning
and equivalence_properties
#9084
Labels
bug
Something isn't working
Describe the bug
I have been troubleshooting the TPCH-DS query 64, and have found some performance issues, seemingly stemming from
ExecutionPlan::output_partitioning
,ExecutionPlan::equivalence_properties
and their inherent branching nature throughout variousExecutionPlan
implementations.In particular the problem manifests by the process pinning the CPU at 100%, and getting stuck in the
EnforceDistribution
physical optimizer, which aggravates the underlying problem by inter-leavingRepartitionExec
into the existing plan tree.To Reproduce
Setup
You can use the existing
tpcds_physical_q64
test, just un-ignore it and setenv_logger::init()
somewhere to capture the logs. You'll also need to setRUST_MIN_STACK=3000000
to alleviate an unrelated problem described in #4786.Flamegraph
Next you can optionally capture the flamegraph to get an initial sense of what's happening
I see something like flamegraph.svg.zip.
Notably, there is a large build-up of (likely under-sampled)
output_partitioning
andequivalence_properties
calls.Investigation
I used some counters
and then sprinkled
crate::OUT_PART_COUNTER.increment()
andcrate::EQ_PROP_COUNTER.increment()
at the begining offn output_partitioning(&self)
andfn equivalence_properties(&self)
respectively, forMemoryExec
,RepartitionExec
,HashJoinExec
,ProjectionExec
andAggregateExec
.I also added the following
Results
Soon enough after starting the test you'll notice these numbers reach values on the order of 100K and then millions and more as the slowdown in the iterations becomes appreciable
Explanation
WARNING: A lot of gritty details below
Taking a look at one of the plans that appears early in the optimization process:
and exploring the relevant implementations of
output_partitioning
andequivalence_properties
, you can see that they have the potential to branch off into calling two methods on the input, thus leading to a exponential call tree.In particular calling just
output_partitioning()
on the top-most plan leads to:AggregateExec::output_partitioning
ProjectionExec
node below this gets expanded toHashJoinExec
will expand these calls to left and right inputsRepartitionExec
s which are leaf call-nodes forout_part()
, but also expandeq_prop
into 2 of inputs methods we getMemoryExec
s which terminate theeq_prop()
call chain as wellSo for computing 1 thing (
Partitioning
), from a total of 7 plans the total invocation count for the above two methods was 31, thus with some overlapping. Note that some of these calls involve allocating stuff on the heap as well as some other computations, which can add up when the invocation count grows substantially.Finally, given that these 2 methods are liberally called inside
ensure_distribution
and its sub-routines, I think this explains the enormous call count and ultimately the slow-down.Expected behavior
Some potential mitigations:
output_partitioning
andequivalence_properties
within methods. For instance inAggregateExec
the second call toinput.output_partitioning
is redundant and the call toinput.equivalence_properties()
can also be delayed only in case of a match (and likewise forProjectionExec
).EnforceDistribution
interleaving some helper plans alongside withRepartitionExec
which serve to cache/short-circuit theoutput_partitioning
andequivalence_properties
already computed for the input below.Additional context
No response
The text was updated successfully, but these errors were encountered: