Is your feature request related to a problem or challenge? Please describe what you are trying to do.
ProjectionExec can either perform computations like (col1 + col2) or simply reorder / rename columns.
The first use case benefits from repartitioning (as the calculation can then be done on multiple cores).
The second use case (reordering / renaming) does not benefit from repartitioning, as it is simply a bookkeeping arrangement.
Basically, we have a plan that is optimized by https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs to repartition before the projection:
ProjectionExec: expr=[f@0 as f]
  RepartitionExec: partitioning=RoundRobinBatch(4) <-- This repartition node is likely worthless
    DeduplicateExec: [tag@1 ASC,time@2 ASC]
      SortPreservingMergeExec: [tag@1 ASC,time@2 ASC]
        UnionExec
Describe the solution you'd like
I think ProjectionExec should only "benefit from partitioning" when its projection expressions actually perform calculations (i.e., are not just columns / aliases).
This would likely mean defining benefits_from_input_partitioning (https://github.com/apache/arrow-datafusion/blob/906896b7c59ff14d71b3056ec4349274cf6662af/datafusion/core/src/physical_plan/mod.rs#L176-L183) for impl ExecutionPlan for ProjectionExec (https://github.com/apache/arrow-datafusion/blob/906896b7c59ff14d71b3056ec4349274cf6662af/datafusion/core/src/physical_plan/projection.rs#L151) so that it returns true only if there are expressions that are something other than plain column references / aliases.
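To make the idea concrete, here is a rough, self-contained sketch of the check. It does not use the real DataFusion types: `Expr`, `Projection`, and `col` below are hypothetical stand-ins for the physical expression and ProjectionExec types, and only the method name benefits_from_input_partitioning comes from the links above. The actual implementation would need to inspect the real expression types instead.

```rust
/// Hypothetical, simplified stand-in for a physical expression.
/// This is NOT the DataFusion type, just enough structure to show the check.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A plain reference to an input column, e.g. `f@0`.
    Column { name: String, index: usize },
    /// A computation over other expressions, e.g. `col1 + col2`.
    Binary { left: Box<Expr>, op: char, right: Box<Expr> },
}

/// Hypothetical stand-in for ProjectionExec: a list of
/// (expression, output name) pairs. The output name plays the role of the alias.
struct Projection {
    exprs: Vec<(Expr, String)>,
}

impl Projection {
    /// Returns true only if at least one expression is something other than
    /// a plain column reference. A pure reorder / rename returns false.
    fn benefits_from_input_partitioning(&self) -> bool {
        self.exprs
            .iter()
            .any(|(expr, _alias)| !matches!(expr, Expr::Column { .. }))
    }
}

/// Small helper (also hypothetical) to build a column reference.
fn col(name: &str, index: usize) -> Expr {
    Expr::Column { name: name.to_string(), index }
}
```

With this sketch, a projection like `f@0 as f` would report false, while one containing `col1 + col2` would report true, which is the behavior the repartition optimizer would need in order to skip the RepartitionExec in the plan above.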
Additional context
I think this is a good first issue, as the code and the desired behavior are fairly straightforward; I suspect this would largely be an exercise in updating tests.