Replies: 3 comments
-
This is a very neat idea @waynexia -- indeed we have similar rewrite passes in IOx (that operate on ...). It almost sounds to me like the RFC is implementing a "pull up", in that it is pulling the "get data from remote node" step up in the plan (and thus pushing parts of the plan's computation down to the remote nodes). It might make sense to consider doing this optimization on the ExecutionPlan if you find you don't have enough information (like partitioning) to push down operations to the execution nodes.
-
That's great 👍 I also found some similar logic in the existing codebase: the multi-phase aggregation you mentioned. The overall idea is the same: transform an operation into multiple steps, some of which can be executed in parallel.
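To make the multi-phase idea concrete, here is a minimal sketch in plain Python rather than DataFusion code. It assumes an average grouped by key, split into a partial phase that each node runs over its own rows and a final phase that combines the partial states. The function names (`partial_avg`, `final_avg`) and the two sample nodes are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def partial_avg(rows):
    """Phase 1 (can run on each node in parallel): per-key (sum, count)."""
    state = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return {key: (s, c) for key, (s, c) in state.items()}


def final_avg(partials):
    """Phase 2 (runs once, after collecting): merge the states, then divide."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: s / c for key, (s, c) in merged.items()}


# Two hypothetical nodes, each holding part of the data.
node_a = [("x", 1.0), ("y", 4.0)]
node_b = [("x", 3.0), ("y", 6.0), ("y", 8.0)]

result = final_avg([partial_avg(node_a), partial_avg(node_b)])
print(result)
```

Only the small (sum, count) states cross the network, and the final phase gives the same answer as averaging all rows in one place.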
Cool, that's right. I call it "push down" because it's easier to understand. In practice, the RFC and the implementation pull the "merge node" up. The "commutativity" part of the RFC is defined between the "merge node" and every other plan node. With it, we know whether, and how, the "merge node" can be exchanged with its parent node. The further we pull the "merge node" up, the later we have to collect and aggregate the results.
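A rough sketch of how such a commutativity table could drive the pull-up, again in plain Python. Everything here is an assumption for illustration: the plan is a flat list of operator names ordered from leaf to root, the rule table is a simple yes/no map (the actual rules may well be richer than that), and the names `COMMUTES_WITH_MERGE`, `pull_up_merge` and `split_at_merge` are hypothetical.

```python
# Hypothetical rule set: may the merge node swap places with this parent?
COMMUTES_WITH_MERGE = {
    "Filter": True,       # filtering each node's rows, then merging, is equivalent
    "Projection": True,   # same for a row-wise projection
    "Sort": False,        # treated as non-commutative in this simplified table
    "Limit": False,
}


def pull_up_merge(ops):
    """Swap "Merge" with its parent for as long as the rule table allows it.

    `ops` lists the plan's operators from leaf to root.
    Unknown operators are conservatively treated as non-commutative.
    """
    ops = list(ops)
    i = ops.index("Merge")
    while i + 1 < len(ops) and COMMUTES_WITH_MERGE.get(ops[i + 1], False):
        ops[i], ops[i + 1] = ops[i + 1], ops[i]
        i += 1
    return ops


def split_at_merge(ops):
    """Operators below the merge run remotely; those above run after collecting."""
    i = ops.index("Merge")
    return ops[:i], ops[i + 1:]


plan = ["TableScan", "Merge", "Filter", "Projection", "Sort"]
rewritten = pull_up_merge(plan)
remote, local = split_at_merge(rewritten)
print(rewritten, remote, local)
```

In this toy plan the merge stops below the sort, so the scan, filter and projection form the sub-plan sent to the remote nodes and only the sort runs after the results are collected.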
I went with the logical plan for two reasons: (1) it's more generic and doesn't contain execution details (hence the rules might be more straightforward); (2) it can be processed earlier than the physical plan. In the logical planning phase we might lack physical information like the partitioning you mentioned, but other information, such as how the data is distributed across nodes and how many nodes we will use, is available. That "logical" information might be enough for us to work with the logical plan. We can then leave physical planning and optimization to the node that will execute the sub-plan, where more execution information is available. One big part of this is that we have to define a good and complete set of commutativity rules. I've written some, but the set is still very sparse. Another part is Substrait, the proto used to send the sub-plan. I should admit that at the current stage it's not as good as plain protobuf: many plans/exprs are absent, which brings many limitations.
-
Loosely related to this, I propose that we create a ...
-
I broke this discussion out of #6782 (comment), as there is already a lot of discussion on that ticket and I thought putting this thread in a new discussion would make it easier to collaborate.
@waynexia writes: