Get rid of special TreeNodes #8663

peter-toth · 2023-12-27T14:08:57Z

Is your feature request related to a problem or challenge?

Currently there are many special TreeNode trees in DataFusion so as to carry additional information needed for tree transformations. These special trees are a bit cumbersome as they need to implement TreeNode functions (apply_children(), map_children()).

Describe the solution you'd like

I'm suggesting to add TreeNode.transform_with_payload(), TreeNode.transform_down_with_payload() and TreeNode.transform_up_with_payload() to propagate down/up additional information during TreeNode transformation.

Please see 4. in #7942 and its POC implementation.

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

peter-toth · 2023-12-27T14:09:21Z

Let me open a PR once #8653 landed.

alamb · 2023-12-27T15:22:36Z

I'm suggesting to add TreeNode.transform_with_payload(), TreeNode.transform_down_with_payload() and TreeNode.transform_up_with_payload() to propagate up additional information during TreeNode transformation.

Another common pattern in Rust is to capturing state in a closure rather than an explict payload -- perhaps that idea is also worth considering

berkaysynnada · 2023-12-28T12:19:05Z

We have also observed the complex API and different implementation approaches of TreeNode. I'm not sure if this is the right place to share these but, let me share our findings and a simple POC of how we can have a new trait providing a simpler and more understandable API:

Since the general approach in Datafusion tree structs involves having the children nodes within the self node, a pure transform_down cannot be performed. Even if the nodes are applied by the given op() in pre-order, the build order is always bottom-up. Inserting some logic into that build process (into map_children()) can introduce hidden transform-up functionality. For transform-up()s, the same flexibility may also cause the spread of a rule's logic into different parts of the code, going against code integrity.

There are also some unused functionalities and outcome types in general, so I think that a more simple API is enough usually. I scribbled a little and came up with something like this, sharing it here in case it's useful.

trait TreeNode_v2: Sized {
    fn children<'a>(&mut self) -> &'a mut [Self];

    fn transform<F>(&mut self, op_up: &mut F, op_down: &mut F) -> Result<()>
    where
        F: FnMut(&mut Self) -> Result<()>,
    {
        op_down(self)?;

        let children_as_mut_ref = self.children();
        if !children_as_mut_ref.is_empty() {
            children_as_mut_ref
                .iter_mut()
                .map(|c| c.transform(op_up, op_down))
                .collect::<Result<_>>()?;
        }

        op_up(self)?;
        Ok(())
    }
}

op_up and op_down may also be separated into 2 different functions. But the key concept is that the user only must implement how the node reaches its children, and the corresponding rule logic to apply while traversing the tree. It is also important that the whole tree must be created before the traversals with default payloads. Before the refactor, there were some places making on-the-fly children construction during the traversal.

I don’t know yet how this design can be integrated with the existing TreeNode, but in many places, such a design makes things easier and more understandable.

peter-toth · 2023-12-28T14:18:46Z

I think think this is partly the same idea I was suggesting in 1. of #7942. The main point there was that the closures change from FnMut(Self) -> Result<...> to the self mutating FnMut(&mut Self) -> Result<...>.
(Please note that the TreeNodeTransformer trait is not important in my PR. I just added there it to make transform() similar to the existing rewrite(). rewrite() uses a TreeNodeRewriter that has a pre-order (pre_visit()) and post-order like (mutate()) methods collected into one object, but I too would prefer separate f_down and f_up closures. Actually the new transform_with_payload() in #8664 follows that idea.)

Where I see some differences is the return type of our closures. Your code snipett shows Result<()> while I was suggesting Result<TreeNodeRecursion>. This is because I wanted to deprecate the current rewrite() / TreeNodeRewriter and replace/combine that with a new transform() method. If we want to do that then the new closures need to be able to prune the tree traversal (skip a subtree) and I don't think we can do it with a simple Result<()>.
(Actaully as the same pruning capability is needed for the TreeNode.apply() / TreeNode.visit() as well so I poposed to have a common TreeNodeRecursion enum for visit and transform functions. This is 2. in #7942)

So basically here is the transform() I suggested:

trait TreeNode_v2: Sized {
    fn transform_children<F>(&mut self, f: &mut F) -> Result<TreeNodeRecursion>
    where
        F: FnMut(&mut Self) -> Result<TreeNodeRecursion>;

    fn transform(&mut self, f_down: &mut F, f_up: &mut F) -> Result<TreeNodeRecursion>
    where 
        F: FnMut(&mut Self) -> Result<TreeNodeRecursion>,
     {
        // Apply `f_down` on self.
        f_down(self)?
            // If it returns continue (not prune or stop or stop all) then continue
            // traversal on inner children and children.
            .and_then_on_continue(||
                // Run the recursive `transform` on each children.
                self
                    .transform_children(&mut |c| c.transform(f_down, f_up))?
                    // Apply `f_up` on new self.
                    .and_then_on_continue(|| f_up(self)))?
            // Applying `f_down` or `f_up` on self might have returned prune, but we 
            // need to propagate continue.
            .continue_on_prune()
    }
}

Regarding

the key concept is that the user only must implement how the node reaches its children

I totally aggree.
But my question about fn children<'a>(&mut self) -> &'a mut [Self]; is that can we return children as &mut [Self] without collecting them into a temporary collection like Vec? Currently Expr.map_children() doesn't collect the children into a temporary collection so if we start doing it then we might introduce some performance regression.
BTW I was trying to follow a very similar approach initially. My initial children() returned an impl Iterator<Item=&mut Self> so as to avoid collecting childrens. But I simply couldn't implement it for Expr without returning a dynamic dispatch iterator so I remained at the above transform_children() that is similar to the current map_children().

Lastly, in this particulat issue I wanted to see if there is a way to get rid of those special trees. It seems unecessary to create those thees and I found it very hard to follow how additonal payload is propagated down/up. In #8664 I removed SortPushDown and PlanWithRequitements and ExprOrdering with the help of transform_down_with_payload() and transform_up_with_payload() functions and I think the new f_down / f_up closures are much simpler to understand now.
But, I'm open for other proposals like @alamb's #8663 (comment) although I'm not sure yet if capturing the state in the closures make things easier.

berkaysynnada · 2023-12-29T07:17:28Z

Thanks @peter-toth for your detailed explanation ☺️

Where I see some differences is the return type of our closures. Your code snipett shows Result<()> while I was suggesting Result<TreeNodeRecursion>. This is because I wanted to deprecate the current rewrite() / TreeNodeRewriter and replace/combine that with a new transform() method. If we want to do that then the new closures need to be able to prune the tree traversal (skip a subtree) and I don't think we can do it with a simple Result<()>. (Actaully as the same pruning capability is needed for the TreeNode.apply() / TreeNode.visit() as well so I poposed to have a common TreeNodeRecursion enum for visit and transform functions. This is 2. in #7942)

I hadn't thought deeply about the return type, but your suggestion makes a lot of sense.

But my question about fn children<'a>(&mut self) -> &'a mut [Self]; is that can we return children as &mut[Self] without collecting them into a temporary collection like Vec? Currently Expr.map_children() doesn't collect the children into a temporary collection so if we start doing it then we might introduce some performance regression.

I wasn't aware of Expr specifically, but I considered this approach because collection is already performed in all rules on the physical plan side.

I will review #8664 and write my new thoughts there. I believe we will find the most appropriate solution 🚀

alamb · 2024-04-16T12:28:46Z

Is this ticket still tracking anything I wonder (or is it now covered by #8913 🤔 )

peter-toth · 2024-04-16T12:33:29Z

I think we can close this issue as #8817 reduced the number of TreeNode types by introducing PlanContext and ExprContext.

alamb · 2024-04-16T15:51:55Z

Thanks again @berkaysynnada and @peter-toth -- the TreeNode API is looking quite good these days

peter-toth added the enhancement New feature or request label Dec 27, 2023

This was referenced Dec 27, 2023

Transform with payload #8664

Closed

Refactor TreeNode recursions #7942

Closed

peter-toth mentioned this issue Dec 29, 2023

Cleanup TreeNode implementations #8672

Merged

alamb closed this as completed Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of special TreeNodes #8663

Get rid of special TreeNodes #8663

peter-toth commented Dec 27, 2023 •

edited

Loading

peter-toth commented Dec 27, 2023 •

edited

Loading

alamb commented Dec 27, 2023

berkaysynnada commented Dec 28, 2023

peter-toth commented Dec 28, 2023 •

edited

Loading

berkaysynnada commented Dec 29, 2023

alamb commented Apr 16, 2024

peter-toth commented Apr 16, 2024 •

edited

Loading

alamb commented Apr 16, 2024

Get rid of special TreeNodes #8663

Get rid of special TreeNodes #8663

Comments

peter-toth commented Dec 27, 2023 • edited Loading

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

peter-toth commented Dec 27, 2023 • edited Loading

alamb commented Dec 27, 2023

berkaysynnada commented Dec 28, 2023

peter-toth commented Dec 28, 2023 • edited Loading

berkaysynnada commented Dec 29, 2023

alamb commented Apr 16, 2024

peter-toth commented Apr 16, 2024 • edited Loading

alamb commented Apr 16, 2024

peter-toth commented Dec 27, 2023 •

edited

Loading

peter-toth commented Dec 27, 2023 •

edited

Loading

peter-toth commented Dec 28, 2023 •

edited

Loading

peter-toth commented Apr 16, 2024 •

edited

Loading