- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
          Consolidate TreeNode transform and rewrite APIs
          #8891
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| cc @alamb, @andygrove, @yahoNanJing, @berkaysynnada this PR is another step to cleaner  | 
| 
 These changes seem reasonable to me and, since they propose a similar approach to other recursive flows, they make understanding easier. However, I am not sure about this change: 
 
 | 
| Thanks for the feedback @berkaysynnada. 
 I feel that a simple (simpler than than  Also, just to add a bit more details to this part of the PR: While I was local testing this PR I accidentally changed all  
 I'm not sure I get this part. | 
| 
 The docstring you have updated for  I meant what if we intend to do such: If we want to offer such a use, I think we may need to consider this way as well. | 
| 
 This is correct. This is because the proposed new  If someone needs  | 
| 
 I checked again. Yes, the order is correct. What I really meant is that when we use  Since the nodes access their children, first top-down and then bottom-up traversals seem to be a single pass, doing it in reverse order is like 2 passes, but as a behavior, it may be desirable to have reverse order and should we provide a way for this? | 
| 
 That's correct. I don't think that botom-up and then top-down can't be achieved in one run. 
 API users can explicitly call  | 
| 
 Since we set out to simplify and provide easy to use TreeNode API and its related implementations as much as possible, IMO we need to reach a simple state at the first step. As far as I observed, there is no use for this version of  | 
| 
 Although I used the new  | 
| Hi @peter-toth -- thank you for this PR and the various others that are trying to improve the quality of code in DataFusion I think one thing that is challenging for me is to understand how to evaluate the benefits of this and related PRs in an objective manner. This PR, for example, may improve the consistency of APIs (which I happen to think it does) but making this change will also come at a cost for DataFusion users: 
 For other breaking API changes, there are often other benefits that come with the cost, such as new features or improved performance. I wonder if you can help us articulate how this (and the related refactoring PRs) help DataFusion users -- for example do they 
 If we can connect this work to a benefit I suspect we would be able to drive it forward more quickly | 
| Thanks for the question @alamb! I'm new to DataFusion so this discussion could help me a lot to understand what ways are viable in terms of improving the existing code. I came from a different open-source query engine but I think  Here are a few things I found weird with the current APIs. These can cause a newcommer (like me) some confusion: 
 What features I miss from the current API: 
 So to recap my ideal      /// Transforms the tree using `f_down` and `f_up` closures. `f_down` is applied on a
    /// node while traversing the tree top-down (pre-order, before the node's children are
    /// visited) while `f_up` is applied on a node while traversing the tree bottom-up
    /// (post-order, after the the node's children are visited).
    ///
    /// The `f_down` closure takes
    /// - a `PD` type payload from its parent
    /// and returns a tuple made of:
    /// - a possibly modified node,
    /// - a `PC` type payload to pass to `f_up`,
    /// - a `Vec<FD>` type payload to propagate down to its children
    ///    (one `FD` element is propagated down to each child),
    /// - a `TreeNodeRecursion` enum element to control recursion flow.
    ///
    /// The `f_up` closure takes
    /// - a `PC` type payload from `f_down` and
    /// - a `Vec<PU>` type payload collected from its children
    /// and returns a tuple made of:
    /// - a possibly modified node,
    /// - a `FU` type payload to propagate up to its parent,
    /// - a `TreeNodeRecursion` enum element to control recursion flow.
    fn transform_with_payload<FD, PD, PC, FU, PU>(
        self,
        f_down: &mut FD,
        payload_down: PD,
        f_up: &mut FU,
    ) -> Result<(Self, PU)>
    where
        FD: FnMut(Self, PD) -> Result<(Self, Vec<PD>, PC, TreeNodeRecursion)>,
        FU: FnMut(Self, PC, Vec<PU>) -> Result<(Self, PU, TreeNodeRecursion)>,Obviously the closure return types can be extracted to a type alias or  pub struct TransformDownResult<N, PD, PC> {
    pub transformed_node: N,
    pub payload_to_children: Vec<PD>,
    pub payload_to_f_up: PC,
    pub recursion_control: TreeNodeRecursion,
}All the other functions should be just a specialization of the above: 
 I believe all these PRs would take us forward to a simple but still versarile APIs that have/are: 
 Now, as you mentioned some of these proposed changes have conflicting semantics to current APIs. Honestly I can't decide how much turbulance they would cause and so I'm seeking feedback from you and the community. But please note that I don't insist on any of the above functionaly or namings. These are just ideas and I can imagine implementations where: 
 | 
| I see -- thank you @peter-toth. Your explanation makes sense. As a historical note, the current TreeNode API came out of unifying different in inconsistent APIs across the Exprs, LogicalPlan , ExectionPlan and PhysicalExprs. It was a large step forward at the time, but also just getting a single API was a major step forward. 
 I think this is a good insight -- basically that the rationale for this change is to make it easier for new contributors to contribute to DataFusion as they APIs are more uniform. 
 
 I think implementing the "one true API" as you have proposed above and then migrating the rest of the codebase to use it (possibly deprecating the old APIs or changing them one at at time) would be the least disruptive. To that end, what I will do is to create a new ticket describing this work so it is not lost in this ticket, and then we can use that ticket to make the overall plan visible | 
| I filed #8913 -- let me know what you think. What do you think about creating a PR with  | 
| 
 Thanks @alamb! I will share my thoughts regarding  The API inconsitency of the  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR I think this PR is a significant step forward in getting a consistent API and the unification of pre_visit / post_visit -> up/down and a single TreeNodeRecursion is a significant improvement and I think we could merge this PR
Thank you for all the work on this PR @peter-toth
However I do not fully understand how the code in this PR and the code in #8817 from @ozankabak interact / conflict (anyone else following along can find details on #8913 (comment))
So basically I trust you two to come up with something good. I would be happy to merge this PR, or wait for #8817 and evaluate first.
Again, trying to improve things is very much apprecaited
| 
 Yes -- let's first get #8817 in and then I and @berkaysynnada will review and go through this in detail and the others in detail | 
| 
 Sounds good to me. I left a comment #8817 (comment) on #8817. It can help with further steps of the epic: #8913 | 
| Thanks everyone. I will carefully review this PR after #8817 is merged and the related updates are done. Thank you again @peter-toth for your efforts! | 
f94e193    to
    729c9d2      
    Compare
  
    | I've updated the PR description and added  One thing came into my mind that we could keep  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great and is ready to go. Are there any other comments prior to us merging this?
| I haven't gone through it yet - I will this weekend. I will let you guys know if I have any concerns. Given that I have been following, I don't expect anything major but I'd like to be prudent on this one as it is a somewhat foundational change. | 
| I am 80% done reviewing, should be finished tomorrow | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @peter-toth for the great work. I went through all changes carefully and see nothing to hold this up.
I created a simplification PR, can you please merge it to this PR and then I will go ahead and merge it to main.
Two work items for follow-on PRs:
- The code in expr/src/tree_node/expr.rsbecame a little too complex for n-ary (binary and above) expressions. Can you think of a way to simplify that code?
- I agree with @alamb that rewrite().data().rewrite().data()....rewrite().data()chain looks a little weird. It would be great if we can find a neat way to simplify that flow. But this matters less than (1).
Just give me mention when you finalize by incorporating the simplification PR and I will merge quickly to avoid attracting merge conflicts (which I see happens somewhat frequently as this PR touches many files).
| @alamb, this is now good to go from my perspective as well. Any reason to wait before merging? | 
| @ozankabak, thanks for your PR, I've just merged it. I will try to think about those follow-ups.  | 
| 
 I don't think so -- this one has been opened for a long time and has had many chances to review. 🚀 Thanks again @peter-toth @berkaysynnada @ozankabak and everyone else who participated in the discussion | 
| Epic work | 
| Thanks for the review and feedback @berkaysynnada, @alamb, @ozankabak! | 
* refactor `TreeNode::rewrite()` * use handle_tree_recursion in `Expr` * use macro for transform recursions * fix api * minor fixes * fix * don't trust `t.transformed` coming from transformation closures, keep the old way of detecting if changes were made * rephrase todo comment, always propagate up `t.transformed` from the transformation closure, fix projection pushdown closure * Fix `TreeNodeRecursion` docs * extend Skip (Prune) functionality to Jump as it is defined in https://synnada.notion.site/synnada/TreeNode-Design-Proposal-bceac27d18504a2085145550e267c4c1 * fix Jump and add tests * jump test fixes * fix clippy * unify "transform" traversals using macros, fix "visit" traversal jumps, add visit jump tests, ensure consistent naming `f` instead of `op`, `f_down` instead of `pre_visit` and `f_up` instead of `post_visit` * fix macro rewrite * minor fixes * minor fix * refactor tests * add transform tests * add apply, transform_down and transform_up tests * refactor tests * test jump on both a and e nodes in both top-down and bottom-up traversals * better transform/rewrite tests * minor fix * simplify tests * add stop tests, reorganize tests * fix previous merges and remove leftover file * Review TreeNode Refactor (#1) * Minor changes * Jump doesn't ignore f_up * update test * Update rewriter * LogicalPlan visit update and propagate from children flags * Update tree_node.rs * Update map_children's --------- Co-authored-by: Mustafa Akur <[email protected]> * fix * minor fixes * fix f_up call when f_down returns jump * simplify code * minor fix * revert unnecessary changes * fix `DynTreeNode` and `ConcreteTreeNode` `transformed` and `tnr` propagation * introduce TransformedResult helper * fix docs * restore transform as alias to trassform_up * restore transform as alias to trassform_up 2 * Simplifications and comment improvements (#2) --------- Co-authored-by: Berkay Şahin <[email protected]> Co-authored-by: Mustafa Akur <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Which issue does this PR close?
Part of #8913
Rationale for this change
The
TreeNode.rewrite()/TreeNodeRewriteris very inconsistent with otherTreeNodeapply / visit functions:TreeNodeRewriter.pre_visit()currently returnsRewriteRecursionso as to control the recursion but that enum is very inconsistent with theVisitRecursionenum that visit functions use.RewriteRecursion::Stopfunctionality is equal toVisitRecursion::Skip, that is to skip (prune) the subtree (don't recurse into children and don't run post-order functions).RewriteRecursion::Skipdoesn't prevent recursing into children. This is different toVisitRecursion::Skipwhere recursion into children is not executed.VisitRecursion::Stopfully stops recursion which functionality isn't available withRewriteRecursion.TreeNodeRewriter.mutate()is sometimes a top-down (pre-order) but other times a bottom-up (post-order) function depending onTreeNodeRewriter.pre_visit()result.This PR removes
RewriteRecursionand renamesVisitRecursionto a commonTreeNodeRecursionto control all (apply / visit / rewrite / transform) APIs. PreviousVisitRecursion::Skipgot renamed toJumpand its functionality is now extended to bottom-up and combined traversals as follows:and changes
TreeNodeRewriterto incorporate anf_down()and a nf_up()methods that both can change the node and both can use the commonTransformedstruct (containing aTreeNodeRecursionenum) to control how to proceed with recursion:This solution is capable to replace all
RewriteRecursionfunctionalities and make theTreeNodeRewriters not just cleaner but more versatile.This PR adds a new
transform_down_up()method as a short form ofrewrite()/TreeNodeRewriterto provide a combined traversal:TreeNodeRewriterbut defining 2 closures for top-down and bottom-up transformations is just enough.This PR changes the
Transformedenum the following struct:This PR modifies the
transform_down()andtransform_up()methods to use the newTransformedstruct.This PR modifies the
TreeNodeVisitortrait to have itsf_down()andf_up()alligned withTreeNodeRewritertrait.Please note that this PR implements 2. from #7942 and fixes the
Transformedenum issue: #8835What changes are included in this PR?
This PR:
RewriteRecursion,VisitRecursiontoTreeNodeRecursion. The previousSkipelement is renamed toJump, its top-down functionality is adjusted and its bottom-up functionality is added.Transformedenum to a new struct.TreeNodetransform methods to use the newTransformedstruct.TreeNodeRewriterto incorporate anf_down()andf_up()methods and useTransformed/TreeNodeRecursion,TreeNode.transform_down_up()method.TreeNodeVisitorto havef_down()andf_up()methods aligned withTreeNodeRewriter.apply/visit/transform/rewriteAPIs to usef,f_down,f_upclosure parameter names.Are these changes tested?
Existng tests.
Are there any user-facing changes?
Yes.