Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Expr::map_children #9876

Merged

Conversation

peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Mar 30, 2024

Which issue does this PR close?

Closes #9457, part of #8913

Rationale for this change

The current implementation of Expr::map_children() is very complex, this PR tries to simplify its code.

What changes are included in this PR?

This PR adds a map_until_stop_and_collect() macro, which is similar to TransformedIterator::map_until_stop_and_collect() to process sibling tree nodes, but it can process heterogeneous tree node containing expressions.

Are these changes tested?

Yes, with existing UTs.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Mar 30, 2024
@peter-toth peter-toth marked this pull request as draft March 31, 2024 11:52
@peter-toth
Copy link
Contributor Author

This is now ready for review. cc @alamb, @berkaysynnada

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @peter-toth -- I think this looks quite nice and easier to undertand

I left some suggestions on comments / naming that might help, but I also think we could do them as a follow on PR

@@ -532,7 +532,7 @@ impl<T> Transformed<T> {
}
}

/// Transformation helper to process tree nodes that are siblings.
/// Transformation helper to process sequence of iterable tree nodes that are siblings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the reason / logic for continue / jump quite subtle when trying to make TreeNodeMutator. I have some suggested comments below to explain the rationale that might help

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb, merged your suggestion.

datafusion/common/src/tree_node.rs Show resolved Hide resolved
/// Transformation helper to process sequence of tree node containing expressions.
/// This macro is very similar to [TransformedIterator::map_until_stop_and_collect] to
/// process nodes that are siblings, but it accepts an initial transformation and a
/// sequence of pairs of an expression and its transformation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite clever. I have some suggestions on naming that would have helped me:

Use F to mirror the nomenclature of TransformedIterator. I thought $TRANSFORMED_EXPR was an actually expr (but it is actually a closure that is invoked I think 🤔 )

$TRANSFORMED_EXPR_0 --> F0      
$TRANSFORMED_EXPR --> F.

I think it would help to document each parameter and the return value (specifically it looks like it returns Transformed<(data, ..., data)>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me fix these and come back yo you.

Copy link
Contributor Author

@peter-toth peter-toth Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in a36f1aa and added rustdoc to the macro. Let me know if it needs more details.

macro_rules! map_until_stop_and_collect {
($TRANSFORMED_EXPR_0:expr, $($EXPR:expr, $TRANSFORMED_EXPR:expr),*) => {{
$TRANSFORMED_EXPR_0.and_then(|Transformed { data: data0, mut transformed, mut tnr }| {
let data = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could name this something other than data so it was clearer what type it is (maybe all_datas?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a36f1aa

null_treatment,
))
}),
}) => map_until_stop_and_collect!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very clever

@peter-toth
Copy link
Contributor Author

The test failure seems unrelated to this PR and looks like a non-deterministic test...

@alamb
Copy link
Contributor

alamb commented Apr 3, 2024

The test failure seems unrelated to this PR and looks like a non-deterministic test...

I agree

https://github.com/apache/arrow-datafusion/actions/runs/8543135777/job/23406267036?pr=9876

External error: query result mismatch:
[SQL] WITH RECURSIVE my_cte AS(
    SELECT 1::int AS a
    UNION ALL
    SELECT a::bigint+2 FROM my_cte WHERE a<3
) SELECT *, arrow_typeof(a) FROM my_cte;
[Diff] (-expected|+actual)
-   1 Int32
-   3 Int32
+   3 Int32
+   1 Int32
at test_files/cte.slt:751

I restarted it

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @peter-toth

@alamb
Copy link
Contributor

alamb commented Apr 3, 2024

The CI test passed on rerun. I filed #9930 to track the non deterministic test (and we are trying to get to issue 10,000!)

@alamb alamb merged commit 2f55003 into apache:main Apr 3, 2024
24 checks passed
@peter-toth
Copy link
Contributor Author

Thanks @alamb for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TreeNode Expr Implementation for LogicalPlan's
2 participants