-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not add redundant subquery ordering into plan #12003
Do not add redundant subquery ordering into plan #12003
Conversation
…ting' into feature/Sort-For-Subqueries # Conflicts: # datafusion/sqllogictest/test_files/window.slt
…ting' into feature/Sort-For-Subqueries
…ting' into feature/Sort-For-Subqueries
remove unnecessary limit plans when used with sort + fetch add test case for Sort and Limit with offset push down limit even if a child with no fetch appears when the child supports push down
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…eature/Sort-For-Subqueries # Conflicts: # datafusion/core/src/physical_optimizer/enforce_distribution.rs # datafusion/core/src/physical_optimizer/enforce_sorting.rs # datafusion/core/src/physical_optimizer/limit_pushdown.rs # datafusion/core/src/physical_optimizer/sort_pushdown.rs # datafusion/physical-plan/src/coalesce_batches.rs # datafusion/physical-plan/src/limit.rs # datafusion/physical-plan/src/sorts/sort.rs # datafusion/sqllogictest/test_files/order.slt
add pushes_global_limit_into_multiple_fetch_plans test case change limit_pushdown.rs as manual top down operator and simplify algorithm by supporting most parent node remove and other pushdown cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I collaborated with @mertak-synnada on this and it LGTM. A lot of plan simplifications/improvements! 🚀
cc @alamb
I plan to review this PR later today or tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @melgenek and @ozankabak
I had a small request related to some of the tests but otherwise looks good to me.
I didn't review the code in detail, but I did review the plan changes carefully and they looked good to me. Thank you again -- these plans are looking really quite good now
\n TableScan: aggregate_test_100 projection=[c1]\ | ||
\n SubqueryAlias: t2\ | ||
\n TableScan: aggregate_test_100 projection=[c1]", | ||
\n Sort: t1.c1 ASC NULLS FIRST, fetch=1\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plan looks better to me because since the sort already has a fetch there is no reason to also apply a limit afterwards
07)------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
08)--------------AggregateExec: mode=Partial, gby=[trace_id@0 as trace_id], aggr=[max(traces.timestamp)] | ||
09)----------------MemoryExec: partitions=1, partition_sizes=[1] | ||
01)SortPreservingMergeExec: [trace_id@0 ASC NULLS LAST], fetch=4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it sure is getting easier to understand what the plans are doing here
@@ -3898,6 +3898,7 @@ SELECT * FROM ( | |||
) as lhs RIGHT JOIN ( | |||
SELECT * from right_table_no_nulls | |||
ORDER BY b | |||
LIMIT 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than modifying the existing test, can you please add a new one ? If we modify the existing test we may also inadvertently be changing the coverage of this test
Likewise for the other tests in this file that now have limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not leave the test as is because the sort in the subquery gets optimized away (rightfully), breaking the intent of the test. These kind of tests seems to be originally written with the assumption that the sort wouldn't be optimized away, and the subquery with ORDER BY was mimicking a table with order.
Placing the limit prevents the optimizer from optimizing the sort away, keeping the intent of the test intact. Keeping the test as is gives us a degenerate test with our new, more powerful optimizer.
Once we get WITH ORDER
support for memory tables, we will simply use a memory table with ordering in these tests, and they will be reverted back fully (i.e. the LIMIT will be removed).
a 5 -38 | ||
a 5 -54 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this chnage looks like it is because the test itself is not deterministic
# Conflicts: # datafusion/sqllogictest/test_files/window.slt
This reverts commit 02eab80.
This PR seems to have introduced a bug #12423 |
Which issue does this PR close?
Rationale for this change
This PR solves an issue with the subquery sortings. If a main query does not support the same sorting with the subquery and the subquery has no limit, then the ordering is redundant (unless used in an Aggregation but it's being handled in a different plan type anyway).
While implementing this behavior we discovered that it's only possible to use
WITH ORDER
syntax withEXTERNAL
tables. That's why we needed to addLIMIT xx
conditions to subqueries so that the system can understand they're valid orderings. The sql-parser issue will be handled in a different PR to the sql-parser project and those tests will be changed back to their original formsOther than that, we discovered there are some redundant LimitExec plans even if the child plan has fetch support. Also, fixed this issue and replaced
GlobalLimitExec
plans withskip=0
withLocalLimitExec
plan type.Also added Limit push down support to the
UnionExec
plan type.What changes are included in this PR?
Explained above
Are these changes tested?
Yes
Are there any user-facing changes?
No