Fix incorrect OFFSET during LIMIT pushdown. #12399
cc @itsjunetime
@mertak-synnada this is a fix to what we believe is a bug from this (very excellent) change. We would appreciate your review 🙏🏼.
I think this may negate some of the performance improvements gained by the initial PR that introduced these bugs, but once this is merged, we can refactor the pushdown_limit_helper function slightly to keep the correct behavior while pulling the improvements back in. I think it's more important to get a fix merged first, since this did break existing behavior.
I have filed #12423 to track this issue and updated this PR description.
Thank you @wiedld and @itsjunetime and @mertak-synnada
I think this code is looking good to me. I had a few suggestions about comments, but the code and testing seem 👍 to me.
@@ -256,21 +265,24 @@ pub(crate) fn pushdown_limits(
    pushdown_plan: Arc<dyn ExecutionPlan>,
    global_state: GlobalRequirements,
) -> Result<Arc<dyn ExecutionPlan>> {
    // Call pushdown_limit_helper.
    // This will either extract the limit node (returning the child), or apply the limit pushdown.
This might be a good comment to add to the pushdown_limit_helper function as well.
I am going to make the comment suggestions to this PR so we can merge it in.
@itsjunetime I wonder if you can elaborate on this or file a ticket. In order to avoid the final ...
The implementation of Limit is pretty straightforward: datafusion/datafusion/physical-plan/src/limit.rs, lines 342 to 404 in f24f2cb
I believe @itsjunetime's comment was about the first commit; after my suggested change, the performance gains should be preserved, imo.
Thanks everyone for your help getting this done!
* test: demonstrate offset not applied correctly with limit pushdown on multiple input streams
* fix: do not pushdown when skip is applied
* test: update tests after fix
* chore: more doc cleanup
* chore: move LIMIT+OFFSET tests to proper sqllogic test case
* refactor: add global limit back (if there is a skip) during limit pushdown
* Apply suggestions from code review
* Add comment explaining why
---------
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Fixes #12423
First commit demonstrates the bug.
Rationale for this change
First commit demonstrates the current, incorrect behavior where the offset is not applied correctly during limit pushdown.
Followup commits add the fix, as well as a few doc comments.
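To make the failure mode concrete, here is a minimal sketch of one way an offset can go wrong when a limit is pushed below a node with multiple input streams. It is not DataFusion code: the function names (limit, offset_per_stream, offset_after_merge) and the use of plain vectors in place of partitioned record batch streams are assumptions made purely for illustration, and the exact way the bug manifested in this PR may differ.

```rust
/// Apply `skip` then `fetch` to a single stream of rows.
fn limit(rows: &[i32], skip: usize, fetch: usize) -> Vec<i32> {
    rows.iter().copied().skip(skip).take(fetch).collect()
}

/// Incorrect (hypothetical): the whole LIMIT/OFFSET is pushed to every
/// input stream, so the offset is applied once per stream.
fn offset_per_stream(streams: &[Vec<i32>], skip: usize, fetch: usize) -> Vec<i32> {
    streams.iter().flat_map(|s| limit(s, skip, fetch)).collect()
}

/// Correct (hypothetical): each stream may be truncated to `skip + fetch`
/// rows, but the offset is applied exactly once, after the streams are
/// combined.
fn offset_after_merge(streams: &[Vec<i32>], skip: usize, fetch: usize) -> Vec<i32> {
    let merged: Vec<i32> = streams
        .iter()
        .flat_map(|s| limit(s, 0, skip + fetch))
        .collect();
    limit(&merged, skip, fetch)
}
```

With two streams [1, 2, 3] and [4, 5, 6], skip = 1 and fetch = 2, offset_per_stream yields [2, 3, 5, 6] (a row is skipped in each stream and too many rows are returned), while offset_after_merge yields [2, 3]. This is why a limit with a non-zero skip needs a single, global application of the offset even when the fetch is pushed down.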
What changes are included in this PR?
A slight change to offset handling in one of the limit pushdown helper functions.
Also added some docs to help explain existing code.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.