-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix logical vs physical schema mismatch for UNION where some inputs are constants #12954
Fix logical vs physical schema mismatch for UNION where some inputs are constants #12954
Conversation
@@ -123,6 +123,19 @@ ORDER BY id, name, l_name; | |||
NULL bar NULL | |||
NULL NULL l_bar | |||
|
|||
# Regression test: missing field metadata from left side of the union when right side is chosen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this query fails without the code changes in this PR:
External error: query failed: DataFusion error: Schema error: No field named nonnull_name. Valid fields are table_with_metadata.id, table_with_metadata.name, table_with_metadata.l_name.
[SQL] select name from (
SELECT nonnull_name as name FROM "table_with_metadata"
UNION ALL
SELECT NULL::string as name
) group by name order by name;
at test_files/metadata.slt:127
Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wiedld -- this looks good, but I have a question about unions with more than 2 inputs
let mut metadata = field.metadata().clone(); | ||
|
||
let other_side_metdata = inputs | ||
.get(input_idx ^ (1 << 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the ^ (1 << 0)
construction used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot understand this part, too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW @wiedld told me that this basically "gets the other input" -- so if inpux_index
is 0 this returns 1
and if input_index
is 0
this returns 1
.
I believe she has plans to comment on this PR shortly
None | ||
} | ||
.enumerate() | ||
.map(|(input_idx, input)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to recommend extracting this logic into a function that is commented to help explain what it is doing -- specifically I think it is trying to get the first non-null metadata from any previous input
Reading it more closely, though, doesn't this code assume there are exactly 2 inputs to the Union? What if there are more than 2 inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. It would be helpful to add some inline comments, or possibly extract it into another function
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
I filed #13010 specifically to track this issue so we don't lose track of it |
Thank you. |
I took this over on our end and just pushed some more commits to
I'll rebase it soon and I can add some more comments to fix the concerns commented here earlier. |
…ansferred to the right side
…d metadata from both to each other
66d6554
to
b229807
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @itsjunetime and @wiedld and @berkaysynnada -- I think this PR now looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @itsjunetime .
Thanks again @itsjunetime and @wiedld |
Which issue does this PR close?
Part of #12733
closes #13010
Rationale for this change
We found another bug for when the logical vs physical schema does not match. In this specific case, the UNION schema will select which side has the nullable field -- and default to taking the left side.
In the example case, we have the nullable field as the right side. With this setup, it became evident that we are not adding the field metadata from the left side => to the right side.
What changes are included in this PR?
First commit is the reproducer.
Second commit is the fix.
Third commit updates the test/reproducer now that the fix is in.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.