Fix logical vs physical schema mismatch for UNION where some inputs are constants #12954

Merged
alamb merged 6 commits into apache:main from 12733/metadata-xfer-both-directions on Oct 23, 2024

Conversation

@wiedld wiedld commented Oct 15, 2024

Which issue does this PR close?

Part of #12733
closes #13010

Rationale for this change

We found another case where the logical and physical schemas do not match. Here, the UNION schema selects whichever side has the nullable field, defaulting to the left side.

In the example case, the nullable field is on the right side. With this setup, it became evident that we were not propagating the field metadata from the left side to the right side.
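
A minimal sketch of the failure mode described above, assuming a simplified field type. `FieldSketch` and `pick_side` are hypothetical names used only for illustration and are not DataFusion's API. The sketch shows how choosing the nullable right side can drop metadata that exists only on the left side.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field (not DataFusion's type).
#[derive(Debug, Clone)]
struct FieldSketch {
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Assumed selection rule, following the description above:
/// take the side with the nullable field, defaulting to the left side.
fn pick_side<'a>(left: &'a FieldSketch, right: &'a FieldSketch) -> &'a FieldSketch {
    if !left.nullable && right.nullable {
        // The right side wins, so any metadata present only on `left`
        // is missing from the resulting field.
        right
    } else {
        left
    }
}
```

With a non-nullable left field that carries metadata and a nullable right field that carries none (for example a NULL constant), `pick_side` returns the right field, whose metadata map is empty.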

What changes are included in this PR?

First commit is the reproducer.
Second commit is the fix.
Third commit updates the test/reproducer now that the fix is in.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Oct 15, 2024
@wiedld wiedld marked this pull request as ready for review October 15, 2024 22:22
@@ -123,6 +123,19 @@ ORDER BY id, name, l_name;
NULL bar NULL
NULL NULL l_bar

# Regression test: missing field metadata from left side of the union when right side is chosen

I verified that this query fails without the code changes in this PR:

External error: query failed: DataFusion error: Schema error: No field named nonnull_name. Valid fields are table_with_metadata.id, table_with_metadata.name, table_with_metadata.l_name.
[SQL] select name from (
  SELECT nonnull_name as name FROM "table_with_metadata"
  UNION ALL
  SELECT NULL::string as name
) group by name order by name;
at test_files/metadata.slt:127

Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

@alamb alamb left a comment

Thank you @wiedld -- this looks good, but I have a question about unions with more than 2 inputs

let mut metadata = field.metadata().clone();

let other_side_metdata = inputs
.get(input_idx ^ (1 << 0))

what is the ^ (1 << 0) construction used for?

I cannot understand this part either

FWIW @wiedld told me that this basically "gets the other input" -- so if input_idx is 0 this returns 1 and if input_idx is 1 this returns 0.

I believe she has plans to comment on this PR shortly
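
For reference, a small sketch of what the expression computes. `other_input_idx` is a hypothetical helper name, not code from this PR. Since `1 << 0` is just `1`, the XOR flips the lowest bit of the index.

```rust
/// Hypothetical helper: index of the "other" input of a two-input union.
/// `1 << 0` evaluates to `1`, so this flips the lowest bit of `input_idx`.
fn other_input_idx(input_idx: usize) -> usize {
    input_idx ^ (1 << 0)
}
```

This pairs 0 with 1 and 1 with 0, which is only a sensible "other side" when there are exactly two inputs. With three inputs, index 2 maps to 3, which is out of bounds, so a `.get(3)` on the inputs would return `None`.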

None
}
.enumerate()
.map(|(input_idx, input)| {

I would like to recommend extracting this logic into a function that is commented to help explain what it is doing -- specifically I think it is trying to get the first non-null metadata from any previous input

Reading it more closely, though, doesn't this code assume there are exactly 2 inputs to the Union? What if there are more than 2 inputs?
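
One possible shape for such an extracted function, sketched for any number of inputs. This is not the code that was merged. `FieldSketch` and `union_field` are hypothetical names, and the merge policy (nullable if any input is nullable, first-seen value wins per metadata key) is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
struct FieldSketch {
    name: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Build the union output field for one column position from the
/// matching field of every input. Returns `None` if there are no inputs.
///
/// Assumed policy: the name comes from the first input, the result is
/// nullable if any input is nullable, and metadata is merged across all
/// inputs with the first-seen value winning for each key.
fn union_field(inputs: &[FieldSketch]) -> Option<FieldSketch> {
    let first = inputs.first()?;
    let nullable = inputs.iter().any(|f| f.nullable);

    let mut metadata = HashMap::new();
    for field in inputs {
        for (key, value) in &field.metadata {
            // Keep the value from the earliest input that defines `key`.
            metadata.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }

    Some(FieldSketch {
        name: first.name.clone(),
        nullable,
        metadata,
    })
}
```

Because every input is visited, the metadata survives regardless of which position carries it, including the case where the first input is a constant with no metadata.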

I agree with that. It would be helpful to add some inline comments, or possibly extract it into another function

@alamb alamb marked this pull request as draft October 18, 2024 19:35

alamb commented Oct 18, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

alamb commented Oct 18, 2024

I filed #13010 specifically to track this issue so we don't lose track of it

wiedld commented Oct 18, 2024

> Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Thank you.
As you know, I've been shifted to a higher priority item. 🙏🏼

itsjunetime commented Oct 22, 2024

I took this over on our end and just pushed some more commits to

  1. Fix another issue I ran into (specifically, when unioning multiple schemas where the first is not the one with the metadata)
  2. Make it work with multiple schemas

I'll rebase it soon, and I can add some more comments to address the concerns raised here earlier.

@itsjunetime itsjunetime force-pushed the 12733/metadata-xfer-both-directions branch from 66d6554 to b229807 Compare October 22, 2024 21:53
@alamb alamb marked this pull request as ready for review October 23, 2024 10:50
@alamb alamb changed the title Fix logical vs physical schema mismatch for right side field schema selection. Fix logical vs physical schema mismatch for UNION where some inputs are constants Oct 23, 2024

@alamb alamb left a comment

Thank you @itsjunetime and @wiedld and @berkaysynnada -- I think this PR now looks good to me.

@wiedld wiedld left a comment

LGTM. Thank you @itsjunetime .

@alamb alamb merged commit 3e940a9 into apache:main Oct 23, 2024
25 checks passed

alamb commented Oct 23, 2024

Thanks again @itsjunetime and @wiedld

@alamb alamb deleted the 12733/metadata-xfer-both-directions branch October 23, 2024 23:08
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UNION ALL with null constants results in Schema error
4 participants