Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix join and unnest planning to ensure that duplicate join prefixes are not used #13943

Merged
merged 2 commits into from
Mar 22, 2023

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Mar 16, 2023

Description

This PR fixes a join query planning regression caused by #13902. After that change, this query will actually plan "correctly" but then later explode due to duplicate join prefix. The planner was selecting a join prefix only looking at the RowSignature of the left and right side datasources, but if those queries had any nested join prefixes, the query would later fail because the native prefix checker flattens the join to check prefixes (so losing the context of what is used directly in outside layers).

The solution Ive added is to just dig through the datasources to extract any existing prefixes and add that to the checked set when selecting new prefixes, to ensure the planner never picks duplicate prefixes even if it can't see them at the top level.

The added test plans and produces the same result as prior to #13902. Full disclaimer, I didn't actually work out the answer by hand to ensure that it was correct (the repro query was relayed to me), but at least it isn't different than before the change 😅 .


This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@clintropolis clintropolis changed the title fix join and unnest planning to ensure that duplicate join prefixes a… fix join and unnest planning to ensure that duplicate join prefixes are not used Mar 16, 2023
.build()
),
"j0.",
equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
.build()
),
"j0.",
equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
.build()
),
"j0.",
equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
.build()
),
"j0.",
equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix LGTM. It looks like a pre-existing bug with join planning: because we're only looking at the row signature when determining what rightPrefix to use, we might choose a prefix that was used for a join where no RHS columns were selected. (Like an inner-join-used-as-filter, as in your test case.)

"j0.",
equalsCondition(makeColumnExpression("v0"), makeColumnExpression("j0.v0")),
"_j0.",
equalsCondition(makeColumnExpression("v0"), makeColumnExpression("_j0.v0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
"j0.",
equalsCondition(makeColumnExpression("v0"), makeColumnExpression("j0.v0")),
"_j0.",
equalsCondition(makeColumnExpression("v0"), makeColumnExpression("_j0.v0")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants