Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizer: push down predicate inferred by hash join equal condition #8376

Open
Tracked by #5605
yuhao-su opened this issue Mar 6, 2023 · 9 comments
Open
Tracked by #5605
Assignees
Labels
good first issue Good for newcomers help wanted Issues that need help from contributors type/feature
Milestone

Comments

@yuhao-su
Copy link
Contributor

yuhao-su commented Mar 6, 2023

For queries like select * from a, b on a.x = b.x and a.y = b.x, we can push down a filter before side a with condition a.x=a.y and only keep one of the two join condition in hash join executor.

Such optimization can apply to almost all kinds of hash join with the exception of anti-join.

For anti-join we can not push down a filter or keep let join key. Instead, we need to evaluate the predicate in hash join executor. If a.x=a.y is false, we know the join condition a.x = b.x and a.y = b.x is false.

This can also solve the #7698

@yuhao-su
Copy link
Contributor Author

yuhao-su commented Mar 6, 2023

cc. @chenzl25 @st1page

@st1page
Copy link
Contributor

st1page commented Mar 6, 2023

It will little more complex when we consider the NULL-safe equal condition 🤔 We can handle them respectively but I am not sure if we can handle this condition left.a is not distinct from right.b AND left.c = right.b. can we push down the condition like left.a = left.c OR left.a is null OR left.c is null?
Maybe we need to re-confirm the query requirement from the user side and if we need to handle the NULL-safe equal condition here. IIRC, it is to handle some ring detection. c.c. @lmatz

@lmatz
Copy link
Contributor

lmatz commented Mar 6, 2023

Yeah, in the original use case, a and b are different alias of the same table that is used to represent a graph, where x and y are src and dst of an edge. So I guess we can assume that x and y are both non-null in this particular use case.

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Mar 7, 2023

Using an optimization rule to solve a panic sounds a little bit weird to me. It would be a required step to transform plans into a preferred form, rather than an optional "optimization".

@chenzl25
Copy link
Contributor

chenzl25 commented Mar 7, 2023

It will little more complex when we consider the NULL-safe equal condition 🤔 We can handle them respectively but I am not sure if we can handle this condition left.a is not distinct from right.b AND left.c = right.b. can we push down the condition like left.a = left.c OR left.a is null OR left.c is null? Maybe we need to re-confirm the query requirement from the user side and if we need to handle the NULL-safe equal condition here. IIRC, it is to handle some ring detection. c.c. @lmatz

Once null safe equal appears, we can just push down a condition like left.a is not distinct from left.c, because it it strict enough to filter out most records, except the null one. Luckily, the upper condition left.a is not distinct from right.b AND left.c = right.b can do this favor for us.

@yuhao-su
Copy link
Contributor Author

yuhao-su commented Mar 7, 2023

Using an optimization rule to solve a panic sounds a little bit weird to me. It would be a required step to transform plans into a preferred form, rather than an optional "optimization".

Good point. Although this is a potential way to address the problem, we decided to fix the bug by #8377 and treat this one as an optimization.

@jon-chuang
Copy link
Contributor

jon-chuang commented Mar 8, 2023

We do something similar here:

fn derive_predicate_from_eq_condition(

But this is implemented only for one-sided expression.

Needs futher investigation

By applying this to eq conditions, this could turn the following condition:
select * from a, b on a.x = b.x and a.y = b.x

into

select * from a, b on a.x = a.y and a.y = a.x and a.x = b.x, and a.y = b.x

NULL safety reasoning could be extended from here: #6895. In particular: we "can only handle pushing down to an inner side.". Not sure if it is accurate when applied reflexively to eq conditions.

@fuyufjh
Copy link
Member

fuyufjh commented Apr 17, 2023

Hey, any updates?

@st1page
Copy link
Contributor

st1page commented Apr 17, 2023

Hey, any updates?

Because we solve the #7698, it becomes less emergent and a optimization.

@st1page st1page added good first issue Good for newcomers help wanted Issues that need help from contributors labels Apr 17, 2023
@st1page st1page changed the title push down predicate inferred by hash join equal condition Optimizer: push down predicate inferred by hash join equal condition Apr 17, 2023
@fuyufjh fuyufjh modified the milestones: release-1.1, release-1.2 Aug 8, 2023
@st1page st1page modified the milestones: release-1.4, release-1.5 Nov 8, 2023
@st1page st1page modified the milestones: release-1.5, release-1.6 Dec 5, 2023
@st1page st1page removed this from the release-1.6 milestone Jan 9, 2024
@st1page st1page added this to the release-1.7 milestone Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Issues that need help from contributors type/feature
Projects
None yet
Development

No branches or pull requests

8 participants