-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Simplify + document EliminateCrossJoin
better
#10427
Conversation
@@ -885,7 +885,7 @@ pub fn can_hash(data_type: &DataType) -> bool { | |||
/// Check whether all columns are from the schema. | |||
pub fn check_all_columns_from_schema( | |||
columns: &HashSet<Column>, | |||
schema: DFSchemaRef, | |||
schema: &DFSchema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes the schema by reference rather than value now.
Since DFSchemaRef
is an Arc
I don't expect this to make any measurable performance improvement, but I think it makes it clearer what is going on and avoids a bunch of clone
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -107,7 +107,7 @@ impl OptimizerRule for EliminateCrossJoin { | |||
left = find_inner_join( | |||
&left, | |||
&mut all_inputs, | |||
&mut possible_join_keys, | |||
&possible_join_keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_inner_join
does not modify possible_join_keys
so update the signature to make that clear
} | ||
} | ||
Ok(true) | ||
} | ||
|
||
/// Finds the next to join with the left input plan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time studying this code so wanted to document what it is doing
@jackwener I wonder if you have time to review this PR? |
@@ -909,8 +909,8 @@ pub fn check_all_columns_from_schema( | |||
pub fn find_valid_equijoin_key_pair( | |||
left_key: &Expr, | |||
right_key: &Expr, | |||
left_schema: DFSchemaRef, | |||
right_schema: DFSchemaRef, | |||
left_schema: &DFSchema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats interesting, should we a reference instead of Arc whenever possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general we should use a reference when the underlying callsite doesn't actually need a owned copy
So in this case, the callsite just needs a &DFSchema
so there is no need to go through the overhead of creating the Arc
I think the runtime overhead of creating another Arc is likely to be unmeasurable in all but the most extreme circumstances. However I think the code is often easier to reason about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb nice clean up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice job to me, thanks @alamb
Thanks for the reviews @jackwener and @comphead ❤️ -- this now opens up a few more PRs I have queued up to make this code faster (by reducing cloning) #10430 |
Which issue does this PR close?
Part of #10287
Rationale for this change
I am trying to avoid copying as much in
EliminateCrossJoin
and I noticed a fewthings that can be simpler. While this avoids some copies I don't think it will really impact performance much.
Instead this is a step towards a larger refactor to avoid the actual plan copies
What changes are included in this PR?
&mut
to make it clear what is being changedDFSchemaRef
Are these changes tested?
By existing CI
Are there any user-facing changes?
No