-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Expr::column_refs to find column references without copying #10948
Conversation
migrate some uses of to_column
/// assert!(refs.contains(&Column::new_unqualified("a"))); | ||
/// assert!(refs.contains(&Column::new_unqualified("b"))); | ||
/// ``` | ||
pub fn column_refs(&self) -> HashSet<&Column> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of this function is quite nice now compared to the expr_to_columns
one: https://github.com/alamb/datafusion/blob/58d0c34d77c9a5202e62b9281cdbf1046abaa096/datafusion/expr/src/utils.rs#L264-L309
@@ -46,6 +46,7 @@ pub const COUNT_STAR_EXPANSION: ScalarValue = ScalarValue::Int64(Some(1)); | |||
|
|||
/// Recursively walk a list of expression trees, collecting the unique set of columns | |||
/// referenced in the expression | |||
#[deprecated(since = "40.0.0", note = "Expr::add_column_refs instead")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is not used anywhere in the datafusion codebase
@@ -785,7 +782,7 @@ impl OptimizerRule for PushDownFilter { | |||
let mut keep_predicates = vec![]; | |||
let mut push_predicates = vec![]; | |||
for expr in predicates { | |||
let cols = expr.to_columns()?; | |||
let cols = expr.column_refs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty good example where there is no need to copy Column
s simply to check if they are referenced.
This PR looks really nice, let me know when it is ready for review. |
I think this is now ready for review. I ran benchmarks and they show some slight improvement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I have only a minor suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
Thanks @comphead and @peter-toth |
…he#10948) * Add Expr::column_refs to find column references without copying migrate some uses of to_column * Simplify condition
…he#10948) * Add Expr::column_refs to find column references without copying migrate some uses of to_column * Simplify condition
…he#10948) * Add Expr::column_refs to find column references without copying migrate some uses of to_column * Simplify condition
Which issue does this PR close?
Part of #10505
Rationale for this change
Code that uses
Expr::to_columns
orexpr_to_columns
copies the column name strings, often to simply check if a column is present. A non copying API would be more efficient, and now thanks to the improvements in the TreeNode API (from #10543, thanks @peter-toth!) it is straightforward to implementWhat changes are included in this PR?
Expr::column_refs
andExpr::add_column_refs
exprlist_to_columns
which is not used anywhere in the datafusion codebaseI plan to deprecate
Expr::to_columns
andexpr_to_columns
as follow on PRsAre these changes tested?
Yes, by existing tests and new doc examples
Performance results show basically no improvement (maybe it gets faster for some of the queries with larger numbers of columns) but it may also be noise
Details
Are there any user-facing changes?
There is a new API