Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Pruning into physical-optimizer crate #13485

Merged
merged 4 commits into from
Nov 20, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 35 additions & 31 deletions datafusion-cli/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 0 additions & 1 deletion datafusion/core/src/physical_optimizer/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,6 @@ pub mod enforce_sorting;
pub mod join_selection;
pub mod optimizer;
pub mod projection_pushdown;
pub mod pruning;
pub mod replace_with_order_preserving_variants;
pub mod sanity_checker;
#[cfg(test)]
Expand Down
4 changes: 4 additions & 0 deletions datafusion/physical-optimizer/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -33,13 +33,17 @@ workspace = true

[dependencies]
arrow = { workspace = true }
arrow-array = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, can you avoid this explicit dependency? Whatever is used in arrow-array should be available in arrow and then we can keep the dependency chain simpler

arrow-schema = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-execution = { workspace = true }
datafusion-expr-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true, default-features = true }
datafusion-functions-nested = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move datafusion-functions-nested to dev-dependencies?

I also think we can avoid the dependency on datafusion-expr by changing

use datafusion_expr::Operator;

to

use datafusion_expr_common::operator::Operator;

datafusion-physical-expr = { workspace = true }
datafusion-physical-plan = { workspace = true }
itertools = { workspace = true }
log = { workspace = true }
recursive = { workspace = true }

[dev-dependencies]
Expand Down
1 change: 1 addition & 0 deletions datafusion/physical-optimizer/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ pub mod limit_pushdown;
pub mod limited_distinct_aggregation;
mod optimizer;
pub mod output_requirements;
pub mod pruning;
pub mod topk_aggregation;
pub mod update_aggr_exprs;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,12 +22,10 @@
use std::collections::HashSet;
Copy link
Contributor Author

@irenjj irenjj Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion-expr dependency is needed by //! [Expr]: crate::prelude::Expr, should we remove the comment to avoid introducing new dependencies? @alamb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places, we have changed the link to docs.rs directly. So in this case, change the link to

https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html

use std::sync::Arc;

use crate::{
common::{Column, DFSchema},
error::{DataFusionError, Result},
logical_expr::Operator,
physical_plan::{ColumnarValue, PhysicalExpr},
};
use datafusion_common::error::{DataFusionError, Result};
use datafusion_common::{Column, DFSchema};
use datafusion_expr::Operator;
use datafusion_physical_plan::{ColumnarValue, PhysicalExpr};

use arrow::{
array::{new_null_array, ArrayRef, BooleanArray},
Expand Down Expand Up @@ -653,7 +651,7 @@ impl PruningPredicate {

// this is only used by `parquet` feature right now
#[allow(dead_code)]
pub(crate) fn required_columns(&self) -> &RequiredColumns {
pub fn required_columns(&self) -> &RequiredColumns {
&self.required_columns
}

Expand Down Expand Up @@ -762,7 +760,7 @@ fn is_always_true(expr: &Arc<dyn PhysicalExpr>) -> bool {
/// Handles creating references to the min/max statistics
/// for columns as well as recording which statistics are needed
#[derive(Debug, Default, Clone)]
pub(crate) struct RequiredColumns {
pub struct RequiredColumns {
/// The statistics required to evaluate this predicate:
/// * The unqualified column in the input schema
/// * Statistics type (e.g. Min or Max or Null_Count)
Expand All @@ -786,7 +784,7 @@ impl RequiredColumns {
/// * `true` returns None
#[allow(dead_code)]
// this fn is only used by `parquet` feature right now, thus the `allow(dead_code)`
pub(crate) fn single_column(&self) -> Option<&phys_expr::Column> {
pub fn single_column(&self) -> Option<&phys_expr::Column> {
if self.columns.windows(2).all(|w| {
// check if all columns are the same (ignoring statistics and field)
let c1 = &w[0].0;
Expand Down Expand Up @@ -1664,8 +1662,8 @@ mod tests {
use std::ops::{Not, Rem};

use super::*;
use crate::assert_batches_eq;
use crate::logical_expr::{col, lit};
use datafusion_common::assert_batches_eq;
use datafusion_expr::{col, lit};

use arrow::array::Decimal128Array;
use arrow::{
Expand Down
Loading