Convert bool_and & bool_or to UDAF #11009

Merged
merged 9 commits into from
Jun 20, 2024

jcsherin
Contributor

Which issue does this PR close?

Closes #11008.

What changes are included in this PR?

  • Converts bool_and, bool_or to UDAF
  • Export above as fluent style API for creating Exprs
  • Add expressions to roundtrip_expr_api test
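
To make the behaviour of the two aggregates concrete, here is a minimal standalone sketch in plain Rust. It is not the DataFusion implementation and does not use `AggregateUDFImpl`; it only models the assumed SQL-style semantics, where null inputs (`None`) are skipped and the result is null when no non-null value is seen. The function names mirror the aggregates purely for readability.

```rust
/// Simplified model of `bool_and`: AND over all non-null inputs.
/// Assumption: nulls are ignored, and an input with no non-null
/// values yields `None`.
fn bool_and<I: IntoIterator<Item = Option<bool>>>(values: I) -> Option<bool> {
    values
        .into_iter()
        .flatten() // drop nulls
        .fold(None, |acc, v| Some(acc.map_or(v, |a| a && v)))
}

/// Simplified model of `bool_or`: OR over all non-null inputs,
/// under the same null-handling assumption as `bool_and`.
fn bool_or<I: IntoIterator<Item = Option<bool>>>(values: I) -> Option<bool> {
    values
        .into_iter()
        .flatten() // drop nulls
        .fold(None, |acc, v| Some(acc.map_or(v, |a| a || v)))
}
```

For example, `bool_and(vec![Some(true), None, Some(true)])` evaluates to `Some(true)` in this model, while `bool_or(vec![None, None])` evaluates to `None`.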

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the logical-expr (Logical plan and expressions) and physical-expr (Physical Expressions) labels on Jun 19, 2024
Accumulator, AggregateUDFImpl, GroupsAccumulator, ReversedUDAF, Signature, Volatility,
};

use datafusion_physical_expr_common::aggregate::groups_accumulator::bool_op::BooleanGroupsAccumulator;
Contributor Author

I could use some help here on how to proceed with extracting BooleanGroupsAccumulator from physical-expr-common.

The BooleanGroupsAccumulator depends on NullState which in turn has other users as seen here:

$ rg NullState -c
datafusion-examples/examples/advanced_udaf.rs:4
datafusion/physical-expr/src/lib.rs:1
datafusion/physical-expr/src/aggregate/average.rs:3
datafusion/physical-expr/src/aggregate/groups_accumulator/mod.rs:2
datafusion/physical-expr-common/src/aggregate/groups_accumulator/bool_op.rs:4
datafusion/physical-expr-common/src/aggregate/groups_accumulator/prim_op.rs:4
datafusion/physical-expr-common/src/aggregate/groups_accumulator/accumulate.rs:15

Contributor Author

@jcsherin jcsherin Jun 19, 2024

It's also imported by the following in functions-aggregate:

  1. count
  2. bit_and_or_xor
  3. sum

So maybe this is better tackled in a separate PR and is ok for now? 🤔

Contributor

I think we should leave BooleanGroupsAccumulator in physical-expr-common until we have moved the other boolean aggregate functions over -- then I think BooleanGroupsAccumulator should be able to move without issue

}

fn order_sensitivity(&self) -> AggregateOrderSensitivity {
    AggregateOrderSensitivity::Insensitive
Contributor Author

@jcsherin jcsherin Jun 19, 2024

The default is AggregateOrderSensitivity::HardRequirement. Is the use of Insensitive here for bool_and and bool_or the correct usage?

Contributor

Insensitive makes sense to me -- @mustafasrepo perhaps you can confirm?
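
The case for Insensitive presumably rests on AND and OR being commutative and associative, so the order in which rows reach the accumulator cannot change the result. The sketch below checks that property on a plain slice. It is a standalone illustration and not DataFusion code: `bool_and`, `first_value`, and `is_order_insensitive` are hypothetical helpers, and the null-skipping behaviour is an assumption.

```rust
/// Simplified stand-in for the aggregate: AND over the non-null values.
fn bool_and(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().reduce(|a, b| a && b)
}

/// An order-dependent aggregate for contrast: the first non-null value.
fn first_value(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().next()
}

/// Returns true when the reversed input and every rotation of the input
/// aggregate to the same result as the original ordering.
fn is_order_insensitive(
    agg: fn(&[Option<bool>]) -> Option<bool>,
    values: &[Option<bool>],
) -> bool {
    let expected = agg(values);

    let mut reversed = values.to_vec();
    reversed.reverse();
    if agg(&reversed) != expected {
        return false;
    }

    let mut rotated = values.to_vec();
    for _ in 0..values.len() {
        rotated.rotate_left(1);
        if agg(&rotated) != expected {
            return false;
        }
    }
    true
}
```

With the input `[Some(true), None, Some(false)]`, `is_order_insensitive(bool_and, ..)` holds, whereas the same check with `first_value` fails because reversing the rows changes which value comes first.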

Comment on lines +676 to +677
bool_and(lit(true)),
bool_or(lit(true)),
Contributor Author

Added to roundtrip_expr_api.

Contributor

@alamb alamb left a comment

Thank you @jcsherin -- this is a really neat first PR

I think we should remove the commented out tests, but then this looks good to go from my perspective

cc @jayzhan211

Contributor Author

jcsherin commented Jun 19, 2024

I've pushed the following changes based on the code review:

  • Delete the redundant test code
  • Use self.name() instead of hard-coding either "bool_and"/"bool_or" strings
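
The second change can be illustrated with a small standalone sketch. `NamedAggregate` and `unsupported_type_error` are hypothetical names invented for this example and are not part of DataFusion; the point is only that a message built from `self.name()` cannot drift from the name the function is registered under, and one shared code path serves both aggregates.

```rust
/// Hypothetical stand-in for a UDAF trait; not DataFusion's `AggregateUDFImpl`.
trait NamedAggregate {
    /// The name the aggregate is registered under.
    fn name(&self) -> &str;

    /// Builds the message from `self.name()` rather than a hard-coded string,
    /// so every implementor reports its own name automatically.
    fn unsupported_type_error(&self, data_type: &str) -> String {
        format!("{} is not supported for {}", self.name(), data_type)
    }
}

struct BoolAnd;
struct BoolOr;

impl NamedAggregate for BoolAnd {
    fn name(&self) -> &str {
        "bool_and"
    }
}

impl NamedAggregate for BoolOr {
    fn name(&self) -> &str {
        "bool_or"
    }
}
```

Here `BoolOr.unsupported_type_error("Int32")` produces `bool_or is not supported for Int32` without the string `"bool_or"` appearing anywhere except in `name()`.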

Contributor

@alamb alamb left a comment

Thank you @jcsherin -- looks great to me. I'll wait a while to merge this to let @jayzhan211 / @mustafasrepo have a chance to review if they want

Contributor

@jayzhan211 jayzhan211 left a comment

👍

@alamb alamb merged commit 89def2c into apache:main Jun 20, 2024
23 checks passed
Contributor

alamb commented Jun 20, 2024

Thanks again @jcsherin and @jayzhan211

@jcsherin jcsherin deleted the convert-udaf-bool-and-or branch June 20, 2024 15:16
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
* Port `bool_and` and `bool_or` to `AggregateUDFImpl`

* Remove trait methods with default implementation

* Add `bool_or_udaf`

* Register `bool_and` and `bool_or`

* Remove from `physical-expr`

* Add expressions to logical plan roundtrip test

* minor: remove methods with default implementation

* Removes redundant tests

* Removes hard-coded function names
alamb added the api change (Changes the API exposed to users of the crate) label on Jul 9, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Labels
api change (Changes the API exposed to users of the crate), logical-expr (Logical plan and expressions), physical-expr (Physical Expressions)
Development

Successfully merging this pull request may close these issues.

Convert BoolAndOr to UDAF
3 participants