Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Average to UDAF #10942 #10964

Merged
merged 22 commits into from
Jun 23, 2024
Merged

Convert Average to UDAF #10942 #10964

merged 22 commits into from
Jun 23, 2024

Conversation

dharanad
Copy link
Contributor

Which issue does this PR close?

Closes #10942

Rationale for this change

Part of #8708

What changes are included in this PR?

Are these changes tested?

cargo build
cargo test
cargo test --test sqllogictests 

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate labels Jun 17, 2024
@github-actions github-actions bot added the optimizer Optimizer rules label Jun 17, 2024
@github-actions github-actions bot added the sql SQL Planner label Jun 20, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 20, 2024
# Conflicts:
#	datafusion/expr/src/aggregate_function.rs
#	datafusion/expr/src/type_coercion/aggregates.rs
#	datafusion/functions-aggregate/src/lib.rs
#	datafusion/optimizer/src/common_subexpr_eliminate.rs
#	datafusion/physical-expr/src/aggregate/build_in.rs
#	datafusion/physical-expr/src/aggregate/mod.rs
#	datafusion/physical-expr/src/expressions/mod.rs
#	datafusion/proto/src/logical_plan/from_proto.rs
#	datafusion/proto/src/logical_plan/to_proto.rs
#	datafusion/proto/src/physical_plan/to_proto.rs
#	datafusion/proto/tests/cases/roundtrip_physical_plan.rs
#	datafusion/sqllogictest/test_files/window.slt
@dharanad
Copy link
Contributor Author

hello @jayzhan211 / @alamb
The following two test cases are failing

  • aggregates::tests::test_drop_cancel_with_groups
  • aggregates::tests::test_drop_cancel_without_groups

I'm encountering difficulties with understanding these tests cases. Their purpose and the specific behavior they target aren't entirely clear to me. Can you please provide some insights into these test cases? Specifically, it would be helpful to understand:

  • The context surrounding the addition of these tests.
  • The intended behavior they are designed to verify.

Your help would be greatly appreciated.

@jayzhan211
Copy link
Contributor

  • test_drop_cancel_with_groups

I think you can try create_aggregate_expr in datafusion/physical-expr-common/src/aggregate/mod.rs to create avg expression

@dharanad
Copy link
Contributor Author

  • test_drop_cancel_with_groups

I think you can try create_aggregate_expr in datafusion/physical-expr-common/src/aggregate/mod.rs to create avg expression

I am using the same to create avg expression. But it is still failing

@jayzhan211
Copy link
Contributor

  • test_drop_cancel_with_groups

I think you can try create_aggregate_expr in datafusion/physical-expr-common/src/aggregate/mod.rs to create avg expression

I am using the same to create avg expression. But it is still failing

The error here is because f32 is not coerced to f64, since the coercion occurs in logical optimization step. I think we can just change the type to f64 for test, since the test does not care about the things happen before getting Physical expression

@dharanad
Copy link
Contributor Author

  • test_drop_cancel_with_groups

I think you can try create_aggregate_expr in datafusion/physical-expr-common/src/aggregate/mod.rs to create avg expression

I am using the same to create avg expression. But it is still failing

The error here is because f32 is not coerced to f64, since the coercion occurs in logical optimization step. I think we can just change the type to f64 for test, since the test does not care about the things happen before getting Physical expression

I really appreciate you taking the time to help me with this. Using the debugger did the trick, and your insights help me understand the actual reason.

@dharanad dharanad marked this pull request as ready for review June 21, 2024 09:37
@dharanad
Copy link
Contributor Author

@jayzhan211 can you please help me with a review

alamb
alamb previously approved these changes Jun 21, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An epic PR @dharanad -- very nice. Thank you for this contribution and for the help / review @jayzhan211

@alamb alamb dismissed their stale review June 22, 2024 12:03

Let's address jayzhan211's comments first

dharanad added 4 commits June 22, 2024 19:49
# Conflicts:
#	datafusion/core/src/dataframe/mod.rs
#	datafusion/sqllogictest/test_files/optimizer_group_by_constant.slt
#	datafusion/sqllogictest/test_files/predicates.slt
#	datafusion/sqllogictest/test_files/tpch/q1.slt.part
#	datafusion/sqllogictest/test_files/window.slt
#	datafusion/substrait/tests/cases/consumer_integration.rs
@dharanad dharanad requested a review from jayzhan211 June 23, 2024 04:50
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@dharanad
Copy link
Contributor Author

👍

Thank You @jayzhan211 for your time and guidance.

@dharanad dharanad requested a review from alamb June 23, 2024 07:39
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad and @jayzhan211

@alamb alamb merged commit fdd1e3d into apache:main Jun 23, 2024
23 checks passed
@alamb alamb added the api change Changes the API exposed to users of the crate label Jul 9, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* add avg udaf

* remove avg from expr

* add test stub

* migrate avg udaf

* change avg udaf signature
remove avg phy expr

* fix tests

* fix state_fields fn

* fix ut in phy-plan aggr

* refactor Average to Avg

* refactor Average to Avg

* fix type coercion tests

* fix example and logic tests

* fix py expr failing ut

* update docs

* fix failing tests

* formatting examples

* remove duplicate code and fix uts

* addressing PR comments

* add ut for logical avg window

* fix physical plan roundtrip_window test case
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 25, 2024
andygrove pushed a commit to apache/datafusion-python that referenced this pull request Jul 31, 2024
* chore: update datafusion deps

* feat: impl ExecutionPlan::static_name() for DatasetExec

This required trait method was added upstream [0] and recommends to simply forward to `static_name`.

[0]: apache/datafusion#10266

* feat: update first_value and last_value wrappers.

Upstream signatures were changed for the new new `AggregateBuilder` api [0].

This simply gets the code to work. We should better incorporate that API into `datafusion-python`.

[0] apache/datafusion#10560

* migrate count to UDAF

Builtin Count was removed upstream.

TBD whether we want to re-implement `count_star` with new API.

Ref: apache/datafusion#10893

* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF

Ref: approx_distinct apache/datafusion#10851
Ref: approx_median apache/datafusion#10840
Ref: approx_percentile_cont and _with_weight apache/datafusion#10917

* migrate avg to UDAF

Ref: apache/datafusion#10964

* migrage corr to UDAF

Ref: apache/datafusion#10884

* migrate grouping to UDAF

Ref: apache/datafusion#10906

* add alias `mean` for UDAF `avg`

* migrate stddev to UDAF

Ref: apache/datafusion#10827

* remove rust alias for stddev

The python wrapper now provides stddev_samp alias.

* migrage var_pop to UDAF

Ref: apache/datafusion#10836

* migrate regr_* functions to UDAF

Ref: apache/datafusion#10898

* migrate bitwise functions to UDAF

The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930

* add missing variants for ScalarValue with todo

* fix typo in approx_percentile_cont

* add distinct arg to count

* comment out failing test

`approx_percentile_cont` is now returning a DoubleArray instead of an IntArray.

This may be a bug upstream; it requires further investigation.

* update tests to expect lowercase `sum` in query plans

This was changed upstream.

Ref: apache/datafusion#10831

* update ScalarType data_type map

* add docs dependency pickleshare

* re-implement count_star

* lint: ruff python lint

* lint: rust cargo fmt

* include name of window function in error for find_window_fn

* refactor `find_window_fn` for debug clarity

* search default aggregate functions by both name and aliases

The alias list no longer includes the name of the function.

Ref: apache/datafusion#10658

* fix markdown in find_window_fn docs

* parameterize test_window_functions

`first_value` and `last_value` are currently failing and marked as xfail.

* add test ids to test_simple_select tests marked xfail

* update find_window_fn to search built-ins first

The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior.
This allowed me to remove `marks=pytest.xfail` from the window tests.

* improve first_call and last_call use of the builder API

* remove trailing todos

* fix examples/substrait.py

* chore: remove explicit aliases from functions.rs

Ref: #779

* remove `array_fn!` aliases

* remove alias rules for `expr_fn_vec!`

* remove alias rules from `expr_fn!` macro

* remove unnecessary pyo3 var-arg signatures in functions.rs

* remove pyo3 signatures that provided defaults for first_value and last_value

* parametrize test_string_functions

* test regr_ function wrappers

Closes #778
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert Average to UDAF
3 participants