Upgrade Datafusion 40 (#771)
* chore: update datafusion deps

* feat: impl ExecutionPlan::static_name() for DatasetExec

This required trait method was added upstream [0], which recommends simply forwarding to `static_name`.

[0]: apache/datafusion#10266

* feat: update first_value and last_value wrappers.

Upstream signatures were changed for the new `AggregateBuilder` API [0].

This simply gets the code to work. We should better incorporate that API into `datafusion-python`.

[0]: apache/datafusion#10560

* migrate count to UDAF

Builtin Count was removed upstream.

TBD whether we want to re-implement `count_star` with new API.

Ref: apache/datafusion#10893

* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF

Ref: approx_distinct apache/datafusion#10851
Ref: approx_median apache/datafusion#10840
Ref: approx_percentile_cont and _with_weight apache/datafusion#10917

* migrate avg to UDAF

Ref: apache/datafusion#10964

* migrate corr to UDAF

Ref: apache/datafusion#10884

* migrate grouping to UDAF

Ref: apache/datafusion#10906

* add alias `mean` for UDAF `avg`

* migrate stddev to UDAF

Ref: apache/datafusion#10827

* remove rust alias for stddev

The python wrapper now provides stddev_samp alias.

* migrate var_pop to UDAF

Ref: apache/datafusion#10836

* migrate regr_* functions to UDAF

Ref: apache/datafusion#10898

* migrate bitwise functions to UDAF

The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930

* add missing variants for ScalarValue with todo

* fix typo in approx_percentile_cont

* add distinct arg to count

* comment out failing test

`approx_percentile_cont` is now returning a DoubleArray instead of an IntArray.

This may be a bug upstream; it requires further investigation.

* update tests to expect lowercase `sum` in query plans

This was changed upstream.

Ref: apache/datafusion#10831

* update ScalarType data_type map

* add docs dependency pickleshare

* re-implement count_star

* lint: ruff python lint

* lint: rust cargo fmt

* include name of window function in error for find_window_fn

* refactor `find_window_fn` for debug clarity

* search default aggregate functions by both name and aliases

The alias list no longer includes the name of the function.

Ref: apache/datafusion#10658

* fix markdown in find_window_fn docs

* parameterize test_window_functions

`first_value` and `last_value` are currently failing and marked as xfail.

* add test ids to test_simple_select tests marked xfail

* update find_window_fn to search built-ins first

The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior.
This allowed me to remove `marks=pytest.xfail` from the window tests.

* improve first_value and last_value use of the builder API

* remove trailing todos

* fix examples/substrait.py

* chore: remove explicit aliases from functions.rs

Ref: #779

* remove `array_fn!` aliases

* remove alias rules for `expr_fn_vec!`

* remove alias rules from `expr_fn!` macro

* remove unnecessary pyo3 var-arg signatures in functions.rs

* remove pyo3 signatures that provided defaults for first_value and last_value

* parametrize test_string_functions

* test regr_ function wrappers

Closes #778
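
The lookup behavior described in the `find_window_fn` items above (search built-ins first, then match aggregate functions by name or alias, and name the missing function in the error) can be sketched in Python. This is only an illustration under stated assumptions: the real implementation lives in the Rust crate, and `FnEntry`, `BUILTINS`, `AGGREGATES`, and the sample entries are hypothetical stand-ins, not the project's actual types or registry contents.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FnEntry:
    """Hypothetical stand-in for a registered function."""

    name: str
    aliases: Tuple[str, ...] = ()  # assumed NOT to contain `name` itself
    kind: str = "udaf"


def matches(entry: FnEntry, name: str) -> bool:
    # The alias list no longer includes the function's own name,
    # so both the name and the aliases have to be checked.
    return entry.name == name or name in entry.aliases


def find_window_fn(name: str, builtins: List[FnEntry], aggregates: List[FnEntry]) -> FnEntry:
    # Built-ins are searched first, so a built-in wins when a UDAF
    # with the same name exists.
    for entry in builtins:
        if matches(entry, name):
            return entry
    # Fall back to aggregate functions, matched by name or alias.
    for entry in aggregates:
        if matches(entry, name):
            return entry
    # Include the requested name in the error to ease debugging.
    raise LookupError(f"window function `{name}` not found")


# Illustrative data only; not the real registry contents.
BUILTINS = [FnEntry("first_value", kind="builtin"), FnEntry("last_value", kind="builtin")]
AGGREGATES = [FnEntry("first_value"), FnEntry("avg", aliases=("mean",))]
```

With this data, `find_window_fn("first_value", BUILTINS, AGGREGATES)` resolves to the built-in entry, while `find_window_fn("mean", BUILTINS, AGGREGATES)` resolves to the `avg` aggregate through its alias.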
Michael-J-Ward authored Jul 31, 2024
1 parent fd6b4df commit f580155
Showing 12 changed files with 635 additions and 499 deletions.
69 changes: 36 additions & 33 deletions Cargo.lock
16 changes: 8 additions & 8 deletions Cargo.toml
@@ -17,7 +17,7 @@

[package]
name = "datafusion-python"
-version = "39.0.0"
+version = "40.0.0"
homepage = "https://datafusion.apache.org/python"
repository = "https://github.com/apache/datafusion-python"
authors = ["Apache DataFusion <[email protected]>"]
@@ -38,13 +38,13 @@ tokio = { version = "1.35", features = ["macros", "rt", "rt-multi-thread", "sync
rand = "0.8"
pyo3 = { version = "0.21", features = ["extension-module", "abi3", "abi3-py38"] }
arrow = { version = "52", feature = ["pyarrow"] }
-datafusion = { version = "39.0.0", features = ["pyarrow", "avro", "unicode_expressions"] }
-datafusion-common = { version = "39.0.0", features = ["pyarrow"] }
-datafusion-expr = "39.0.0"
-datafusion-functions-array = "39.0.0"
-datafusion-optimizer = "39.0.0"
-datafusion-sql = "39.0.0"
-datafusion-substrait = { version = "39.0.0", optional = true }
+datafusion = { version = "40.0.0", features = ["pyarrow", "avro", "unicode_expressions"] }
+datafusion-common = { version = "40.0.0", features = ["pyarrow"] }
+datafusion-expr = "40.0.0"
+datafusion-functions-array = "40.0.0"
+datafusion-optimizer = "40.0.0"
+datafusion-sql = "40.0.0"
+datafusion-substrait = { version = "40.0.0", optional = true }
prost = "0.12"
prost-types = "0.12"
uuid = { version = "1.9", features = ["v4"] }
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -21,4 +21,5 @@ myst-parser
maturin
jinja2
ipython
-pandas
+pandas
+pickleshare
2 changes: 1 addition & 1 deletion examples/substrait.py
@@ -46,4 +46,4 @@

# Back to Substrait Plan just for demonstration purposes
# type(substrait_plan) -> <class 'datafusion.substrait.plan'>
-substrait_plan = ss.Producer.to_substrait_plan(df_logical_plan)
+substrait_plan = ss.Producer.to_substrait_plan(df_logical_plan, ctx)