Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support skewness(x) in Aggregation function #12295

Closed
wants to merge 9 commits into from

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Sep 2, 2024

Which issue does this PR close?

Part of #12254
Closes #12249

Implemented by referring to https://github.com/duckdb/duckdb/blob/main/src/core_functions/aggregate/distributive/skew.cpp

Rationale for this change

What changes are included in this PR?

Added skewness UDAF.
Ref: https://duckdb.org/docs/sql/functions/aggregates#skewnessx

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added proto Related to proto crate functions labels Sep 2, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 2, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 2, 2024
@dharanad
Copy link
Contributor Author

dharanad commented Sep 2, 2024

Planning to add more logic tests.

@dharanad dharanad marked this pull request as ready for review September 2, 2024 18:34
@2010YOUY01
Copy link
Contributor

Planning to add more logic tests.

I think we can port tests from duckdb https://github.com/duckdb/duckdb/blob/main/test/sql/aggregate/aggregates/test_skewness.test

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dharanad the PR brings a lot of value.
Would you mind resolving conflicts?

skewness(expression)
```

#### Arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we dont have any examples in aggregate_functions.ms

@comphead
Copy link
Contributor

comphead commented Sep 4, 2024

Basically I'm feeling we have too much of user .md files, duplicating the functions description and poorly connected to each other which confuses a lot.

@alamb WDYT about having a single file for all SQL functions/scalars/expressions like in https://spark.apache.org/docs/latest/api/sql/index.html with description, examples

Btw Spark holds an internal meta repository that allows such doc to be regenerated from the code, which can be next step

@alamb
Copy link
Contributor

alamb commented Sep 5, 2024

Btw Spark holds an internal meta repository that allows such doc to be regenerated from the code, which can be next step

I agree

What I actually think would be ideal is exactly as @comphead describes and put all the documentation about the functions in the code, and then have a post processing step that auto generates the documentation

This is how the configuration setting documentation is created. Here is the script that does it:
https://github.com/apache/datafusion/blob/main/dev/update_config_docs.sh

BTW I think this is the direction @findepi is heading with #12266. I also think GlareDB did something similar, but I can't find the link now.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2024

I also wonder if we should consider where to draw the line on features to include (i.e. should we include skewness by default in the core?) -- especially as all aggregate functions use the same API, this is the kind of feature we could very easily keep outside -- either in its own crate or even own repo -- and then have other code integrate it in -- e.g. #11979

@alamb
Copy link
Contributor

alamb commented Sep 23, 2024

Perhaps we can port this function to https://github.com/datafusion-contrib/datafusion-functions-extra as well now that @dmitrybugakov has started that project

@alamb alamb marked this pull request as draft September 23, 2024 18:01
@dharanad
Copy link
Contributor Author

Closing this PR. Will move this to https://github.com/datafusion-contrib/datafusion-functions-extra as suggested

@dharanad dharanad closed this Sep 27, 2024
@dharanad dharanad deleted the feat/12249-support-skewness branch September 27, 2024 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation functions proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support skewness(x) in Aggregation function
4 participants