Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: return "not supported" for COUNT DISTINCT with multiple arguments #11391

Merged
merged 3 commits into from
Jul 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion datafusion/functions-aggregate/src/count.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ use arrow::{
buffer::BooleanBuffer,
};
use datafusion_common::{
downcast_value, internal_err, DataFusionError, Result, ScalarValue,
downcast_value, internal_err, not_impl_err, DataFusionError, Result, ScalarValue,
};
use datafusion_expr::function::StateFieldsArgs;
use datafusion_expr::{
Expand Down Expand Up @@ -138,6 +138,10 @@ impl AggregateUDFImpl for Count {
return Ok(Box::new(CountAccumulator::new()));
}

if acc_args.input_exprs.len() > 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking should it parser issue? Is there any SQL conventions supporting the multiple expressions for COUNT(DISTINCT)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checked https://chicagobusinesspress.com/download_ancillary?book=22&ancillary=157 for SQL 2016 standard, I can only 1 col supported in syntax. I think we can go with a patch but followup has to be in sqlparser-rs

Copy link
Member Author

@jonahgao jonahgao Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, Spark and MySQL support this syntax, although MySQL states that this feature is not standard SQL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of the dialects allow the syntax I think the sqlparser-rs AST will need to support multiple args.

This is kind of explained https://github.com/sqlparser-rs/sqlparser-rs?tab=readme-ov-file#syntax-vs-semantics (where what is a syntax error in some systems and a semantic one in others needs to be enforced as a semantic error)

Thus I think the right thing to do here is check and error in DataFusion, as @jonahgao 's PR does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, Spark supports it, I double checked it. Maybe it is SQL 2023 standard, but if any of dialects supports this syntax then parser is not expected to restrict it and this PR is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say "SQL Standard" in quotes -- because even though there is such a thing I don't think any database follows it exactly. It is more like "SQL Standard Suggestions-ish" or something 🤣

return not_impl_err!("COUNT DISTINCT with multiple arguments");
}

let data_type = acc_args.input_type;
Ok(match data_type {
// try and use a specialized accumulator if possible, otherwise fall back to generic accumulator
Expand Down
4 changes: 4 additions & 0 deletions datafusion/sqllogictest/test_files/aggregate.slt
Original file line number Diff line number Diff line change
Expand Up @@ -2038,6 +2038,10 @@ SELECT count(c1, c2) FROM test
----
3

# count(distinct) with multiple arguments
query error DataFusion error: This feature is not implemented: COUNT DISTINCT with multiple arguments
SELECT count(distinct c1, c2) FROM test

# count_null
query III
SELECT count(null), count(null, null), count(distinct null) FROM test
Expand Down
Loading