-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: return "not supported" for COUNT DISTINCT
with multiple arguments
#11391
Conversation
@@ -138,6 +138,10 @@ impl AggregateUDFImpl for Count { | |||
return Ok(Box::new(CountAccumulator::new())); | |||
} | |||
|
|||
if acc_args.input_exprs.len() > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking should it parser issue? Is there any SQL conventions supporting the multiple expressions for COUNT(DISTINCT)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checked https://chicagobusinesspress.com/download_ancillary?book=22&ancillary=157 for SQL 2016 standard, I can only 1 col supported in syntax. I think we can go with a patch but followup has to be in sqlparser-rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of the dialects allow the syntax I think the sqlparser-rs AST will need to support multiple args.
This is kind of explained https://github.com/sqlparser-rs/sqlparser-rs?tab=readme-ov-file#syntax-vs-semantics (where what is a syntax error in some systems and a semantic one in others needs to be enforced as a semantic error)
Thus I think the right thing to do here is check and error in DataFusion, as @jonahgao 's PR does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, Spark supports it, I double checked it. Maybe it is SQL 2023 standard, but if any of dialects supports this syntax then parser is not expected to restrict it and this PR is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say "SQL Standard" in quotes -- because even though there is such a thing I don't think any database follows it exactly. It is more like "SQL Standard Suggestions-ish" or something 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -138,6 +138,10 @@ impl AggregateUDFImpl for Count { | |||
return Ok(Box::new(CountAccumulator::new())); | |||
} | |||
|
|||
if acc_args.input_exprs.len() > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of the dialects allow the syntax I think the sqlparser-rs AST will need to support multiple args.
This is kind of explained https://github.com/sqlparser-rs/sqlparser-rs?tab=readme-ov-file#syntax-vs-semantics (where what is a syntax error in some systems and a semantic one in others needs to be enforced as a semantic error)
Thus I think the right thing to do here is check and error in DataFusion, as @jonahgao 's PR does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @jonahgao
…ments (apache#11391) * Minor: return "not supported" for COUNT DISTINCT with multiple arguments * update condition
…ments (apache#11391) * Minor: return "not supported" for COUNT DISTINCT with multiple arguments * update condition
…ments (apache#11391) * Minor: return "not supported" for COUNT DISTINCT with multiple arguments * update condition
…ments (apache#11391) * Minor: return "not supported" for COUNT DISTINCT with multiple arguments * update condition
Which issue does this PR close?
Closes #11303.
Rationale for this change
COUNT
with multiple arguments is a feature of Spark, and DataFusion handlesCOUNT(a, b, c)
well.But we do not have a DistinctCountAccumulator for multiple arguments.
COUNT(DISTINCT a, b, c)
was incorrectly executed asCOUNT(DISTINCT a)
.Therefore, rather than returning an error result, it is better to make it unsupported.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, although I don't think any users are using this syntax.