Change array_agg to return null on no input rather than empty list #11299
Signed-off-by: jayzhan211 <[email protected]>
query III
WITH indices AS (
  SELECT 1 AS idx UNION ALL
  SELECT 2 AS idx UNION ALL
  SELECT 3 AS idx UNION ALL
  SELECT 4 AS idx UNION ALL
  SELECT 5 AS idx
)
SELECT data.arr[indices.idx] as element, array_length(data.arr) as array_len, dummy
FROM (
  SELECT array_agg(distinct c2) as arr, count(1) as dummy FROM aggregate_test_100
) data
CROSS JOIN indices
ORDER BY 1
----
1 5 100
2 5 100
3 5 100
4 5 100
5 5 100
Why was this removed?
I rewrote it as a simpler test!
statement ok
drop table t;

# test with no values
Add an array_agg(distinct ...) case on an empty table.
Looks good to me, @jayzhan211. Thank you for this PR.
Also, thank you @findepi for the reviews 🙏
I double-checked the answers in Postgres, and I believe that after this PR DataFusion behaves consistently with it:
postgres=# create table t(a int, b float, c bigint);
ERROR: relation "t" already exists
postgres=# drop table t;
DROP TABLE
postgres=# create table t(a int, b float, c bigint);
CREATE TABLE
postgres=# insert into t values (1, 1.2, 2);
INSERT 0 1
postgres=# select array_agg(a) from t where a > 2;
array_agg
-----------
(1 row)
postgres=# select array_agg(b) from t where b > 3.1;
array_agg
-----------
(1 row)
postgres=# select array_agg(c), count(1) from t where c > 3;
array_agg | count
-----------+-------
| 0
(1 row)
postgres=# select array_agg(distinct a) from t where a > 3;
array_agg
-----------
(1 row)
Change array_agg to return null on no input rather than empty list (apache#11299)
* change array agg semantic for empty result
* return null
* fix test
* fix order sensitive
* fix test
* add more test
* fix null
* fix multi-phase case
* add comment
* cleanup
* fix clone
Signed-off-by: jayzhan211 <[email protected]>
Which issue does this PR close?
As @findepi pointed out in #11274 (comment), most aggregate functions return null when no rows qualify. I double-checked the results in Postgres and DuckDB and found that neither returns an empty list for array_agg in that case. I think we can follow their behaviour. I also hope this makes dealing with nullability simpler.
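The general rule about aggregates over zero qualifying rows can be checked with a small sketch. This one uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption for illustration, not one of the engines tested in this PR). SQLite has no array_agg, so sum, max, and group_concat take its place; the table mirrors the one used in the review.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
# SQLite has no array_agg, so sum/max/group_concat act as stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL, c INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 1.2, 2)")

# No row satisfies a > 2, so every aggregate sees zero input rows.
row = conn.execute(
    "SELECT sum(a), max(b), group_concat(c), count(1) FROM t WHERE a > 2"
).fetchone()
conn.close()

print(row)  # prints (None, None, None, 0)
```

Only count yields a non-null value here; the other aggregates return NULL, which is the convention this PR adopts for array_agg.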
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
The result of array_agg changes: on no input it now returns null rather than an empty list.
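To make the changed semantics concrete, here is a minimal Python model of a two-phase array_agg. It is a sketch of the intended behaviour only, not DataFusion's implementation; the function names partial_state, merge_states, and evaluate are hypothetical. It assumes the Postgres convention that a NULL input value is still collected, so only zero input rows produce a null result.

```python
from typing import Any, Iterable, List, Optional


def partial_state(values: Iterable[Any]) -> List[Any]:
    """Collect one partition's input rows; an empty partition gives []."""
    return list(values)


def merge_states(states: Iterable[List[Any]]) -> List[Any]:
    """Concatenate the partial states from all partitions, in order."""
    merged: List[Any] = []
    for state in states:
        merged.extend(state)
    return merged


def evaluate(state: List[Any], distinct: bool = False) -> Optional[List[Any]]:
    """Return None (SQL NULL) if no rows were seen, else the collected list."""
    if not state:
        return None
    if distinct:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(state))
    return state


# Every partition is empty: the final result is null, not [].
print(evaluate(merge_states([partial_state([]), partial_state([])])))
# One partition has rows: the empty partition contributes nothing.
print(evaluate(merge_states([partial_state([1, 2]), partial_state([])])))
```

In this model, a caller that still wants the old behaviour can write `evaluate(state) or []`.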