Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change array_agg to return null on no input rather than empty list #11299

Merged
merged 13 commits into from
Jul 10, 2024

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Jul 6, 2024

Which issue does this PR close?

As @findepi pointed out in #11274 (comment) that most of the aggregate function does not return non-null result if no row qualified. I double check the result in Postgres and Duckdb and find out they does not return empty list for array_agg. I think we can follow the behaviour as they did.

I also hope this can make dealing with nullability simpler

  1. Check the nullability of each (aggregate) function
  2. Make sure nullable is helpful for logical optimizer.

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

The result of array agg is changed

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg semantic for empty result from empty list to empty row Change array agg result from empty list to empty row if no row qualifed Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg result from empty list to empty row if no row qualifed Change array agg result from empty list to null if no row qualifed Jul 6, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 6, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the core Core DataFusion crate label Jul 6, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 6, 2024 07:49
Comment on lines -1761 to -1780
query III
WITH indices AS (
SELECT 1 AS idx UNION ALL
SELECT 2 AS idx UNION ALL
SELECT 3 AS idx UNION ALL
SELECT 4 AS idx UNION ALL
SELECT 5 AS idx
)
SELECT data.arr[indices.idx] as element, array_length(data.arr) as array_len, dummy
FROM (
SELECT array_agg(distinct c2) as arr, count(1) as dummy FROM aggregate_test_100
) data
CROSS JOIN indices
ORDER BY 1
----
1 5 100
2 5 100
3 5 100
4 5 100
5 5 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrite it to the simpler one!

statement ok
drop table t;

# test with no values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add array_agg(distinct case on empty table

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as draft July 8, 2024 23:51
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Jul 9, 2024
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 9, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 08:56
@jayzhan211 jayzhan211 marked this pull request as draft July 9, 2024 09:54
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 15:29
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @jayzhan211 . Thank you for this PR

Also, thank you @findepi for the reviews 🙏

I double checked the answers in postgres and I believe after this PR DataFusion has consistent behavior

postgres=# create table t(a int, b float, c bigint);
ERROR:  relation "t" already exists
postgres=# drop table t;
DROP TABLE
postgres=# create table t(a int, b float, c bigint);
CREATE TABLE
postgres=# insert into t values (1, 1.2, 2);
INSERT 0 1
postgres=# select array_agg(a) from t where a > 2;
 array_agg
-----------

(1 row)

postgres=# select array_agg(b) from t where b > 3.1;
 array_agg
-----------

(1 row)

postgres=# select array_agg(c), count(1) from t where c > 3;
 array_agg | count
-----------+-------
           |     0
(1 row)

postgres=# select array_agg(distinct a) from t where a > 3;
 array_agg
-----------

(1 row)

@alamb alamb changed the title Change array agg result from empty list to null if no row qualifed Change array_agg to return null on no input rather than empty list Jul 10, 2024
@jayzhan211 jayzhan211 merged commit d3f6372 into apache:main Jul 10, 2024
24 checks passed
@jayzhan211 jayzhan211 deleted the array-agg-no-row branch July 10, 2024 23:32
@jayzhan211
Copy link
Contributor Author

Thanks @alamb @findepi

Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Jul 12, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
apache#11299)

* change array agg semantic for empty result

Signed-off-by: jayzhan211 <[email protected]>

* return null

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* fix order sensitive

Signed-off-by: jayzhan211 <[email protected]>

* fix test

Signed-off-by: jayzhan211 <[email protected]>

* add more test

Signed-off-by: jayzhan211 <[email protected]>

* fix null

Signed-off-by: jayzhan211 <[email protected]>

* fix multi-phase case

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* fix clone

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants