Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distinct on decimal / interval #3873

Closed
xiangjinwu opened this issue Jul 14, 2022 · 3 comments · Fixed by #4600
Closed

distinct on decimal / interval #3873

xiangjinwu opened this issue Jul 14, 2022 · 3 comments · Fixed by #4600
Labels
type/bug Something isn't working

Comments

@xiangjinwu
Copy link
Contributor

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
In PostgreSQL:

test=# select distinct * from (values (1.0), (1.00)) as t;
 column1 
---------
     1.0
(1 row)

test=# select distinct * from (values (1.00), (1.0)) as t;
 column1 
---------
    1.00
(1 row)

test=# select distinct * from (values (interval '1' month), (interval '30' day)) as t;
 column1 
---------
 1 mon
(1 row)

test=# select distinct * from (values (interval '30' day), (interval '1' month)) as t;
 column1 
---------
 30 days
(1 row)

In RisingWave:

dev=> select distinct * from (values (1.0), (1.00)) as t;
   
---
 1
(1 row)

dev=> select distinct * from (values (1.00), (1.0)) as t;
   
---
 1
(1 row)

dev=> select distinct * from (values (interval '1' month), (interval '30' day)) as t;
                  
------------------
 1 mon 00:00:00
 30 days 00:00:00
(2 rows)

dev=> select distinct * from (values (interval '30' day), (interval '1' month)) as t;
                  
------------------
 30 days 00:00:00
 1 mon 00:00:00
(2 rows)

Expected behavior
Unsure. The PostgreSQL behavior is not preferable either, as it depends on the input order.

Additional context
Add any other context about the problem here.

@xiangjinwu xiangjinwu added the type/bug Something isn't working label Jul 14, 2022
@liurenjie1024
Copy link
Contributor

  1. I think we are right for the interval case, and pg is wrong.
  2. For decimal, I think we our answer is good enough.

@xiangjinwu
Copy link
Contributor Author

Although the difference between 1 month and 30 day feels more significant than the difference between 1.0 and 1.00, it seems we still need to considered them as equal, even in the context of distinct:

We have supported ordering and comparison for the interval type under PostgreSQL semantic, which means all interval values forms a total order, where 1 month equals to 30 day. We even wrapped real/float32 and double/float64 to assign total order for them. It is not an optional to say 1 month and 30 day are incomparable. And as a result of being equal, they should have the same hash value as well.

In #4575, we decided to output a single value for such distinct queries. But to avoid indeterministic due to input order, we will always output the canonical value from the equivalent class, which may be neither of the input value. This issue will be closed after #4600

@xiangjinwu
Copy link
Contributor Author

This also has an impact on correlated subquery, where the domain-calculation step leverages distinct but later on decimal/interval breaks x = y -> f(x) = f(y). This can lead to wrong (unexpected) query results. For example:

PostgreSQL:

test=# select v, (select scale(v)) from (values (1.0), (1.00)) as t(v);
  v   | scale 
------+-------
  1.0 |     1
 1.00 |     2
(2 rows)

test=# select v, (select v + date '2024-01-01') from (values (interval '30' day), (interval '1' month)) as t(v);
    v    |      ?column?       
---------+---------------------
 30 days | 2024-01-31 00:00:00
 1 mon   | 2024-02-01 00:00:00
(2 rows)

RisingWave:

dev=> select v, (select scale(v)) from (values (1.0), (1.00)) as t(v);
  v   | ?column? 
------+----------
  1.0 |        0
 1.00 |        0
(2 rows)

dev=> select v, (select v + date '2024-01-01') from (values (interval '30' day), (interval '1' month)) as t(v);
    v    |      ?column?       
---------+---------------------
 30 days | 2024-02-01 00:00:00
 1 mon   | 2024-02-01 00:00:00
(2 rows)

Just acknowledging this as a known limitation. Not proposing any fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants