Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: date_trunc support timezone #6818

Closed
wants to merge 2 commits into from

Conversation

Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Jul 1, 2023

Which issue does this PR close?

Closes #5976

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 1, 2023
@Weijun-H Weijun-H force-pushed the date_trunc-support-timezone branch from a38dee7 to a8f7b75 Compare July 1, 2023 13:20
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Weijun-H

I think this PR preserves the timezone part of date_trunc which is good. However, it doens't seem to actually change date_trunc to take the timezones into account during the calculation.

self.volatility(),
),
BuiltinScalarFunction::DateTrunc => {
let time_zones = vec![
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @waitingkuo has some ideas about how to deal with timezones in function signatures.

It seems not quite right to have to list all timezones specially for date_trun only -- shouldn't the timezone logic apply to any function that takes timezones ? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure there is a better way to provide the timezone here. And wait for @waitingkuo suggestion.

Also, it should be extended to other functions using timezone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this issue also arises for decimal types, I suspect that the current pattern of explicitly listing the possible type signatures is going to require revisiting. We probably want to extend the pattern established in #6778 and allow functions to return a signature given a set of input types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alamb @Weijun-H @tustvold . i don't have the solution. actually i have some other pr blocked by this for now.
i wonder whether force it to be coerced to UTC Timestamp(someunit, Some("+00:00".into()) makes sense before we have better solution for what @tustvold suggested

Timestamp(Microsecond, _) => Ok(Timestamp(Microsecond, None)),
Timestamp(Millisecond, _) => Ok(Timestamp(Millisecond, None)),
Timestamp(Second, _) => Ok(Timestamp(Second, None)),
match &input_expr_types[1] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if any of the calculations actually have to change based on timezone (as wouldn't the end of theday being truncated depend on the timezone)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/arrow-datafusion/blob/02a470f6061cce8ee8e57f7af8a6a0e0ddc1571b/datafusion/physical-expr/src/datetime_expressions.rs#L343-L347

it truncates before applying timezone offset, which is inconsistent with postgresql

set timezone to '+08:00';
0 rows in set. Query took 0.026 seconds.

❯ select timestamptz '2000-01-01T00:00:00';
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 2000-01-01T00:00:00+08:00   |
+-----------------------------+
1 row in set. Query took 0.002 seconds.

❯ select date_trunc('day', timestamptz '2000-01-01T00:00:00');
+-----------------------------------------------------+
| date_trunc(Utf8("day"),Utf8("2000-01-01T00:00:00")) |
+-----------------------------------------------------+
| 1999-12-31T00:00:00                                 |
+-----------------------------------------------------+
1 row in set. Query took 0.032 seconds.
willy=# select timestamptz '2000-01-01T00:00:00';
      timestamptz       
------------------------
 2000-01-01 00:00:00+08
(1 row)

willy=# select date_trunc('day', timestamptz '2000-01-01T00:00:00');
       date_trunc       
------------------------
 2000-01-01 00:00:00+08
(1 row)

self.volatility(),
),
BuiltinScalarFunction::DateTrunc => {
let time_zones = vec![
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alamb @Weijun-H @tustvold . i don't have the solution. actually i have some other pr blocked by this for now.
i wonder whether force it to be coerced to UTC Timestamp(someunit, Some("+00:00".into()) makes sense before we have better solution for what @tustvold suggested

Comment on lines +1289 to +1291
select date_trunc('hour', timestamptz '2000-01-01T00:00:00');
----
2000-01-01T00:00:00+08:00
Copy link
Contributor

@waitingkuo waitingkuo Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should add a test case that truncate the day

Timestamp(Microsecond, _) => Ok(Timestamp(Microsecond, None)),
Timestamp(Millisecond, _) => Ok(Timestamp(Millisecond, None)),
Timestamp(Second, _) => Ok(Timestamp(Second, None)),
match &input_expr_types[1] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/arrow-datafusion/blob/02a470f6061cce8ee8e57f7af8a6a0e0ddc1571b/datafusion/physical-expr/src/datetime_expressions.rs#L343-L347

it truncates before applying timezone offset, which is inconsistent with postgresql

set timezone to '+08:00';
0 rows in set. Query took 0.026 seconds.

❯ select timestamptz '2000-01-01T00:00:00';
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 2000-01-01T00:00:00+08:00   |
+-----------------------------+
1 row in set. Query took 0.002 seconds.

❯ select date_trunc('day', timestamptz '2000-01-01T00:00:00');
+-----------------------------------------------------+
| date_trunc(Utf8("day"),Utf8("2000-01-01T00:00:00")) |
+-----------------------------------------------------+
| 1999-12-31T00:00:00                                 |
+-----------------------------------------------------+
1 row in set. Query took 0.032 seconds.
willy=# select timestamptz '2000-01-01T00:00:00';
      timestamptz       
------------------------
 2000-01-01 00:00:00+08
(1 row)

willy=# select date_trunc('day', timestamptz '2000-01-01T00:00:00');
       date_trunc       
------------------------
 2000-01-01 00:00:00+08
(1 row)

@Weijun-H Weijun-H force-pushed the date_trunc-support-timezone branch from a8f7b75 to ee70381 Compare July 7, 2023 22:59
@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 7, 2023
@Weijun-H Weijun-H force-pushed the date_trunc-support-timezone branch from ee70381 to 382d2f4 Compare July 8, 2023 19:35
@alamb
Copy link
Contributor

alamb commented Jul 20, 2023

What is the current status of this PR? It seems to have gone stale

@Weijun-H
Copy link
Member Author

Weijun-H commented Jul 21, 2023

What is the current status of this PR? It seems to have gone stale

Hi, @alamb. This pr can truncate the day and hour now. I am not sure there is a better way to mention the timezone.

It seems not quite right to have to list all timezones specially for date_trun only -- shouldn't the timezone logic apply to any function that takes timezones ? 🤔

@alamb
Copy link
Contributor

alamb commented Sep 21, 2023

I think @wiedld has a solution in this PR: #7614

@alamb alamb closed this in #7614 Sep 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_trunc drops timezone
4 participants