-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: window function range offset should be long instead of int #733
Conversation
val df = | ||
Seq((1L, "1"), (1L, "1"), (2147483650L, "1"), (3L, "2"), (2L, "1"), (2147483650L, "2")) | ||
.toDF("key", "value") | ||
|
||
checkAnswer( | ||
df.select( | ||
$"key", | ||
count("key").over( | ||
sum("key").over( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this test fails before this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently don't support count
so change to sum
. Previously, even if the aggregate function is not supported, it still goes to the window frame code. In this PR, I return None directly if aggregate function is not supported.
throw new IllegalArgumentException( | ||
"Unsupported data type for window function row/range offset") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will fail the query directly instead of falling back to Spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to return None
to fall back to Spark
2e03816
to
1f8a29e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #733 +/- ##
=============================================
- Coverage 55.16% 34.16% -21.00%
- Complexity 857 881 +24
=============================================
Files 109 112 +3
Lines 10542 42949 +32407
Branches 2010 9491 +7481
=============================================
+ Hits 5815 14673 +8858
- Misses 3714 25274 +21560
- Partials 1013 3002 +1989 ☔ View full report in Codecov by Sentry. |
2777d18
to
0d84fd2
Compare
WindowFrameUnits::Rows => { | ||
WindowFrameBound::Preceding(ScalarValue::UInt64(None)) | ||
} | ||
WindowFrameUnits::Range | WindowFrameUnits::Groups => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use Groups
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use Groups
, but if we don't specify this case, UpperFrameBoundStruct::UnboundedFollowing(_) => match units
fails with pattern datafusion_expr::WindowFrameUnits::Groups not covered
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add it as separate pattern branch and return error?
if (partitionColumnNames.toSet != orderColumnNames.toSet) { | ||
withInfo(op, "Partitioning and sorting specifications do not match") | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check the partition column and order column one by one instead of a set? I'm not sure if (PARTITION BY k, v ORDER BY v, k)
work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
94c7739
to
f801106
Compare
f801106
to
92b5c58
Compare
Merged. Thanks @viirya @kazuyukitanimura |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?