-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Utf8View
for string function bit_length()
#13221
Support Utf8View
for string function bit_length()
#13221
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @austin362667 🙏
Can you please add a test to https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string/string_query.slt.part ?
The structure is explained in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string/README.md
certainly, thanks for review! |
@@ -1096,3 +1096,20 @@ Rapha Raphael NULL datafusionДатаФус datafusionДатаФусион NULL | |||
und under_score NULL un iść core un iść core NULL | |||
p percent NULL pan Tadeusz ma iść w kąt pan Tadeusz ma iść w kąt NULL | |||
NULL NULL NULL NULL NULL NULL | |||
|
|||
# TODO: Support Utf8View for bit_length array string function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment -- it seems like bit_length is working on Utf8View in datafusion/sqllogictest/test_files/string/string_view.slt
If you just uncomment this code out that would mean we are covering bit_length for all the string types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb
As I understand it, applying bit_length()
to a column will currently fail because arrow::compute::kernels::length::bit_length
does not yet support UTF-8 views (depends on PR in arrow-rs
). However, it's still used in the following code path:
ColumnarValue::Array(v) => Ok(ColumnarValue::Array(bit_length(v.as_ref())?)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To cover other string types, we can add this test to each string testing file before apache/arrow-rs#6671.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @austin362667
query IIII | ||
SELECT | ||
BIT_LENGTH(arrow_cast('Andrew', 'Utf8View')), | ||
BIT_LENGTH(arrow_cast('datafusion数据融合', 'Utf8View')), | ||
BIT_LENGTH(arrow_cast('💖', 'Utf8View')), | ||
BIT_LENGTH(arrow_cast('josé', 'Utf8View')) | ||
; | ||
---- | ||
48 176 32 40 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move this test to string/string_literal.slt
and then add similar tests for other string types as well (UTF8, LargeUTF8, and DictionaryString).
@@ -1096,3 +1096,20 @@ Rapha Raphael NULL datafusionДатаФус datafusionДатаФусион NULL | |||
und under_score NULL un iść core un iść core NULL | |||
p percent NULL pan Tadeusz ma iść w kąt pan Tadeusz ma iść w kąt NULL | |||
NULL NULL NULL NULL NULL NULL | |||
|
|||
# TODO: Support Utf8View for bit_length array string function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To cover other string types, we can add this test to each string testing file before apache/arrow-rs#6671.
; | ||
---- | ||
48 176 32 40 | ||
|
||
query error DataFusion error: Arrow error: Compute error: bit_length not supported for Utf8View |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add a TODO comment here and file an issue to track apache/arrow-rs#6671. After upgrading to the corresponding arrow-rs version, we should address the TODO comment in string/string_query.slt.part#L1100
.
@austin362667 any chance you have some time to address @goldmedal 's suggestions? |
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
fe442e3
to
82ed616
Compare
Thanks @goldmedal @alamb for the review. 🙏 Just addressed these issues |
Signed-off-by: Austin Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @austin362667 and @alamb @jayzhan211 for reviewing. Although I think the tests can be improved, I can help to do that in the follow-up PR. Let's merge this PR first.
* Support `Utf8View` for string function `bit_length()` Signed-off-by: Austin Liu <[email protected]> * Add scalar test case Signed-off-by: Austin Liu <[email protected]> * Refine tests Signed-off-by: Austin Liu <[email protected]> * Fix wrong format Signed-off-by: Austin Liu <[email protected]> --------- Signed-off-by: Austin Liu <[email protected]>
Which issue does this PR close?
Closes #13195
Rationale for this change
Thanks to @jayzhan211 , he noticed following issue which string scalar function
bit_length()
doesn't supportUtf8View
type:What changes are included in this PR?
Update
bit_length()
scalar function to supportUtf8View
Are these changes tested?
Yes for scalar function. No for array function as it's still PR in apache/arrow-rs#6671
Are there any user-facing changes?