-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dictionary data type in array_to_string #10908
Support dictionary data type in array_to_string #10908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you su much for this contribution @EduardoVega
Can you please add some tests, ideally in sqllogicteset format?
Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest
Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files with the @appletreeisyellow 's reproduer from #10862
For example, this might be a good spit
## array_to_string (aliases: `list_to_string`, `array_join`, `list_join`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for this PR @EduardoVega -- very much appreciated
I still think this PR is not quite correct as it is operating on the dictionary rather than the logical values of the dictionary array.
I pushed a new commit to your branch with some additional tests that I think show wrong results.
I can help fix the code too, but figured I would leave the feedback in case you wanted to do so.
format!("could not cast {} to AnyDictionaryArray", arr.data_type()) | ||
) | ||
})?; | ||
compute_array_to_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some tests showing how this results in incorrect values.
I think you may be able to call to_string!
directly to iterate over the values of the dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the feedback, let me figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb thanks for the tests, I think I have a better understanding of DictionaryArray.
Based on my understanding, I converted dictionary keys and values to something iterable. Then using the keys to access the values I got the logical array. I think tests are now showing correct results.
Finally, I am happy to work on any new feedback I receive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing @EduardoVega -- thank you so much -- this is quite an impressive first contribution. I didn't realize it was going to be quite as tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for working on this @EduardoVega -- it is quite a bit more complicated than I realized.
@@ -281,6 +283,49 @@ pub(super) fn array_to_string_inner(args: &[ArrayRef]) -> Result<ArrayRef> { | |||
|
|||
Ok(arg) | |||
} | |||
Dictionary(_, _) => { | |||
let any_dict_array = arr.as_any_dictionary_opt().ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @EduardoVega -- it is very impressive you got this to work
I am somewhat concerned that this is quite a bit of code (it actually handles dictionaries of any type, not just string dictionaries, and that it also supports arbitrary key types -- not just the integer types specified)
I am going to try and simplify this a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 5a5397b that I think simplifies the implementation a bit -- let me know what you think
($ARRAY_TYPE:ident) => {{ | ||
for x in downcast_arg!(any_dict_array.keys(), $ARRAY_TYPE) { | ||
if let Some(x) = x { | ||
keys.push(x.to_string().parse::<usize>().unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works, but converts integers to strings just to parse them again -- I think we can just support string dictionaries and avoid this
Thanks again @EduardoVega and @Weijun-H |
* Support dictionary data type in array_to_string * Fix import * Some tests * Update datafusion/functions-array/src/string.rs Co-authored-by: Alex Huang <[email protected]> * Add some tests showing incorrect results * Get logical array * apply rust fmt * Simplify implementation, avoid panics --------- Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Support dictionary data type in array_to_string * Fix import * Some tests * Update datafusion/functions-array/src/string.rs Co-authored-by: Alex Huang <[email protected]> * Add some tests showing incorrect results * Get logical array * apply rust fmt * Simplify implementation, avoid panics --------- Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Support dictionary data type in array_to_string * Fix import * Some tests * Update datafusion/functions-array/src/string.rs Co-authored-by: Alex Huang <[email protected]> * Add some tests showing incorrect results * Get logical array * apply rust fmt * Simplify implementation, avoid panics --------- Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* Support dictionary data type in array_to_string * Fix import * Some tests * Update datafusion/functions-array/src/string.rs Co-authored-by: Alex Huang <[email protected]> * Add some tests showing incorrect results * Get logical array * apply rust fmt * Simplify implementation, avoid panics --------- Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #10862
Rationale for this change
Go to issue.
What changes are included in this PR?
Dictionary is added as a supported data type in
compute_array_to_string
, so that it can properly handle a Dictionary Array.Are these changes tested?
Yes
Are there any user-facing changes?