Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Related to #6163
The `take` kernel for `StringView` and `BinaryView` is implemented using `GenericByteViewArray::new()`, a safe constructor that fully validates its input, including UTF-8 validation of all non-inlined strings in the buffers. This is wasteful. `take` creates no new string data: it only gathers views and reuses the existing data buffers, which are already known to contain well-formed UTF-8 values.

In Vortex, I'm seeing this show up as one of the more prominent items in TPC-H query profiles, in many cases causing a regression of up to 50% compared to Utf8.
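To make the redundancy concrete, here is a minimal, std-only Rust sketch. It assumes the Arrow view layout: a 16-byte view holding the value length plus either up to 12 inline bytes, or a 4-byte prefix, a buffer index and an offset into a data buffer. `StringViewModel` and its methods are hypothetical stand-ins for illustration, not arrow-rs APIs. `take` only gathers views and clones the `Arc` handles to the existing buffers, so every output value is byte-for-byte an input value that was already valid.

```rust
use std::sync::Arc;

/// Max bytes stored inline in a view (per the Arrow view layout).
const MAX_INLINE: usize = 12;

/// Hypothetical std-only model of a StringView array (NOT the arrow-rs API).
/// Buffers are shared via `Arc`, mirroring how arrow-rs buffers are
/// reference-counted rather than copied.
struct StringViewModel {
    views: Vec<u128>,
    buffers: Vec<Arc<[u8]>>,
}

fn read_u32(b: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Encodes a view: length in bytes 0..4, then either the inline data
/// (len <= 12) or a 4-byte prefix, buffer index and offset.
fn make_view(s: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut b = [0u8; 16];
    b[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    if s.len() <= MAX_INLINE {
        b[4..4 + s.len()].copy_from_slice(s);
    } else {
        b[4..8].copy_from_slice(&s[..4]);
        b[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        b[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(b)
}

impl StringViewModel {
    /// Builds an array from Rust strings, which are valid UTF-8 by construction.
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            let bytes = v.as_bytes();
            // Offset is taken before appending; it is ignored for inline values.
            views.push(make_view(bytes, 0, data.len() as u32));
            if bytes.len() > MAX_INLINE {
                data.extend_from_slice(bytes);
            }
        }
        StringViewModel { views, buffers: vec![Arc::from(data)] }
    }

    /// Returns the bytes of value `i`, resolving inline vs. buffer-backed views.
    fn value_bytes(&self, i: usize) -> Vec<u8> {
        let b = self.views[i].to_le_bytes();
        let len = read_u32(&b, 0) as usize;
        if len <= MAX_INLINE {
            b[4..4 + len].to_vec()
        } else {
            let buf = read_u32(&b, 8) as usize;
            let off = read_u32(&b, 12) as usize;
            self.buffers[buf][off..off + len].to_vec()
        }
    }

    /// Checked path: re-validates every value, the kind of work a safe
    /// constructor repeats on each `take`.
    fn validate_utf8(&self) -> bool {
        (0..self.views.len()).all(|i| std::str::from_utf8(&self.value_bytes(i)).is_ok())
    }

    /// Take: gathers views by index and shares the existing buffers (cheap
    /// `Arc` clones). No string bytes are created or modified.
    fn take(&self, indices: &[usize]) -> StringViewModel {
        let views = indices.iter().map(|&i| self.views[i]).collect();
        StringViewModel { views, buffers: self.buffers.clone() }
    }
}

fn main() {
    let src = StringViewModel::from_strs(&[
        "short",
        "a string longer than twelve bytes",
        "héllo wörld, not inlined",
    ]);
    let indices = [2usize, 0, 2];
    let taken = src.take(&indices);
    // Each output value is exactly the selected input value.
    for (out_i, src_i) in indices.iter().enumerate() {
        assert_eq!(taken.value_bytes(out_i), src.value_bytes(*src_i));
    }
    // The data buffer is shared, not copied.
    assert!(Arc::ptr_eq(&src.buffers[0], &taken.buffers[0]));
    // Re-validating can only confirm what was already true of the input.
    assert!(taken.validate_utf8());
    println!("take reused {} buffer(s)", taken.buffers.len());
}
```

Because `take` never produces bytes that were not already in a validated input, the `validate_utf8` pass on the result is pure overhead, which is the cost the safe constructor pays on every call.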
Describe the solution you'd like
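The `take_byte` kernel for Utf8/Binary arrays constructs an `ArrayData` instance directly and does not perform UTF-8 validation, since it is taking from an array already known to contain valid UTF-8. The `StringView`/`BinaryView` kernel could skip validation in the same way, for example by building its output with the `unsafe` `GenericByteViewArray::new_unchecked()` constructor.
Describe alternatives you've considered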
Additional context