Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter`
#6164
Closed
Labels: enhancement, good first issue, parquet
Part of #6163
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
@efredine recently added support for extracting statistics from parquet files as arrays in #6046 using `StatisticsConverter`.
During development we have also added support for `StringViewArray` and `BinaryViewArray` in #5374.
Currently there is no way to read `StringViewArray` and `BinaryViewArray` statistics, and it actually panics if you try to read data page level statistics, as I found on apache/datafusion#11723.
Describe the solution you'd like
Support for `StringView` and `BinaryView` in place of the `unimplemented!` at arrow-rs/parquet/src/arrow/arrow_reader/statistics.rs, line 946 in 2905ce6.
The code is in https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_reader/statistics.rs
Describe alternatives you've considered
You can avoid the panic by following the model of arrow-rs/parquet/src/arrow/arrow_reader/statistics.rs, lines 465 to 467 in 2905ce6.
Then, you can probably write a test following the model of utf8 and binary:
arrow-rs/parquet/src/arrow/arrow_reader/statistics.rs, lines 1897 to 1917 in 2905ce6
arrow-rs/parquet/src/arrow/arrow_reader/statistics.rs, lines 1956 to 1984 in 2905ce6
And then implement the missing pieces of code (use `StringViewBuilder` / `BinaryViewBuilder` instead of `StringBuilder` / `BinaryBuilder`).
I have a hacky version in apache/datafusion#11753.
Additional context