Ensure statistic defaults in parquet writers are in sync #11656
… setting is expected
…afusion_defaults twice (in both the original DF options and in the builder too); only should set once
…o true (default is false)
The clippy failure can likely be resolved by updating from main
Thank you @wiedld -- this looks good to me. 🙏
@@ -202,7 +202,7 @@ datafusion.execution.parquet.pruning true
 datafusion.execution.parquet.pushdown_filters false
 datafusion.execution.parquet.reorder_filters false
 datafusion.execution.parquet.skip_metadata true
-datafusion.execution.parquet.statistics_enabled NULL
+datafusion.execution.parquet.statistics_enabled page
I think this is an improvement -- it doesn't change the default value (NULL means use arrow-rs defaults, which is page), but now the default value is explicit in the config settings.
Also there is a test to ensure the defaults don't drift from the arrow-rs defaults accidentally
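For readers following along, here is a rough sketch of what a "defaults don't drift" check could look like. It is not the test from this PR and does not use the real DataFusion or parquet crate types: `StatsLevel`, `WRITER_DEFAULT_STATS`, `SessionParquetOptions`, `resolve_stats_level` and `defaults_in_sync` are all hypothetical stand-ins, chosen only to illustrate the idea that an explicit session default (`page`) and an unset value (NULL, falling back to the writer default) should resolve to the same thing.

```rust
/// Hypothetical stand-in for the writer library's statistics levels.
/// (Assumption: not the real parquet crate enum.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// Hypothetical stand-in for the writer library's built-in default.
const WRITER_DEFAULT_STATS: StatsLevel = StatsLevel::Page;

/// Hypothetical session-level option; `None` models a NULL config value.
struct SessionParquetOptions {
    statistics_enabled: Option<String>,
}

impl Default for SessionParquetOptions {
    fn default() -> Self {
        // The default is spelled out explicitly instead of being left as NULL.
        Self {
            statistics_enabled: Some("page".to_string()),
        }
    }
}

/// Parse a config string (case-insensitive) into a statistics level.
fn parse_stats_level(s: &str) -> Result<StatsLevel, String> {
    match s.to_lowercase().as_str() {
        "none" => Ok(StatsLevel::None),
        "chunk" => Ok(StatsLevel::Chunk),
        "page" => Ok(StatsLevel::Page),
        other => Err(format!("unknown statistics level: {other}")),
    }
}

/// Resolve the effective level: an unset option falls back to the writer default.
fn resolve_stats_level(opts: &SessionParquetOptions) -> Result<StatsLevel, String> {
    match &opts.statistics_enabled {
        Some(s) => parse_stats_level(s),
        None => Ok(WRITER_DEFAULT_STATS),
    }
}

/// Drift check: the explicit session default must resolve to the writer default.
fn defaults_in_sync() -> bool {
    resolve_stats_level(&SessionParquetOptions::default()) == Ok(WRITER_DEFAULT_STATS)
}
```

If someone later changed either the session default string or the writer-side constant without touching the other, `defaults_in_sync()` would return false, which is the kind of accidental drift a test like this is meant to catch.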
Which issue does this PR close?
Closes #11367
Rationale for this change
Final step to ensure that all default configuration settings, between the parquet session options and the arrow writer options, remain in alignment.
What changes are included in this PR?
Doc that the compression defaults are intentionally different.
Make the statistics_enabled defaults match.
Fix the bloom filter tests.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.