fix(11367): parquet writer defaults #34
Thanks @wiedld -- this PR looks nice. I left some comments
Once we merge apache#11444 I think this would be a great PR to send upstream
"datafusion's default is None"
);

// TODO: matches once create WriterProps, but only due to parquet's override
I don't understand this comment -- is there any action item here?
I added another commit to re-work the code comments for this test. Hopefully that makes the misalignments more clear.
Please let me know @alamb if I should change anything else before moving this over into an upstream PR.
Have opened the upstream PR.
WIP: first, we need consensus on the desired defaults. Second, we need to decide whether the extern parquet's defaults can (or cannot) override the datafusion defaults.
Which issue does this PR close?
Closes apache#11367
Rationale for this change
When we switched from using parquet's ArrowWriter (with options) to the parallelized parquet writer (with its own options), we ran into unintended behaviors due to different default settings.
Here are the places where the current defaults differ:
† For these settings, datafusion has no default (None). However, once datafusion's ParquetOptions are used by the extern parquet (i.e. converted to parquet's ArrowWriterOptions), the extern parquet's defaults apply. Refer to the newly added tests.
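To illustrate the fall-through described above, here is a minimal std-only sketch. The types `SessionWriterOptions` and `WriterProps`, the function `into_writer_props`, and the constant `WRITER_DEFAULT_COMPRESSION` are all hypothetical stand-ins, not the real datafusion or parquet APIs, and the placeholder value is not parquet's actual default. It only shows how a `None` on one side silently becomes the other side's default at conversion time.

```rust
/// Hypothetical stand-in for a session-level option set in which a
/// setting may have no default of its own.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SessionWriterOptions {
    /// `None` means "no default on this side".
    pub compression: Option<String>,
}

/// Hypothetical stand-in for writer-side properties, which always
/// carry a concrete value.
#[derive(Debug, Clone, PartialEq)]
pub struct WriterProps {
    pub compression: String,
}

/// Placeholder for the writer library's built-in default.
/// This is an assumption for illustration, not a real parquet value.
pub const WRITER_DEFAULT_COMPRESSION: &str = "writer-default";

/// Converts session options into writer properties. An unset (`None`)
/// option is replaced by the writer's own default, so the effective
/// value differs from what the session options appear to say.
pub fn into_writer_props(opts: &SessionWriterOptions) -> WriterProps {
    WriterProps {
        compression: opts
            .compression
            .clone()
            .unwrap_or_else(|| WRITER_DEFAULT_COMPRESSION.to_string()),
    }
}
```

In this sketch, `into_writer_props(&SessionWriterOptions::default())` yields the writer's default rather than "nothing", which is the kind of mismatch that a test comparing both sides of the conversion would surface.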
Additionally, there are differences in the bloom filter configurations based upon partial definition (i.e. leaving some settings as default and explicitly defining others):
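As one possible illustration of how partial definition can matter, the sketch below assumes a builder in which setting only the false-positive probability or only the distinct-value count implicitly turns the bloom filter on and fills the remaining field from a default. The `Builder` type, its methods, and the `PLACEHOLDER_*` constants are hypothetical and std-only. They are not the parquet crate's builder, and the exact rules under discussion in this PR may differ.

```rust
/// Hypothetical bloom filter settings; both fields are always concrete
/// once the filter is enabled.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BloomFilterProps {
    pub fpp: f64,
    pub ndv: u64,
}

// Placeholder defaults, assumed for illustration only.
pub const PLACEHOLDER_FPP: f64 = 0.05;
pub const PLACEHOLDER_NDV: u64 = 1_000_000;

/// Hypothetical builder: `None` means the bloom filter is disabled.
#[derive(Debug, Default)]
pub struct Builder {
    bloom: Option<BloomFilterProps>,
}

impl Builder {
    /// Returns the settings, creating them from the placeholder
    /// defaults if the filter was not enabled yet.
    fn bloom_mut(&mut self) -> &mut BloomFilterProps {
        self.bloom.get_or_insert(BloomFilterProps {
            fpp: PLACEHOLDER_FPP,
            ndv: PLACEHOLDER_NDV,
        })
    }

    /// Explicitly enables (with defaults) or disables the filter.
    pub fn set_enabled(mut self, on: bool) -> Self {
        if on {
            self.bloom_mut();
        } else {
            self.bloom = None;
        }
        self
    }

    /// Setting only `fpp` implicitly enables the filter.
    pub fn set_fpp(mut self, fpp: f64) -> Self {
        self.bloom_mut().fpp = fpp;
        self
    }

    /// Setting only `ndv` implicitly enables the filter.
    pub fn set_ndv(mut self, ndv: u64) -> Self {
        self.bloom_mut().ndv = ndv;
        self
    }

    pub fn build(self) -> Option<BloomFilterProps> {
        self.bloom
    }
}
```

Under these assumptions, a caller who sets only `fpp` ends up with an enabled filter whose `ndv` comes from a default they never chose, and the order of `set_enabled(false)` relative to the other setters changes the outcome.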
What changes are included in this PR?
The first commit is adding tests to define and demonstrate the differences in the defaults.
After discussion and consensus, we'll add other commits (as needed) for implementing the desired changes.
Are these changes tested?
Yes.
Are there any user-facing changes?
No API changes. Only potential future changes to alleviate unintended consequences.