Track memory used by parquet writers. #11344

Closed
wiedld opened this issue Jul 9, 2024 · 2 comments · Fixed by #11345
Labels: enhancement (New feature or request)

Comments


wiedld commented Jul 9, 2024

Is your feature request related to a problem or challenge?

Encoding Parquet requires a non-trivial amount of memory for buffering. During the execution of DataFusion physical plans, Parquet may be encoded by ParquetSink (e.g. COPY TO queries that output Parquet). Currently, ParquetSink's memory usage is not tracked in the task context's memory pool.

Describe the solution you'd like

Start tracking the memory used during parquet encoding.

Describe alternatives you've considered

No response

Additional context

Recently, we extended several parquet interfaces to provide better estimates of memory usage (memory_usage) during encoding. These estimates should be used to size the memory reservations.
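
To make the idea concrete, here is a minimal, self-contained sketch of a writer that reports its buffered bytes to a shared pool before buffering more. It is not DataFusion's or the parquet crate's API: `SimplePool`, `Reservation`, and `BufferingWriter` are hypothetical names invented for illustration, and the writer's memory estimate is assumed to be just the length of its buffer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a task-wide memory pool with a hard limit.
pub struct SimplePool {
    limit: usize,
    used: AtomicUsize,
}

impl SimplePool {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    /// Bytes currently reserved across all reservations.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// Hypothetical reservation held by one writer; returns its bytes on drop.
pub struct Reservation {
    pool: Arc<SimplePool>,
    size: usize,
}

impl Reservation {
    pub fn new(pool: &Arc<SimplePool>) -> Self {
        Self { pool: Arc::clone(pool), size: 0 }
    }

    /// Resize to `new_size`, failing if growing would exceed the pool limit.
    pub fn try_resize(&mut self, new_size: usize) -> Result<(), String> {
        if new_size > self.size {
            let extra = new_size - self.size;
            let limit = self.pool.limit;
            // Atomically check the limit and add the extra bytes.
            self.pool
                .used
                .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                    used.checked_add(extra).filter(|total| *total <= limit)
                })
                .map_err(|used| {
                    format!("cannot reserve {} bytes: {} of {} in use", extra, used, limit)
                })?;
        } else {
            self.pool.used.fetch_sub(self.size - new_size, Ordering::SeqCst);
        }
        self.size = new_size;
        Ok(())
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.pool.used.fetch_sub(self.size, Ordering::SeqCst);
    }
}

/// Hypothetical writer that buffers encoded bytes until flushed.
pub struct BufferingWriter {
    buffer: Vec<u8>,
    reservation: Reservation,
}

impl BufferingWriter {
    pub fn new(pool: &Arc<SimplePool>) -> Self {
        Self { buffer: Vec::new(), reservation: Reservation::new(pool) }
    }

    /// Buffer `data`, first reserving memory for the post-write buffer size.
    pub fn write(&mut self, data: &[u8]) -> Result<(), String> {
        self.reservation.try_resize(self.buffer.len() + data.len())?;
        self.buffer.extend_from_slice(data);
        Ok(())
    }

    /// Discard the buffer (standing in for a flush to storage) and release
    /// the reservation. Returns the number of bytes that were buffered.
    pub fn flush(&mut self) -> usize {
        let flushed = self.buffer.len();
        self.buffer.clear();
        self.reservation
            .try_resize(0)
            .expect("shrinking a reservation does not fail");
        flushed
    }
}
```

The sketch resizes the reservation before the data is buffered, so a write that would exceed the limit fails with an error instead of silently growing past it. A real encoder would report an estimate that also covers encoder-internal state, not only the raw buffer length.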

wiedld added the enhancement label Jul 9, 2024

wiedld commented Jul 9, 2024

Please assign to me. PR up shortly.


alamb commented Jul 9, 2024

Someone else also saw similar issues in #11042
