Debug dump of MemoryManager #6934

Closed
alamb opened this issue Jul 12, 2023 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented Jul 12, 2023

Is your feature request related to a problem or challenge?

@JayjeetAtGithub and I are investigating how to improve memory performance for certain queries.

When we hit the memory limit, we see different error messages. For example, sometimes we see:

External error: External error: Execution error for 'deduplicate batches'\ncaused by\nResources exhausted: Memory Exhausted while Sorting (DiskManager is disabled)", source: None })

And sometimes we see

External error: Resources exhausted: Failed to allocate additional 212912 bytes for GroupedHashAggregateStream[2] with 0 bytes already allocated - maximum available is 18446744073705376568"

We would like to know which operators are consuming the memory.

My theory is that different operators appear in the logs because, when we are near the memory limit, any operator can be the "canary": the one that happened to ask for memory next, rather than the one that was actually using it all.

Describe the solution you'd like

I would like some way to know how much memory each operator is consuming in a particular pool and, prior to returning an allocation error, to write a debug log with this information.

The memory pool code is here: https://github.com/apache/arrow-datafusion/blob/main/datafusion/execution/src/memory_pool/pool.rs

I was imagining implementing Display for the pools like

impl Display for GreedyPool  {
 ...
}

Which would produce a report like this with the reservations sorted in descending order:

GreedyPool 25 allocations, 25630532 used, 332 free, 43942344 capacity
  321433: GroupedHashAggregateStream[0]
  1233: GroupedHashAggregateStream[2]
  24: Deduplicate

Then prior to returning a resources exhausted error:
https://github.com/apache/arrow-datafusion/blob/50135e8c039b82d32b57db12ca06d789e9cbea4c/datafusion/execution/src/memory_pool/pool.rs#L87

We would add a log message like

debug!("Pool exhausted while trying to allocate {additional} bytes for {reservation}:\n{}", self)
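
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not DataFusion's actual pool code: `SketchPool`, its fields, and its `try_grow` signature are hypothetical stand-ins, and `eprintln!` stands in for the `debug!` macro so the example has no dependencies. It shows a `Display` impl that lists reservations in descending order and a report written just before the allocation error is returned.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical stand-in for a memory pool (not DataFusion's real type).
struct SketchPool {
    capacity: usize,
    /// Bytes currently reserved, keyed by consumer (operator) name.
    reservations: HashMap<String, usize>,
}

impl SketchPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, reservations: HashMap::new() }
    }

    /// Total bytes reserved across all consumers.
    fn used(&self) -> usize {
        self.reservations.values().sum()
    }

    /// Try to reserve `additional` bytes for `consumer`.
    /// On failure, dump the whole pool before returning the error.
    fn try_grow(&mut self, consumer: &str, additional: usize) -> Result<(), String> {
        let free = self.capacity - self.used();
        if additional > free {
            // Stand-in for `debug!` from the `log` crate.
            eprintln!(
                "Pool exhausted while trying to allocate {additional} bytes for {consumer}:\n{}",
                self
            );
            return Err(format!(
                "Resources exhausted: Failed to allocate additional {additional} bytes \
                 for {consumer} - maximum available is {free}"
            ));
        }
        *self.reservations.entry(consumer.to_string()).or_insert(0) += additional;
        Ok(())
    }
}

impl fmt::Display for SketchPool {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let used = self.used();
        writeln!(
            f,
            "SketchPool {} reservations, {} used, {} free, {} capacity",
            self.reservations.len(),
            used,
            self.capacity - used,
            self.capacity
        )?;
        // Largest reservation first; ties are broken by name for stable output.
        let mut entries: Vec<(&String, &usize)> = self.reservations.iter().collect();
        entries.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
        for (name, bytes) in entries {
            writeln!(f, "  {bytes}: {name}")?;
        }
        Ok(())
    }
}
```

With a pool of capacity 100, reserving 60 bytes for one operator and 30 for another, a further request for 20 bytes fails and the dump names the 60-byte holder first. The operator that made the failing request does not appear in the dump at all, which is the "canary" effect described above.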

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added enhancement New feature or request good first issue Good for newcomers labels Jul 12, 2023
@alamb
Contributor Author

alamb commented Jul 12, 2023

I think this is a straightforward coding exercise and well specified, so marking it as good first issue

@TouchstoneTheDev

Hello, I noticed this issue and have submitted a pull request, #6934, that I believe addresses it. Please let me know if there is anything else I can do to help. Thank you!

@alamb
Contributor Author

alamb commented Jul 13, 2023

Thank you @tanmay-veer -- I'll take a look today

@wiedld
Contributor

wiedld commented Aug 14, 2024

Is the goal of this ticket to improve the OOM messages? Or to provide an API to get the top memory consumers (in the absence of an error)?

We have updated the OOM message to include the top 5 consumers (see the PRs related to this issue). The code was implemented in such a way that it can also be used for data dumps through a public API.

Are there any other action items needed before this ticket can be closed?
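
As a rough illustration of what "top consumers" means here, the sketch below picks the `n` largest reservations from a snapshot. `top_consumers` and its input shape are hypothetical and are not the API added in the related PRs.

```rust
/// Hypothetical helper: return the `n` largest consumers by reserved bytes.
/// `snapshot` is an assumed (name, bytes) list, not a real DataFusion type.
fn top_consumers(snapshot: &[(String, usize)], n: usize) -> Vec<(String, usize)> {
    let mut sorted = snapshot.to_vec();
    // Largest first; ties are broken by name so the order is deterministic.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted.truncate(n);
    sorted
}
```

Because the helper only reads a snapshot, it can be called when building an error message or on demand when no error has occurred.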

@alamb
Contributor Author

alamb commented Aug 14, 2024

I think the TrackedMemoryPool API in #11665 fulfills the goals of this ticket, so I am closing it. Thanks @wiedld

@alamb alamb closed this as completed Aug 14, 2024