Resources exhausted errors are confusing; return the biggest memory consumers. #11523
Comments
I agree this message is quite confusing. There is a related idea in #6934, I think. I believe @westonpace mentioned something similar when they were debugging OOM issues too.
The message is quite confusing, and we can probably make it better by mentioning the requested, used, allocated, and remaining memory. But I'm not sure there is an easy way to tell what is taking the memory, as the memory reservation has no idea about the caller, just the memory pool name.
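To make the suggestion above concrete, here is a minimal sketch of an error that reports the requested, used, and remaining memory. The `ResourcesExhausted` type, its fields, and the message wording are all hypothetical and chosen for illustration; this is not DataFusion's actual error type or message format.

```rust
use std::fmt;

/// Hypothetical error carrying the numbers a reader needs to understand a
/// failed reservation. Not part of DataFusion; names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct ResourcesExhausted {
    /// Name of the consumer that made the failing request.
    pub consumer: String,
    /// Bytes asked for in the failing request.
    pub requested: usize,
    /// Bytes this consumer already holds.
    pub reserved_by_consumer: usize,
    /// Bytes reserved across the whole pool.
    pub pool_used: usize,
    /// Total size of the pool in bytes.
    pub pool_limit: usize,
}

impl ResourcesExhausted {
    /// Bytes still free in the pool (never underflows).
    pub fn remaining(&self) -> usize {
        self.pool_limit.saturating_sub(self.pool_used)
    }
}

impl fmt::Display for ResourcesExhausted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Resources exhausted: {} requested {} bytes ({} bytes already reserved); \
             pool has {} of {} bytes in use, {} bytes remaining",
            self.consumer,
            self.requested,
            self.reserved_by_consumer,
            self.pool_used,
            self.pool_limit,
            self.remaining()
        )
    }
}
```

Note that this only describes the failing request and the pool totals. As the comment above points out, it still cannot say which other consumers hold the memory.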
Well-timed comment, @comphead. I was just linking a previous (closed) PR which started down the path of tracking memory consumers (for the purposes of debug dumping). I'm not proposing that as the solution per se, but rather giving us an idea of prior work.
I believe @wiedld plans to look at this one later this week. We have seen it with our customers and it is quite confusing.
@wiedld has an initial PR to add Here is my proposal for what remains to close this issue
Thoughts on error messages
Message today:
@wiedld's proposal on https://github.com/apache/datafusion/pull/11665/files#r1693465283
I think the new proposal is better, as it makes clearer what is going on.
Thoughts on changing the default pool
The default is set here:
I believe we should change the default pool in DataFusion to be a The only potential issue with changing the default is that the tracking has some additional runtime overhead, so we should run benchmarks to ensure there is no performance regression. Also, a nice part of @wiedld's
Is your feature request related to a problem or challenge?
When asking for more memory via the memory reservation, the error returned from the underlying memory pool focuses on that specific request. As a result, when debugging a resource exhausted error, we get an error message that looks something like:
This error describes which incremental request failed to get more memory, not what is using the most memory. As a result, additional dev time has to be spent (a) at best, tracking down the actual high memory consumer and (b) at worst, chasing the wrong memory consumer.
Describe the solution you'd like
One possible solution is to have the error message return the top K memory consumers.
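As a rough illustration of the top K idea, the sketch below tracks how much each named consumer has reserved and lists the largest ones when a request fails. `TrackingPool`, its methods, and the message wording are hypothetical and written only for this example; they are not DataFusion's `MemoryPool` trait or the implementation in any PR.

```rust
use std::collections::HashMap;

/// Hypothetical pool that remembers how much each named consumer has reserved.
/// Illustration only; not DataFusion's API.
pub struct TrackingPool {
    limit: usize,
    used: usize,
    top_k: usize,
    by_consumer: HashMap<String, usize>,
}

impl TrackingPool {
    pub fn new(limit: usize, top_k: usize) -> Self {
        Self { limit, used: 0, top_k, by_consumer: HashMap::new() }
    }

    /// Try to reserve `bytes` for `consumer`. On failure, the error names the
    /// top K consumers instead of only describing the failing request.
    pub fn try_grow(&mut self, consumer: &str, bytes: usize) -> Result<(), String> {
        let available = self.limit.saturating_sub(self.used);
        if bytes > available {
            return Err(format!(
                "Failed to allocate additional {} bytes for {} with {} bytes available. \
                 Top memory consumers: {}",
                bytes,
                consumer,
                available,
                self.report_top()
            ));
        }
        self.used += bytes;
        *self.by_consumer.entry(consumer.to_string()).or_insert(0) += bytes;
        Ok(())
    }

    /// Return up to `bytes` previously reserved by `consumer` to the pool.
    pub fn shrink(&mut self, consumer: &str, bytes: usize) {
        if let Some(reserved) = self.by_consumer.get_mut(consumer) {
            let freed = bytes.min(*reserved);
            *reserved -= freed;
            self.used -= freed;
        }
    }

    /// Format the K largest consumers, biggest first (ties broken by name).
    fn report_top(&self) -> String {
        let mut entries: Vec<(&String, &usize)> = self.by_consumer.iter().collect();
        entries.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
        entries
            .iter()
            .take(self.top_k)
            .map(|(name, bytes)| format!("{} reserved {} bytes", name, bytes))
            .collect::<Vec<_>>()
            .join(", ")
    }
}
```

The per-consumer map is updated on every grow and shrink, which is the kind of extra bookkeeping that would need benchmarking before making such a pool the default.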
Describe alternatives you've considered
No response
Additional context
No response