
Add TrackedMemoryPool with better error messages on exhaustion #11665

Merged: 17 commits into apache:main from 11523/biggest-memory-consumers on Aug 1, 2024

Conversation

@wiedld (Contributor) commented on Jul 26, 2024

Which issue does this PR close?

Part of #11523

Rationale for this change

The current OOM error message reports the next incremental memory request, not the biggest consumers of memory. As a result, we have spent time chasing OOMs in the wrong place.

What changes are included in this PR?

Includes a new MemoryPool implementation, TrackConsumersPool, which wraps an inner MemoryPool and, on OOM, returns a list of the top memory consumers.

The new TrackConsumersPool is opt-in, so individual users can decide whether to use this capability (along with any associated overhead).
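
As a rough sketch of how this might be wired up (module paths and the constructor shape here are assumptions based on this PR's tests, not a definitive API reference), the tracking pool wraps an inner pool plus a top-K count:

use std::num::NonZeroUsize;
use std::sync::Arc;

use datafusion_execution::memory_pool::{
    GreedyMemoryPool, MemoryConsumer, MemoryPool, TrackConsumersPool,
};

fn main() {
    // Track the top 3 consumers on top of a 100-byte greedy pool.
    let pool: Arc<dyn MemoryPool> = Arc::new(TrackConsumersPool::new(
        GreedyMemoryPool::new(100),
        NonZeroUsize::new(3).unwrap(),
    ));

    let mut r1 = MemoryConsumer::new("r1").register(&pool);
    r1.grow(50);

    let mut r2 = MemoryConsumer::new("r2").register(&pool);
    // This request exceeds the remaining 50 bytes, so the returned error
    // should also list the top consumers (r1: 50 bytes, r2: 0 bytes) in
    // addition to the usual "Failed to allocate" message.
    let err = r2.try_grow(150).unwrap_err();
    println!("{err}");
}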

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes: the new TrackConsumersPool.

@github-actions bot added the core (Core DataFusion crate) label on Jul 26, 2024
// Test: reports if new reservation causes error
// using the previously set sizes for other consumers
let mut r5 = MemoryConsumer::new("r5").register(&pool);
let expected = "Failed to allocate additional 150 bytes for r5 with 0 bytes already allocated - maximum available is 5. The top memory consumers (across reservations) are: r1 consumed 50 bytes, r3 consumed 20 bytes, r2 consumed 15 bytes";
@wiedld (Contributor, Author) commented on Jul 26, 2024

With the proposed change in a follow-up PR, the final error message would read:

Failed to allocate additional 150 bytes for r5 with 0 bytes already allocated for this reservation - 5 bytes remain available for the total pool. The top memory consumers (across reservations) are: r1 consumed 50 bytes, r3 consumed 20 bytes, r2 consumed 15 bytes

Resources exhausted with top memory consumers (across reservations) are: r1 consumed 50 bytes, r3 consumed 20 bytes, r2 consumed 15 bytes. Error: Failed to allocate additional 150 bytes for r5 with 0 bytes already allocated for this reservation - 5 bytes remain available for the total pool

Comment on lines +238 to 244
/// Constructs a resources error based upon the individual [`MemoryReservation`].
///
/// The error references the `bytes already allocated` for the reservation,
/// and not the total within the collective [`MemoryPool`],
/// nor the total across multiple reservations with the same [`MemoryConsumer`].
#[inline(always)]
fn insufficient_capacity_err(
@wiedld (Contributor, Author) commented:

In a follow-up PR, I would like to iterate on this error message.

Take the original/current:
Failed to allocate additional {} bytes for {} with {} bytes already allocated - maximum available is {}

And change into something like:
Failed to allocate additional {} bytes for {} with {} bytes already allocated for this reservation - {} bytes remain available for the total pool

@wiedld (Contributor, Author) commented on Jul 26, 2024

The suggestion above exists because this error message is inherently about only the reservation itself. Each reservation can request bytes from the pool, but the pool only increments/decrements totals for the pool as a whole. It does NOT track each reservation, nor whether multiple reservations belong to the same consumer.

As a result, the simplest solution is to: (a) update this error message, and (b) continue to rely upon each reservation to resize itself to zero upon drop (as it already does).

@alamb (Contributor) commented:

I like the idea of updating the error message in a follow-on PR.


// Test: see error message when no consumers recorded yet
let mut r0 = MemoryConsumer::new(same_name).register(&pool);
let expected = "Failed to allocate additional 150 bytes for foo with 0 bytes already allocated - maximum available is 100. The top memory consumers (across reservations) are: foo consumed 0 bytes";
@wiedld (Contributor, Author) commented on Jul 26, 2024

With the proposed change in a follow-up PR, the final error message would read:

Failed to allocate additional 150 bytes for foo with 0 bytes already allocated for this reservation - 100 bytes remain available for the total pool. The top memory consumers (across reservations) are: foo consumed 0 bytes

Resources exhausted with top memory consumers (across reservations) are: foo consumed 0 bytes. Error: Failed to allocate additional 150 bytes for foo with 0 bytes already allocated for this reservation - 100 bytes remain available for the total pool

@wiedld changed the title from "Provide actionable error messaging due to resource exhuastion." to "Provide actionable error messaging due to resource exhaustion." on Jul 26, 2024
@wiedld force-pushed the 11523/biggest-memory-consumers branch from 1795da6 to 1b1223f on July 26, 2024 at 19:39
Comment on lines +526 to +527
// API: multiple registrations using the same hashed consumer,
// will be recognized as the same in the TrackConsumersPool.
@wiedld (Contributor, Author) commented on Jul 26, 2024

As noted in the struct docs, the TrackConsumersPool is all about the MemoryConsumer. Therefore the top K consumers are returned, even if a given consumer has multiple reservations. This is intended to make the error messages (and subsequent debugging) more useful by correctly reflecting the aggregated top K.
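
A short sketch of the behavior described above (constructor shape and names assumed from this PR's tests, not verified against the final API): two reservations registered under the same consumer name are summed together in the top-consumers report.

use std::num::NonZeroUsize;
use std::sync::Arc;

use datafusion_execution::memory_pool::{
    GreedyMemoryPool, MemoryConsumer, MemoryPool, TrackConsumersPool,
};

fn same_consumer_multiple_reservations() {
    let pool: Arc<dyn MemoryPool> = Arc::new(TrackConsumersPool::new(
        GreedyMemoryPool::new(100),
        NonZeroUsize::new(3).unwrap(),
    ));

    // Two reservations registered under the same consumer name ("foo").
    let mut a = MemoryConsumer::new("foo").register(&pool);
    let mut b = MemoryConsumer::new("foo").register(&pool);
    a.grow(10);
    b.grow(20);

    // An OOM from either reservation should report "foo consumed 30 bytes":
    // the total across both reservations, not 10 or 20 individually.
    let _err = b.try_grow(150).unwrap_err();
}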

}

#[test]
fn test_tracked_consumers_pool_deregister() {
@wiedld (Contributor, Author) commented on Jul 26, 2024

This test is about an existing behavior. The current MemoryPool implementations register/deregister at the MemoryConsumer level, whereas bytes are incremented/decremented at the MemoryReservation level.

This means that a consumer can be deregistered even while its reservation still holds memory.

The new TrackConsumersPool has nothing to do with this existing behavior. However, this test is added in order to demonstrate how the error messages will read in that case.

@wiedld (Contributor, Author) commented on Jul 29, 2024

@alamb -- I cannot reproduce these test failures locally, even with the same flags & tokio multithreaded tests enabled. Can you help?

@alamb (Contributor) commented on Jul 29, 2024

@alamb -- I cannot reproduce these test failures locally, even with the same flags & tokio multithreaded tests enabled. Can you help?

I also could not reproduce it with

cargo test --lib  -p datafusion-execution

One way to debug such failures is to add additional debug logging to the tests -- right now the test failure simply says "failed" with a static message, but it doesn't say what the actual result was.

For example, you could potentially update your asserts like this:

diff --git a/datafusion/execution/src/memory_pool/pool.rs b/datafusion/execution/src/memory_pool/pool.rs
index 9ce14e41a..0df795a2b 100644
--- a/datafusion/execution/src/memory_pool/pool.rs
+++ b/datafusion/execution/src/memory_pool/pool.rs
@@ -494,12 +494,13 @@ mod tests {
         // using the previously set sizes for other consumers
         let mut r5 = MemoryConsumer::new("r5").register(&pool);
         let expected = "Failed to allocate additional 150 bytes for r5 with 0 bytes already allocated - maximum available is 5. The top memory consumers (across reservations) are: r1 consumed 50 bytes, r3 consumed 20 bytes, r2 consumed 15 bytes";
+        let actual_result = r5.try_grow(150);
         assert!(
             matches!(
-                r5.try_grow(150),
+                actual_result,
                 Err(DataFusionError::ResourcesExhausted(e)) if e.to_string().contains(expected)
             ),
-            "should provide list of top memory consumers"
+            "should provide list of top memory consumers, got {actual_result:?}",
         );
     }

And the message from your assertion failure should be more helpful for debugging

@alamb (Contributor) commented on Jul 29, 2024

given the tests pass on amd64 maybe the difference is related to newlines or something in the message?

@wiedld (Contributor, Author) commented on Jul 29, 2024

given the tests pass on amd64 maybe the difference is related to newlines or something in the message?

We have a backtrace inserted into the error message. Specifically, the expected is:
Failed to allocate additional 150 bytes for r0 with 10 bytes already allocated - maximum available is 70. The top memory consumers (across reservations) are: r1 consumed 20 bytes, r0 consumed 10 bytes

When running locally, we get that response.
Yet in CI, we get the backtrace:

 should provide proper error with both consumers, instead found Err(ResourcesExhausted(
"Failed to allocate additional 150 bytes for r0 with 10 bytes already allocated - maximum available is 70
\n\nbacktrace:    0: std::backtrace_rs::backtrace::libunwind::trace\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5\n   1: std::backtrace_rs::backtrace::trace_unsynchronized\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5\n   2: std::backtrace::Backtrace::create\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/backtrace.rs:331:13\n   3: get_back_trace\n             at /Users/runner/work/datafusion/datafusion/datafusion/common/src/error.rs:391:30\n   4: insufficient_capacity_err\n             at ./src/memory_pool/pool.rs:249:5\n   5: try_grow\n             at ./src/memory_pool/pool.rs:220:32\n   6: try_grow<datafusion_execution::memory_pool::pool::FairSpillPool>\n             at ./src/memory_pool/pool.rs:365:9\n   7: try_grow\n             at ./src/memory_pool/mod.rs:269:9\n   8: test_per_pool_type\n             at ./src/memory_pool/pool.rs:586:23\n   9: test_tracked_consumers_pool_deregister\n             at ./src/memory_pool/pool.rs:639:9\n  10: {closure#0}\n             at ./src/memory_pool/pool.rs:577:48\n  11: call_once<datafusion_execution::memory_pool::pool::tests::test_tracked_consumers_pool_deregister::{closure_env#0}, ()>\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5\n  12: core::ops::function::FnOnce::call_once\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5\n  13: test::__rust_begin_short_backtrace\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/test/src/lib.rs:625:18\n  14: test::run_test_in_process::{{closure}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/test/src/lib.rs:648:60\n  15: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panic/unwind_safe.rs:272:9\n  16: std::panicking::try::do_call\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:559:40\n  17: std::panicking::try\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:523:19\n  18: std::panic::catch_unwind\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panic.rs:149:14\n  19: test::run_test_in_process\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/test/src/lib.rs:648:27\n  20: test::run_test::{{closure}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/test/src/lib.rs:569:43\n  21: test::run_test::{{closure}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/test/src/lib.rs:599:41\n  22: std::sys_common::backtrace::__rust_begin_short_backtrace\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys_common/backtrace.rs:155:18\n  23: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/thread/mod.rs:542:17\n  24: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panic/unwind_safe.rs:272:9\n  25: std::panicking::try::do_call\n             at 
/rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:559:40\n  26: std::panicking::try\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:523:19\n  27: std::panic::catch_unwind\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panic.rs:149:14\n  28: std::thread::Builder::spawn_unchecked_::{{closure}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/thread/mod.rs:541:30\n  29: core::ops::function::FnOnce::call_once{{vtable.shim}}\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/ops/function.rs:250:5\n  30: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:2063:9\n  31: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:2063:9\n  32: std::sys::pal::unix::thread::Thread::new::thread_start\n             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys/pal/unix/thread.rs:108:17\n  33: __pthread_joiner_wake\n
. The top memory consumers (across reservations) are: r1 consumed 20 bytes, r0 consumed 10 bytes"))

This means I need to construct the concatenated error message differently. Fixing...
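
For illustration only (this is a sketch of the concatenation issue, not the PR's actual code): when backtraces are enabled, the inner error's Display output ends with the captured backtrace, so appending the top-consumers report after the inner message buries the report behind the backtrace. Prepending the report -- as in the second message format shown earlier -- keeps it up front where tests can match it:

fn wrap_oom_error(top_consumers: &str, inner: &str) -> String {
    // Put the report first, and the original error (plus any backtrace it
    // carries) last.
    format!(
        "Resources exhausted with top memory consumers (across reservations) are: \
         {top_consumers}. Error: {inner}"
    )
}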

@wiedld force-pushed the 11523/biggest-memory-consumers branch from bf07b81 to 09b20d2 on July 29, 2024 at 20:20
@wiedld force-pushed the 11523/biggest-memory-consumers branch from fdc60de to f75764e on July 29, 2024 at 20:45
@wiedld (Contributor, Author) commented on Jul 29, 2024

CI was failing due to the concatenation of error messages. I fixed it two ways -- in order to demonstrate why the second way is better.

@wiedld marked this pull request as ready for review on July 29, 2024 at 21:33
@alamb (Contributor) left a review comment:

Thank you @wiedld -- I think this is looking good.

I had some comments, but overall I think this could be merged as-is with the follow-ons you identified (improving the messages).

I also think that once we have this in, we should consider using this pool as the default (rather than GreedyPool) given its better user experience, but I think it would be good to make that proposal as a follow-on PR as well.

}

/// The top consumers in a report string.
fn report_top(&self) -> String {
@alamb (Contributor) commented:

Since top is only used for this report, maybe it should be a parameter to report_top rather than a parameter on the pool 🤔

It seems like someone could want both the "top 10" consumers and "all consumers" from the same tracked pool, but with this implementation they could only have one or the other.

Is this the message you are proposing to fix in a follow-on?

@wiedld (Contributor, Author) commented:

The use of TrackConsumersPool for error reporting (when passed as Arc<dyn MemoryPool>) is constrained by the trait definition. However, we could use the downcast struct itself for runtime metrics, as shown in this added commit. Is this what you were thinking?
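
A minimal sketch of this idea (assuming the report_top method shown in the snippet above is reachable by callers; its visibility and exact signature are not confirmed here): keep a handle to the concrete TrackConsumersPool for runtime metrics, while handing out the Arc<dyn MemoryPool> that the trait-based APIs require.

use std::num::NonZeroUsize;
use std::sync::Arc;

use datafusion_execution::memory_pool::{GreedyMemoryPool, MemoryPool, TrackConsumersPool};

fn build_pools() {
    let tracked = Arc::new(TrackConsumersPool::new(
        GreedyMemoryPool::new(1024),
        NonZeroUsize::new(10).unwrap(),
    ));

    // Hand the trait object to registration / try_grow call sites...
    let pool: Arc<dyn MemoryPool> = Arc::clone(&tracked);
    assert_eq!(pool.reserved(), 0);

    // ...while the concrete handle can still be queried directly for runtime
    // metrics, e.g. the top-consumers report (hypothetical call; it depends
    // on the method being public):
    // let report = tracked.report_top();
}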


// Test: will accumulate size changes per consumer, not per reservation
r1.grow(20);
let expected = "Resources exhausted with top memory consumers (across reservations) are: foo consumed 30 bytes. Error: Failed to allocate additional 150 bytes for foo with 20 bytes already allocated - maximum available is 70";
@alamb (Contributor) commented:

It is somewhat confusing that, in the same message, foo is reported with two different allocations:

  1. foo consumed 30 bytes
  2. for foo with 20 bytes already allocated

Is it possible to rationalize the errors somehow to make this less confusing?

@wiedld (Contributor, Author) commented on Jul 30, 2024

I agree this is confusing. The for foo with 20 bytes already allocated part is the error from the MemoryReservation, which holds 20 bytes.

The MemoryConsumer, on the other hand, happens to have 2 reservations -- with a total of 30 bytes allotted across reservations.

I'm hoping the follow-up PR (with the message change for the reservation-specific error) will make this clearer. Does the message below read better?

Resources exhausted with top memory consumers (across reservations) are: foo consumed 30 bytes. Error: Failed to allocate additional 150 bytes for foo with 20 bytes already allocated for this reservation - 70 bytes remain available for the total pool

@wiedld (Contributor, Author) commented:

Please feel free to suggest better wording for the above. 🙏🏼

@alamb changed the title from "Provide actionable error messaging due to resource exhaustion." to "Add TrackedMemoryPool that has better error messages on exhaustion" on Jul 30, 2024
@alamb (Contributor) left a review comment:

Thank you @wiedld -- I think this looks good to me

I will note the following as follow-on tasks:

  1. Improving the default error message
  2. Using the TrackConsumersPool as the default memory pool

impl<I: MemoryPool> TrackConsumersPool<I> {
/// Creates a new [`TrackConsumersPool`].
///
/// The `top` determines how many Top K [`MemoryConsumer`]s to include
@alamb (Contributor) commented:

👍

@alamb changed the title from "Add TrackedMemoryPool that has better error messages on exhaustion" to "Add TrackedMemoryPool with better error messages on exhaustion" on Jul 31, 2024
@alamb merged commit 921c3b6 into apache:main on Aug 1, 2024
25 checks passed
Labels: core (Core DataFusion crate)
Participants: 2