sync: make AtomicWaker panic safe #3689

udoprog · 2021-04-10T11:25:24Z

Make AtomicWaker panic safe by making sure that unwinding panics caused in user code restores the state of the AtomicWaker.

Motivation

Sometimes people (like me) want to make use of type which under the hood uses AtomicWaker in combination with std::panic::catch_unwind. Since the type itself does not implement the UnwindSafe family of traits, someone (like me) might be tempted to simply hope for the best and wrap the type with the AssertUnwindSafe escape hatch.

This might work well for a while, until a user comes around using a runtime (or a sub-scheduler) which happens to provide a Waker that can panic. And when it inevitably does, the system ends up in an interesting deadlock where new wakers won't be registered. This is obviously the users (my) fault for incorrect use of AssertUnwindSafe. But we can make the situation marginally better by avoiding the more egregious instances of where panic safety might trip you up.

Solution

Trace all code paths which might execute out-of-tokio provided code that might panic (e.g. into_waker or wake) and make sure that if a panic occurs the AtomicWaker it's restored to a well-defined, functional state. This also means it can implement RefUnwindSafe and UnwindSafe which is nice.

A viable alternative would be to poison the AtomicWaker if any panics are raised which might actually be "more correct", This would cause any future calls by any thread using the AtomicWaker to error or panic.

udoprog · 2021-04-10T11:30:32Z

Ugh. The build fails because Wake trait is not available in testing. I'll fix the new test if we do actually end up wanting this. Or maybe look into if we can't run test code on a newer Rust as discussed elsewhere.

Darksonn · 2021-04-11T06:57:50Z

You would need to upgrade clippy, miri and san. Feel free to upgrade them if you want, although do it in a separate PR.

carllerche · 2021-04-22T17:09:05Z

This is correct as far as I can tell. I would like to see a loom test ideally to cover the misc concurrency cases.

tokio/src/sync/task/atomic_waker.rs

Co-authored-by: Alice Ryhl <[email protected]>

tokio/src/sync/task/atomic_waker.rs

…state

udoprog · 2021-04-23T13:02:42Z

So there's two new things to note that has change in the process of writing the loom test.

PanicGuard now simply unsets the REGISTERING bit, so that it can preserve any prior wake's. This means that any wakes called prior to it panicking will be propagated in future calls when registering new wakers. Instead of being clobbered by forcing the state back to WAITING. I realized this was possible when considering what combination of effects you could observe out of an AtomicWaker. We could revert it to simply force it back to WAITING. But this change feels more correct to me.

If the loom test ever fails in the future it will spam a million lines caused by the intentional panic - one for each iteration. And I do not know how to avoid this since the only way to prevent it is to install a panic hook. See https://github.com/tokio-rs/tokio/pull/3689/files#diff-7a5023d8b8cdb6f299f962201022b3934e6445806adb30b3f12c810cae4297a6R63

Darksonn · 2021-05-05T12:19:52Z

tokio/src/sync/task/atomic_waker.rs

+        impl Drop for PanicGuard<'_> {
+            fn drop(&mut self) {
+                // On panics, we want to unset the REGISTERING state which will
+                // preserve any potential WAKING state for future calls to
+                // register.
+                let _ = self.0.fetch_and(!REGISTERING, AcqRel);
+            }
+        }


I don't think this is correct. If the state becomes WAKING due to this, then who sets it back to WAITING? It seems like the wake lock would just be held indefinitely.

Correct, so there's a chance it would get stuck in a WAKING state instead due to this.

This is being discussed on Discord, but either revert to old behavior or cope with it somehow. But in the current state at least, no merge.

I think that correct behavior would be to:

Replace the current waker with None. Store the old waker for now.

Call waker.into_waker().

Set the current waker to Some(new_waker).

Do all of the other stuff in the function.

Just before returning from do_register (even if by panic), drop the old waker, if any. Do nothing special if it panics.

In principle this can double-panic in step five, but I do not think that this is a problem since it is normal that things get their destructor called during panics.

Darksonn · 2021-05-05T13:36:41Z

tokio/src/sync/task/atomic_waker.rs

-                    self.waker.with_mut(|t| *t = Some(waker.into_waker()));
+                    self.waker.with_mut(|t| *t = Some(new_waker));


I just realized that this can panic too, since the old waker may have a destructor.

udoprog

Minor nit and questions. Using catch_unwind feels better to me since there's less complexity to deal with unwind behavior. Looks good to me (note that I can't approve since I'm author of PR).

udoprog · 2021-08-26T16:29:25Z

tokio/src/sync/task/atomic_waker.rs

-                        Ok(_) => {}
+                        Ok(_) => {
+                            // We don't want to give the caller the panic if it
+                            // was someone else who put in that waker.


Couldn't this end in a situation where a panic in the Drop impl of a Waker is ignored, and if so is that OK?

My general stance towards panics is that they should be propagated somewhere - otherwise errors which cause them might end up going unnoticed and contribute to other unpredictable side effects.

Well, that's the question really. There are certainly some places we don't want panics propagating, e.g. inside the runtime. The panic is printed to the console even if we drop it.

udoprog · 2021-08-26T16:29:56Z

tokio/src/sync/task/atomic_waker.rs

+                            }
+
+                            // We don't want to give the caller the panic if it
+                            // was someone else who put in that waker.


Same Q here.

udoprog · 2021-08-26T16:33:03Z

tokio/src/sync/task/atomic_waker.rs

@@ -171,15 +175,35 @@ impl AtomicWaker {
    where
        W: WakerRef,
    {
+        fn catch_unwind<F: FnOnce() -> R, R>(f: F) -> std::thread::Result<R> {
+            std::panic::catch_unwind(AssertUnwindSafe(f))


I think Waker's are UnwindSafe and the only things captured in closures below, so it might be possible to add that as a bound to F: UnwindSafe + FnOnce() -> R instead of AssertUnwindSafe?

Ah, it might have been a previous version which was unhappy. I can try changing it to use catch_unwind directly.

# 1.14.0 (November 15, 2021) ### Fixed - macros: fix compiler errors when using `mut` patterns in `select!` ([#4211]) - sync: fix a data race between `oneshot::Sender::send` and awaiting a `oneshot::Receiver` when the oneshot has been closed ([#4226]) - sync: make `AtomicWaker` panic safe ([#3689]) - runtime: fix basic scheduler dropping tasks outside a runtime context ([#4213]) ### Added - stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223]) ### Changed - io: updated `copy` buffer size to match `std::io::copy` ([#4209]) ### Documented - io: rename buffer to file in doc-test ([#4230]) - sync: fix Notify example ([#4212]) [#4211]: #4211 [#4226]: #4226 [#3689]: #3689 [#4213]: #4213 [#4179]: #4179 [#4223]: #4223 [#4209]: #4209 [#4230]: #4230 [#4212]: #4212

sync: make AtomicWaker panic safe

cfc76cb

Darksonn added A-tokio Area: The main tokio crate M-sync Module: tokio/sync labels Apr 11, 2021

udoprog added 2 commits April 12, 2021 07:44

sync: Replace test with a custom non-panicking Waker

3fc1662

sync: Fix formatting

5d38c93

taiki-e mentioned this pull request Apr 18, 2021

Should types like AtomicWaker and AbortHandle be declared UnwindSafe? rust-lang/futures-rs#2211

Open

Darksonn reviewed Apr 22, 2021

View reviewed changes

tokio/src/sync/task/atomic_waker.rs Outdated Show resolved Hide resolved

udoprog and others added 2 commits April 22, 2021 19:27

sync: grammar and sparser docs

ebef874

Update tokio/src/sync/task/atomic_waker.rs

445a451

Co-authored-by: Alice Ryhl <[email protected]>

Darksonn reviewed Apr 22, 2021

View reviewed changes

tokio/src/sync/task/atomic_waker.rs Outdated Show resolved Hide resolved

udoprog added 5 commits April 23, 2021 08:49

sync: store not swap

9a165b1

sync: woops, store only makes sense w/ Release

4d9ce19

sync: just unset the REGISTERING state in AtomicWaker

3630c14

sync: add loom test that checks a panicking waker preserves a WAKING …

281778d

…state

sync: remove rogue SeqCst

6b390e9

Darksonn reviewed May 5, 2021

View reviewed changes

Darksonn added 5 commits August 19, 2021 10:02

Update panic safety of registration

fc9bc21

fix CI

771abc8

Merge 'master' into 'panic-safe-atomic-waker'

e0d4981

rustfmt

9b2760d

That was apparently important

ae2e105

udoprog commented Aug 26, 2021

View reviewed changes

Darksonn approved these changes Nov 2, 2021

View reviewed changes

Darksonn merged commit 09b770c into tokio-rs:master Nov 2, 2021

udoprog deleted the panic-safe-atomic-waker branch November 2, 2021 12:44

hawkw mentioned this pull request Nov 15, 2021

chore: prepare Tokio v1.14.0 #4234

Merged

abonander mentioned this pull request May 7, 2022

Enumerate types which are !UnwindSafe on purpose, fix the rest #4657

Open

taiki-e mentioned this pull request Feb 28, 2023

Missed wake call using .shared() rust-lang/futures-rs#2706

Open

taiki-e mentioned this pull request Sep 10, 2023

Implement the UnwindSafe/RefUnwindSafe traits on channels rust-lang/futures-rs#2772

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync: make AtomicWaker panic safe #3689

sync: make AtomicWaker panic safe #3689

udoprog commented Apr 10, 2021 •

edited

Loading

udoprog commented Apr 10, 2021 •

edited

Loading

Darksonn commented Apr 11, 2021

carllerche commented Apr 22, 2021

udoprog commented Apr 23, 2021 •

edited

Loading

Darksonn May 5, 2021

udoprog May 5, 2021

Darksonn May 5, 2021

Darksonn May 5, 2021

udoprog left a comment

udoprog Aug 26, 2021 •

edited

Loading

Darksonn Aug 26, 2021

udoprog Aug 26, 2021

udoprog Aug 26, 2021

Darksonn Aug 26, 2021

		self.waker.with_mut(\|t\| *t = Some(waker.into_waker()));
		self.waker.with_mut(\|t\| *t = Some(new_waker));

sync: make AtomicWaker panic safe #3689

sync: make AtomicWaker panic safe #3689

Conversation

udoprog commented Apr 10, 2021 • edited Loading

Motivation

Solution

udoprog commented Apr 10, 2021 • edited Loading

Darksonn commented Apr 11, 2021

carllerche commented Apr 22, 2021

udoprog commented Apr 23, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

udoprog left a comment

Choose a reason for hiding this comment

udoprog Aug 26, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

udoprog commented Apr 10, 2021 •

edited

Loading

udoprog commented Apr 10, 2021 •

edited

Loading

udoprog commented Apr 23, 2021 •

edited

Loading

udoprog Aug 26, 2021 •

edited

Loading