sync: make AtomicWaker panic safe #3689
Ugh. The build fails because
You would need to upgrade clippy, miri and san. Feel free to upgrade them if you want, although do it in a separate PR.
This is correct as far as I can tell. Ideally, I would like to see a loom test covering the miscellaneous concurrency cases.
Co-authored-by: Alice Ryhl <[email protected]>
So there are two new things to note that changed in the process of writing the loom test.
If the loom test ever fails in the future, it will spam a million lines caused by the intentional panic, one for each iteration. I don't know how to avoid this, since the only way to prevent it is to install a panic hook. See https://github.com/tokio-rs/tokio/pull/3689/files#diff-7a5023d8b8cdb6f299f962201022b3934e6445806adb30b3f12c810cae4297a6R63
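For reference, a panic hook can be swapped out temporarily with `std::panic::take_hook` and `std::panic::set_hook`. The sketch below is a minimal illustration, assuming the intentional panic is caught inside the test body; `with_silent_panics` is a hypothetical helper and does not use loom. Because the hook is process-global, replacing it can also hide panic output from other tests running concurrently.

```rust
use std::panic;

/// Hypothetical helper: runs `f` with a no-op panic hook installed, so
/// intentional panics inside it print nothing, then restores the
/// previous hook. The hook is process-global, so this is not isolated
/// from other threads.
fn with_silent_panics<R>(f: impl FnOnce() -> R) -> R {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(|_| {}));
    let result = panic::catch_unwind(panic::AssertUnwindSafe(f));
    // Restore the original hook on both the normal and panicking paths.
    panic::set_hook(previous);
    match result {
        Ok(value) => value,
        // `resume_unwind` re-raises without invoking the hook again.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```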
tokio/src/sync/task/atomic_waker.rs
```rust
impl Drop for PanicGuard<'_> {
    fn drop(&mut self) {
        // On panics, we want to unset the REGISTERING state which will
        // preserve any potential WAKING state for future calls to
        // register.
        let _ = self.0.fetch_and(!REGISTERING, AcqRel);
    }
}
```
I don't think this is correct. If the state becomes `WAKING` due to this, then who sets it back to `WAITING`? It seems like the wake lock would just be held indefinitely.
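The concern can be checked with plain bit arithmetic. This sketch assumes state bits shaped like `AtomicWaker`'s (`WAITING = 0`, `REGISTERING = 0b01`, `WAKING = 0b10`; treat the exact values as an assumption) and models only what the guard does on unwind. Clearing just `REGISTERING` turns `REGISTERING | WAKING` into `WAKING`, and presumably the panicking `register` call was the one expected to finish that handoff, so nothing is left to reset the state to `WAITING`.

```rust
// Hypothetical state bits, assumed to mirror AtomicWaker's layout.
const WAITING: usize = 0;
const REGISTERING: usize = 0b01;
const WAKING: usize = 0b10;

/// What the proposed `PanicGuard` does to the state word on unwind:
/// clear only the REGISTERING bit and keep every other bit as-is.
fn state_after_guard(state: usize) -> usize {
    state & !REGISTERING
}
```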
Correct, so because of this there's a chance it would get stuck in the `WAKING` state instead.
This is being discussed on Discord; the options are to either revert to the old behavior or cope with it somehow. But in its current state, at least, it should not be merged.
I think that correct behavior would be to:
- Replace the current waker with `None`. Store the old waker for now.
- Call `waker.into_waker()`.
- Set the current waker to `Some(new_waker)`.
- Do all of the other stuff in the function.
- Just before returning from `do_register` (even if by panic), drop the old waker, if any. Do nothing special if it panics.
In principle this can double-panic in step five, but I do not think that this is a problem since it is normal that things get their destructor called during panics.
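The proposed ordering can be sketched with an ordinary local binding, since Rust drops locals both on normal return and during unwinding. This is a single-threaded model under stated assumptions: `WakerSlot` and `register` are hypothetical names, the atomic state word is left out entirely, and `into_waker` stands in for the user code that may panic.

```rust
/// Hypothetical, single-threaded model of the waker slot. The real
/// `AtomicWaker` also coordinates through an atomic state word, which
/// this sketch leaves out.
struct WakerSlot<W> {
    waker: Option<W>,
}

impl<W> WakerSlot<W> {
    /// Registers the waker produced by `into_waker`, which may panic.
    fn register(&mut self, into_waker: impl FnOnce() -> W) {
        // Step 1: move the old waker out, leaving `None` in the slot.
        // `_old` is a named binding (not `_`), so it lives until this
        // function returns *or* unwinds, and is dropped last (step 5).
        let _old = self.waker.take();
        // Step 2: user code that may panic. If it does, the slot already
        // holds `None`, which is a valid state.
        let new_waker = into_waker();
        // Step 3: store the new waker. The slot holds `None`, so no user
        // destructor runs during this assignment.
        self.waker = Some(new_waker);
        // Step 4: the rest of the registration logic would go here.
    }
}
```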
tokio/src/sync/task/atomic_waker.rs
```diff
-self.waker.with_mut(|t| *t = Some(waker.into_waker()));
+self.waker.with_mut(|t| *t = Some(new_waker));
```
I just realized that this can panic too: assigning to `*t` drops the old waker, and its destructor may panic.
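To make the point concrete: in Rust, assigning over an `Option` runs the destructor of the value it replaces at the moment of assignment. The sketch below uses a hypothetical `Noisy` type whose destructor logs instead of panicking, and contrasts an in-place overwrite with `Option::replace`, which hands the old value back so the caller chooses where it gets dropped. Both helper functions are illustrative names, not Tokio code.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a waker whose destructor runs user code.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        // Any user code here (including a panic) runs wherever the
        // value happens to be dropped.
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Overwrites the slot in place: the old value's destructor runs as
/// part of the assignment itself.
fn store_in_place(slot: &mut Option<Noisy>, new: Noisy) {
    *slot = Some(new);
}

/// Swaps the new value in and returns the old one, so the caller
/// decides when its destructor runs.
fn store_deferred(slot: &mut Option<Noisy>, new: Noisy) -> Option<Noisy> {
    slot.replace(new)
}
```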
Minor nit and questions. Using `catch_unwind` feels better to me, since there's less complexity involved in dealing with unwind behavior. Looks good to me (note that I can't approve, since I'm the author of the PR).
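As a rough illustration of the style being preferred here: with `catch_unwind`, restoring state on a panic is ordinary control flow rather than a `Drop` impl on a guard, and `std::panic::resume_unwind` re-raises the original payload afterwards. This is a minimal sketch; `with_busy_flag` and the `Cell<bool>` flag are hypothetical and deliberately avoid the atomic state machine discussed above.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: runs `user_code` with `busy` set, restores the
/// flag on both the normal and the panicking path using explicit
/// control flow (no drop guard), then re-raises any panic.
fn with_busy_flag<R>(busy: &Cell<bool>, user_code: impl FnOnce() -> R) -> R {
    busy.set(true);
    let result = panic::catch_unwind(AssertUnwindSafe(user_code));
    // Runs whether or not `user_code` panicked.
    busy.set(false);
    match result {
        Ok(value) => value,
        Err(payload) => panic::resume_unwind(payload),
    }
}
```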
```diff
-Ok(_) => {}
+Ok(_) => {
+    // We don't want to give the caller the panic if it
+    // was someone else who put in that waker.
```
Couldn't this end up ignoring a panic in the `Drop` impl of a `Waker`? And if so, is that OK?
My general stance towards panics is that they should be propagated somewhere - otherwise errors which cause them might end up going unnoticed and contribute to other unpredictable side effects.
Well, that's the question really. There are certainly some places we don't want panics propagating, e.g. inside the runtime. The panic is printed to the console even if we drop it.
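What "dropping the panic" amounts to can be sketched as follows: the value is dropped inside `catch_unwind` and the error is discarded, while the default panic hook has already printed the message by the time unwinding is caught. `drop_swallowing_panic` is a hypothetical helper used only to illustrate the trade-off under discussion.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: drops `value`, swallowing any panic raised by
/// its destructor. The default panic hook still prints the message
/// before unwinding reaches this point. Returns `true` if a panic was
/// swallowed.
fn drop_swallowing_panic<T>(value: T) -> bool {
    panic::catch_unwind(AssertUnwindSafe(move || drop(value))).is_err()
}
```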
```rust
}

// We don't want to give the caller the panic if it
// was someone else who put in that waker.
```
Same Q here.
```diff
@@ -171,15 +175,35 @@ impl AtomicWaker {
     where
         W: WakerRef,
     {
+        fn catch_unwind<F: FnOnce() -> R, R>(f: F) -> std::thread::Result<R> {
+            std::panic::catch_unwind(AssertUnwindSafe(f))
```
I think `Waker`s are `UnwindSafe` and are the only things captured in the closures below, so might it be possible to add that as a bound, `F: UnwindSafe + FnOnce() -> R`, instead of using `AssertUnwindSafe`?
Ah, it might have been a previous version that the compiler was unhappy with. I can try changing it to use `catch_unwind` directly.
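The suggested bound can be sketched like this: with `F: UnwindSafe`, the compiler checks what the closure captures instead of the author asserting it. This is a hypothetical variant of the helper from the diff, and it relies on the reviewer's premise that `std::task::Waker` is `UnwindSafe`; the usage below builds a no-op waker through the standard `Wake` trait to show that a move closure capturing only a `Waker` satisfies the bound.

```rust
use std::panic::{self, UnwindSafe};

/// Hypothetical variant of the helper from the diff: the `UnwindSafe`
/// bound replaces `AssertUnwindSafe`, so any capture that is not unwind
/// safe becomes a compile error instead of a silent assertion.
fn catch_unwind<F: UnwindSafe + FnOnce() -> R, R>(f: F) -> std::thread::Result<R> {
    panic::catch_unwind(f)
}
```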
# 1.14.0 (November 15, 2021)

### Fixed

- macros: fix compiler errors when using `mut` patterns in `select!` ([#4211])
- sync: fix a data race between `oneshot::Sender::send` and awaiting a `oneshot::Receiver` when the oneshot has been closed ([#4226])
- sync: make `AtomicWaker` panic safe ([#3689])
- runtime: fix basic scheduler dropping tasks outside a runtime context ([#4213])

### Added

- stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223])

### Changed

- io: updated `copy` buffer size to match `std::io::copy` ([#4209])

### Documented

- io: rename buffer to file in doc-test ([#4230])
- sync: fix Notify example ([#4212])

[#4211]: #4211
[#4226]: #4226
[#3689]: #3689
[#4213]: #4213
[#4179]: #4179
[#4223]: #4223
[#4209]: #4209
[#4230]: #4230
[#4212]: #4212
Make `AtomicWaker` panic safe by making sure that its state is restored when a panic raised in user code unwinds through it.
Motivation
Sometimes people (like me) want to use a type that uses `AtomicWaker` under the hood in combination with `std::panic::catch_unwind`. Since the type itself does not implement the `UnwindSafe` family of traits, someone (like me) might be tempted to simply hope for the best and wrap the type in the `AssertUnwindSafe` escape hatch.

This might work well for a while, until a user comes along with a runtime (or a sub-scheduler) that happens to provide a `Waker` that can panic. When it inevitably does, the system ends up in an interesting deadlock where new wakers won't be registered. This is obviously the user's (my) fault for the incorrect use of `AssertUnwindSafe`, but we can make the situation marginally better by avoiding the more egregious instances where panic safety might trip you up.
Solution
Trace all code paths that might execute code provided from outside Tokio that can panic (e.g. `into_waker` or `wake`), and make sure that if a panic occurs, the `AtomicWaker` is restored to a well-defined, functional state. This also means it can implement `RefUnwindSafe` and `UnwindSafe`, which is nice.

A viable alternative, which might actually be "more correct", would be to poison the `AtomicWaker` if any panics are raised. This would cause any future calls by any thread using the `AtomicWaker` to error or panic.
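As an illustration of what implementing `RefUnwindSafe` and `UnwindSafe` buys: a type containing an `UnsafeCell` is `!RefUnwindSafe` by default, so capturing a reference to it in `catch_unwind` would otherwise require `AssertUnwindSafe`. The sketch below uses a hypothetical `MyAtomicWaker` that only mirrors the rough shape of the real type; the manual impls are a claim the author must justify, and are only reasonable once every panic path leaves the state well-defined.

```rust
use std::cell::UnsafeCell;
use std::panic::{RefUnwindSafe, UnwindSafe};
use std::sync::atomic::AtomicUsize;
use std::task::Waker;

/// Hypothetical type shaped like `AtomicWaker`. The `UnsafeCell` makes
/// it `!RefUnwindSafe` unless we opt in explicitly.
struct MyAtomicWaker {
    state: AtomicUsize,
    waker: UnsafeCell<Option<Waker>>,
}

// Opting in: only justified if every panic path restores a valid state.
impl UnwindSafe for MyAtomicWaker {}
impl RefUnwindSafe for MyAtomicWaker {}

/// Compile-time check: only compiles if `T` implements both traits.
fn assert_unwind_safe<T: UnwindSafe + RefUnwindSafe>() {}
```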