Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miri hangs in spin loops forever #1388

Closed
RalfJung opened this issue May 3, 2020 · 15 comments · Fixed by #2208
Closed

Miri hangs in spin loops forever #1388

RalfJung opened this issue May 3, 2020 · 15 comments · Fixed by #2208
Labels
A-concurrency Area: affects our concurrency (multi-thread) support C-bug Category: This is a bug.

Comments

@RalfJung
Copy link
Member

RalfJung commented May 3, 2020

The following program loops forever in Miri, but terminates as expected with normal rustc:

use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering};

static FLAG: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let j = thread::spawn(|| {
        while FLAG.load(Ordering::Acquire) == 0 {
            // spin and wait
        }
    });
    thread::yield_now();
    FLAG.store(1, Ordering::Release);
    j.join().unwrap();
}

One could argue the spin loop should use hint::spin_loop();, but that is unstable -- and it seems reasonable to expect the scheduler to de-schedule a thread at some point even if it does not yield.

Cc @vakaras

@RalfJung RalfJung added C-bug Category: This is a bug. A-concurrency Area: affects our concurrency (multi-thread) support labels May 3, 2020
@oli-obk
Copy link
Contributor

oli-obk commented May 3, 2020

We could do round robin, where every terminator causes the next thread to get activated.

@RalfJung
Copy link
Member Author

RalfJung commented May 3, 2020

Ah turns out there is a stable version of the same intrinsic, std::sync::atomic::spin_loop_hint().

We could do round robin, where every terminator causes the next thread to get activated.

I was more thinking "switch every N terminators", but N = 1 is certainly a possible choice. ;) We might make this configurable even.

@RalfJung
Copy link
Member Author

There's an interesting testcase in once_cell where even adding a spin loop hint does not help. My guess is that the scheduler keeps swapping between two of the spinning threads so that the others are starved.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 23, 2021

I think here is another example where yield_now is insufficient:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f3ee4d476185291766d413ddeaa425ad
This test hangs in Miri; I assume what happens is that the two threads with their while !THREAD2_LAUNCHED.load(Ordering::SeqCst) loops are playing ping-pong and the thread that would set THREAD2_LAUNCHED never gets run.

I assume @JCTyblaidd's PR #1651 would help here.

@vakaras
Copy link
Contributor

vakaras commented Apr 23, 2021

I think here is another example where yield_now is insufficient:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f3ee4d476185291766d413ddeaa425ad
This test hangs in Miri; I assume what happens is that the two threads with their while !THREAD2_LAUNCHED.load(Ordering::SeqCst) loops are playing ping-pong and the thread that would set THREAD2_LAUNCHED never gets run.

This is the most likely scenario because the threads with the lower ID get priority (simply because we pick them by traversing the list).

@Kixunil
Copy link
Contributor

Kixunil commented Oct 8, 2021

Got bitten by this recently when I forgot to call thread::park() and attributed the spin loop to slowness/experimental nature of Miri. 😅 Tried to look into the docs and didn't find it. Found it when refactoring the code later.

I think it'd be useful to document this for now, I can send a PR if you agree.
Long-term, if some kind of hint is required, would it be feasible to have some heuristics where Miri could detect basic instances of this and warn?

Anyway, thanks for your great work on this tool! I feel much better now that I see my unsafe code is probably correct. :)

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2021

Long-term, if some kind of hint is required, would it be feasible to have some heuristics where Miri could detect basic instances of this and warn?

Seems better to use a different scheduling algorithm instead. One version of that started in #1651, but was not completed. A simpler alternative would be a round-robin scheduler.

Until that happens, I agree documenting this better would be good, and a PR would be definitely welcome. :)

@Kixunil
Copy link
Contributor

Kixunil commented Oct 9, 2021

Sorry for being unclear. It seems to me that a spinloop without spin_loop_hint() is undesirable so warning about that could be nice? Although that's probably something for clippy, but I'm not sure if clippy is powerful enough to notice that.

@RalfJung
Copy link
Member Author

It seems to me that a spinloop without spin_loop_hint() is undesirable so warning about that could be nice?

Hm, that might be the case (not sure if everyone would agree) but I don't think Miri is necessarily the tool for detecting such things. None of the solutions I know of that would make Miri better at running such code would provide any avenue towards having such a warning.

@Kixunil
Copy link
Contributor

Kixunil commented Oct 12, 2021

Sure, thanks!

bors added a commit that referenced this issue Oct 12, 2021
Document threading support a bit more

This adds a few known limitations around threading to the README and suggests the users to look into GitHub issues to learn more.

Addresses #1388 (comment)
bors bot added a commit to crossbeam-rs/crossbeam that referenced this issue May 19, 2022
831: Add spin loop hints in tests for Miri r=taiki-e a=cbeuw

This is a better way to do #829 

Miri does not have a pre-emptive scheduler, so once the execution falls into a spin loop it'll hang forever: rust-lang/miri#1388

Similar measures (`yield_now()`) are already present in [some other tests](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-queue/tests/array_queue.rs), but it's missing here

832: Fix links to macro rules in docs r=taiki-e a=alygin

The fix provides explicit links to macro rules in docs because they [are not resolved globally anymore](rust-lang/rust#96676).


Co-authored-by: Andy Wang <[email protected]>
Co-authored-by: Andrew Lygin <[email protected]>
@RalfJung
Copy link
Member Author

RalfJung commented Jun 5, 2022

There's an interesting testcase in once_cell where even adding a spin loop hint does not help. My guess is that the scheduler keeps swapping between two of the spinning threads so that the others are starved.

Here's a self-contained testcase for that situation:

use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering};

static FLAG: AtomicUsize = AtomicUsize::new(0);

// When a thread yields, Miri's scheduler will pick the thread with the lowest ID
// that can run. IDs are assigned in thread creation order.
// This means we can make 2 threads ping-pong with each other while
// really there is a 3rd thread that we should schedule to make progress.

fn main() {
    let waiter1 = thread::spawn(|| {
        while FLAG.load(Ordering::Acquire) == 0 {
            // spin and wait
            thread::yield_now();
        }
    });
    let waiter2 = thread::spawn(|| {
        while FLAG.load(Ordering::Acquire) == 0 {
            // spin and wait
            thread::yield_now();
        }
    });
    let progress = thread::spawn(|| {
        FLAG.store(1, Ordering::Release);
    });
    // The first `join` blocks the main thread and thus takes it out of the equation.
    waiter1.join().unwrap();
    waiter2.join().unwrap();
    progress.join().unwrap();
}

@RalfJung
Copy link
Member Author

RalfJung commented Jun 5, 2022

#2197 helps with that. So after that PR, at least explicitly yielding spinloops should always work in Miri.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 5, 2022

I think here is another example where yield_now is insufficient: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f3ee4d476185291766d413ddeaa425ad This test hangs in Miri; I assume what happens is that the two threads with their while !THREAD2_LAUNCHED.load(Ordering::SeqCst) loops are playing ping-pong and the thread that would set THREAD2_LAUNCHED never gets run.

That test (originally by @jethrogb) also works now, but I had to change it a little. Unchanged, it double-panics:

thread '<unnamed>' panicked at 'use of std::thread::current() is not possible after the thread's local data has been destroyed', /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/mod.rs:677:35
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:584:5
   1: std::rt::panic_fmt
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:142:14
   2: std::rt::panic_display::<&str>
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:72:5
   3: core::panicking::panic_str
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:56:5
   4: std::option::expect_failed
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/option.rs:1874:5
   5: std::option::Option::<std::thread::Thread>::expect
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/option.rs:738:21
   6: std::thread::current
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/mod.rs:677:5
   7: std::sync::mpsc::blocking::tokens
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/blocking.rs:30:42
   8: std::sync::mpsc::sync::wait::<()>
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/sync.rs:110:38
   9: std::sync::mpsc::sync::Packet::<()>::recv
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/sync.rs:280:25
  10: std::sync::mpsc::Receiver::<()>::recv
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/mod.rs:1178:47
  11: <launcher::{closure#0}::RecvOnDrop as std::ops::Drop>::drop
             at tests/pass/concurrency/spin_loops.rs:53:21
  12: std::ptr::drop_in_place::<launcher::{closure#0}::RecvOnDrop> - shim(Some(launcher::{closure#0}::RecvOnDrop))
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
  13: std::ptr::drop_in_place::<std::option::Option<launcher::{closure#0}::RecvOnDrop>> - shim(Some(std::option::Option<launcher::{closure#0}::RecvOnDrop>))
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
  14: std::mem::drop::<std::option::Option<launcher::{closure#0}::RecvOnDrop>>
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:974:24
  15: std::thread::local::fast::destroy_value::<launcher::{closure#0}::RecvOnDrop>
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/local.rs:1033:13
  16: std::sys_common::thread_local_dtor::register_dtor_fallback::run_dtors
             at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/thread_local_dtor.rs:43:17
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/sync.rs:386:42
stack backtrace:
   0:           0x2b728a - std::backtrace_rs::backtrace::miri::trace_unsynchronized::<&mut [closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/miri.rs:99:5
   1:           0x2b6e30 - std::backtrace_rs::backtrace::miri::trace::<&mut [closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/miri.rs:62:14
   2:           0x25a463 - std::backtrace_rs::backtrace::trace_unsynchronized::<[closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   3:           0x23c7eb - std::sys_common::backtrace::_print_fmt
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:66:5
   4:           0x23a1d3 - <std::sys_common::backtrace::_print::DisplayBacktrace as std::fmt::Display>::fmt
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:45:22
   5:           0x56a4dd - std::fmt::write
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1196:17
   6:           0x124dc3 - <std::sys::unix::stdio::Stderr as std::io::Write>::write_fmt
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/io/mod.rs:1654:15
   7:           0x23c4e3 - std::sys_common::backtrace::_print
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:48:5
   8:           0x23c389 - std::sys_common::backtrace::print
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:35:9
   9:           0x253dd1 - std::panicking::default_hook::{closure#1}
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:295:22
  10:           0x2540ae - std::panicking::default_hook
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:314:9
  11:           0x257c44 - std::panicking::rust_panic_with_hook
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:698:17
  12:           0x256a77 - std::panicking::begin_panic_handler::{closure#0}
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:588:13
  13:           0x23d08e - std::sys_common::backtrace::__rust_end_short_backtrace::<[closure@std::panicking::begin_panic_handler::{closure#0}], !>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:138:18
  14:           0x256995 - std::panicking::begin_panic_handler
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:584:5
  15:           0x52467e - std::rt::panic_fmt
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:142:14
  16:           0x53ff19 - std::result::unwrap_failed
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/result.rs:1805:5
  17:           0x53ab66 - std::result::Result::<std::sync::MutexGuard<std::sync::mpsc::sync::State<()>>, std::sync::PoisonError<std::sync::MutexGuard<std::sync::mpsc::sync::State<()>>>>::unwrap
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/result.rs:1098:23
  18:           0x215733 - std::sync::mpsc::sync::Packet::<()>::drop_port
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/sync.rs:386:25
  19:           0x200881 - <std::sync::mpsc::Receiver<()> as std::ops::Drop>::drop
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/mpsc/mod.rs:1507:36
  20:           0x3d48b8 - std::ptr::drop_in_place::<std::sync::mpsc::Receiver<()>> - shim(Some(std::sync::mpsc::Receiver<()>))
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
  21:            0x7ecf9 - <launcher::{closure#0}::RecvOnDrop as std::ops::Drop>::drop
                               at tests/pass/concurrency/spin_loops.rs:54:17
  22:           0x3d48c7 - std::ptr::drop_in_place::<launcher::{closure#0}::RecvOnDrop> - shim(Some(launcher::{closure#0}::RecvOnDrop))
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
  23:           0x3d48d7 - std::ptr::drop_in_place::<std::option::Option<launcher::{closure#0}::RecvOnDrop>> - shim(Some(std::option::Option<launcher::{closure#0}::RecvOnDrop>))
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
  24:           0x3bcee6 - std::mem::drop::<std::option::Option<launcher::{closure#0}::RecvOnDrop>>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:974:24
  25:            0xb05dd - std::thread::local::fast::destroy_value::<launcher::{closure#0}::RecvOnDrop>
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/thread/local.rs:1033:13
  26:           0x1ec71c - std::sys_common::thread_local_dtor::register_dtor_fallback::run_dtors
                               at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys_common/thread_local_dtor.rs:43:17
thread panicked while panicking. aborting.

I think this might be a true positive; that code does call rx.recv() in the destructor of a thread-local variable, which I am not surprised to see leading to problems. I assume it is pure chance that this works fine outside of Miri.

bors added a commit that referenced this issue Jun 5, 2022
make Miri's scheduler proper round-robin

When thread N blocks or yields, we activate thread N+1 next, rather than always activating thread 0. This should guarantee that as long as all threads regularly yield, each thread eventually takes a step again.

Fixes the "multiple loops that yield playing ping-pong" part of #1388.
`@cbeuw` I hope this doesn't screw up the scheduler-dependent tests you are adding in your PR.
@jethrogb
Copy link
Contributor

jethrogb commented Jun 7, 2022

@RalfJung The code with rx.recv had other issues, the final test that was upstreamed doesn't use MPSC channels, see rust-lang/rust#84409

@RalfJung
Copy link
Member Author

RalfJung commented Jun 7, 2022

Ah, that looks like a nice test, I'll add it to Miri. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: affects our concurrency (multi-thread) support C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants