Process hangs (macOS, M1 Max, julia 1.7.1 mac64, 1.7.3 mac64, and julia 1.8 mac64) #46459

Closed
charnik opened this issue Aug 23, 2022 · 5 comments
Labels
multithreading Base.Threads and related functionality system:mac Affects only macOS

@charnik
charnik commented Aug 23, 2022

I'm experiencing process hangs when running a simple scenario that involves thread synchronization using condition variables. The scenario sets up a cycle of threads, each one waiting on the condition variable of the next thread in the cycle. When a thread wakes up because the next thread's counter has changed, it increments that value, sets its own counter to the result, and notifies the thread waiting on it. The main thread kick-starts the process by setting the counter of the first thread in the cycle. Every now and then the first thread reports its counter's value, and this is expected to continue indefinitely. However, after running for some time, the reporting freezes.
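For readers without the attachment, the scenario can be sketched roughly as follows. This is not the contents of cond.jl: the names `Slot`, `worker`, and `run_ring` are hypothetical, and unlike the original (which runs forever and prints progress) this sketch stops once a counter reaches `limit` so that it terminates.

```julia
using Base.Threads

# One slot per worker: a counter guarded by its own condition variable.
# (Hypothetical type, not taken from cond.jl.)
mutable struct Slot
    cond::Threads.Condition
    count::Int
end
Slot() = Slot(Threads.Condition(), 0)

# Worker i waits for the NEXT slot in the ring to change, then publishes
# that value plus one in its OWN slot and notifies whoever waits on it.
function worker(slots::Vector{Slot}, i::Int, limit::Int)
    mine = slots[i]
    next = slots[mod1(i + 1, length(slots))]
    seen = 0  # last value observed in the next slot
    while true
        lock(next.cond)
        try
            # The predicate loop guards against missed and spurious wakeups.
            while next.count == seen
                wait(next.cond)
            end
            seen = next.count
        finally
            unlock(next.cond)
        end
        lock(mine.cond)
        try
            mine.count = seen + 1
            notify(mine.cond)
        finally
            unlock(mine.cond)
        end
        # Assumption for this sketch only: stop at `limit` instead of looping forever.
        seen + 1 >= limit && return
    end
end

function run_ring(nworkers::Int, limit::Int)
    slots = [Slot() for _ in 1:nworkers]
    tasks = [Threads.@spawn worker(slots, i, limit) for i in 1:nworkers]
    # The main task kick-starts the cycle by setting the first slot's counter.
    head = slots[1]
    lock(head.cond)
    try
        head.count = 1
        notify(head.cond)
    finally
        unlock(head.cond)
    end
    foreach(wait, tasks)
    return maximum(s.count for s in slots)
end
```

Calling `run_ring(4, 1000)` with `-t auto` passes the token around the ring until every worker has exited. On an affected setup, the analogous unbounded loop is where progress would be expected to stall.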

Similar hangs have been observed in other multi-threaded settings, but this test case is particularly attractive due to its simplicity. To reproduce, decompress and run the script cond.jl.gz, authored by Todd Veldhuizen, and wait until no further progress appears in the terminal's output. The hang has typically reproduced within 20 to 30 minutes.

julia-1.8-mac64 -t auto -e 'include("cond.jl"); test()'

Alternatively, and preferably, run it under lldb and wait until lldb stops the process on an invalid memory access. This may also give an indication of what is causing these hangs (a race condition? an alignment issue?).

$> lldb julia-1.8-mac64
(lldb) process launch -- -t auto
julia> include("cond.jl")
worker_task (generic function with 1 method)
julia> test()
2022-08-23T15:53:13.671 [0xb42a20edc19bcfad] count=57581 rate=14318.435658369568/s
2022-08-23T15:53:13.976 [0xb42a20edc19bcfad] count=60811 rate=14054.83873645445/s
2022-08-23T15:53:15.904 [0xb42a20edc19bcfad] count=89631 rate=14330.160884694018/s
2022-08-23T15:53:16.972 [0xb42a20edc19bcfad] count=106561 rate=14553.998816791556/s
2022-08-23T15:53:18.478 [0xb42a20edc19bcfad] count=128741 rate=14583.216742187276/s
2022-08-23T15:53:23.970 [0xb42a20edc19bcfad] count=218631 rate=15267.680522237808/s
2022-08-23T15:53:24.735 [0xb42a20edc19bcfad] count=228231 rate=15129.332338103202/s
...
2022-08-23T16:04:21.049 [0xb42a20edc19bcfad] count=10791411 rate=16073.016765926684/s
2022-08-23T16:04:21.720 [0xb42a20edc19bcfad] count=10803881 rate=16075.511492185267/s
2022-08-23T16:04:25.879 [0xb42a20edc19bcfad] count=10871891 rate=16077.231667431759/s
2022-08-23T16:04:28.823 [0xb42a20edc19bcfad] count=10918121 rate=16075.615360604044/s
2022-08-23T16:04:30.320 [0xb42a20edc19bcfad] count=10942251 rate=16075.699823382367/s
2022-08-23T16:04:34.799 [0xb42a20edc19bcfad] count=11016861 rate=16079.501437760955/s
2022-08-23T16:04:37.365 [0xb42a20edc19bcfad] count=11061071 rate=16083.799767570577/s
2022-08-23T16:04:38.401 [0xb42a20edc19bcfad] count=11077851 rate=16083.95851491725/s
Process 11669 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x109004000)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #5, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #6, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #7, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #8, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #9, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #10, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
  thread #11, stop reason = EXC_BAD_ACCESS (code=2, address=0x109005008)
    frame #0: 0x00000001095d2e81 libjulia-internal.1.8.dylib`ijl_gc_safepoint + 17
libjulia-internal.1.8.dylib`ijl_gc_safepoint:
->  0x1095d2e81 <+17>: movq   (%rax), %rax
    0x1095d2e84 <+20>: popq   %rbp
    0x1095d2e85 <+21>: retq
    0x1095d2e86 <+22>: nopw   %cs:(%rax,%rax)
Target 0: (julia-1.8-mac64) stopped.

Version info:

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
  Threads: 10 on 10 virtual cores

Hints:

  • It is reproducible with the official mac64 builds of Julia 1.7.1, 1.7.3, and 1.8.
  • It is not reproducible on Linux with the official amd64 build of Julia 1.7.1.
  • It is not reproducible on macOS M1 Max using the macaarch64 official build of Julia 1.8.
@giordano giordano added system:mac Affects only macOS multithreading Base.Threads and related functionality labels Aug 23, 2022
@gbaraldi
Member

This looks like #41820 and related issues. Rosetta with multiple cores seems to do something bad when threads move to the efficiency cores.
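To confirm whether a given session is actually running under Rosetta (as the mac64 builds do on an M1), something like the following can be used. It is a sketch, assuming macOS exposes the `sysctl.proc_translated` key for the calling process. The helper name `running_under_rosetta` is made up for illustration, and it simply returns `false` on other platforms or when the key is missing.

```julia
# Hypothetical helper: true only when this process is being translated by Rosetta.
function running_under_rosetta()
    @static if Sys.isapple()
        translated = Ref{Cint}(0)
        len = Ref{Csize_t}(sizeof(Cint))
        # sysctlbyname returns nonzero if the key does not exist on this system.
        ret = ccall(:sysctlbyname, Cint,
                    (Cstring, Ref{Cint}, Ref{Csize_t}, Ptr{Cvoid}, Csize_t),
                    "sysctl.proc_translated", translated, len, C_NULL, 0)
        return ret == 0 && translated[] == 1
    else
        return false
    end
end

println((arch = Sys.ARCH, rosetta = running_under_rosetta()))
```

An x86_64 `Sys.ARCH` together with `rosetta = true` would match the mac64-on-M1 configuration reported above.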

@vtjnash
Member

vtjnash commented Sep 1, 2022

You seem to be missing a couple of threads (2 and 3)? You also need to configure lldb to ignore signals, since those are managed by the Julia runtime.

@vtjnash
Member

vtjnash commented Sep 1, 2022

@ViralBShah
Member

Close?

@charnik
Author

charnik commented Jun 9, 2023

Sorry I forgot to follow up here. I checked again today using the official julia-1.9.1-macaarch64 release, and I couldn't reproduce the hanging behavior. Feel free to close this. It'd be great if anyone had any insights into what could have changed in Julia to shield it from this hang.
