Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug, hangs, in 1.6 #36075

Closed
PallHaraldsson opened this issue May 29, 2020 · 13 comments
Closed

Bug, hangs, in 1.6 #36075

PallHaraldsson opened this issue May 29, 2020 · 13 comments

Comments

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented May 29, 2020

Works in latest 1.5 beta1, but, not in the nightly I just downloaded:

julia> using GR
[ Info: Precompiling GR [28b8d3ca-fb5f-59d9-8090-bfdbd6d07a71]

julia> @time (x = LinRange(0, 10, 500); y = sin.(x.^2) .* exp.(-x); plot(x, y))

Also hangs on 1.6 8f512f3 (19 days old master); and other optimization levels (but worked in older 1.5 "ea669c3d3e (33 days old master)").

@JeffBezanson
Copy link
Member

I checked out that commit and I cannot reproduce this. Could you provide the output of versioninfo()? If you have a manifest that might help too.

@sgaure
Copy link

sgaure commented May 29, 2020

It's the same hang here:

(@v1.6) pkg> activate .
 Activating environment at `~/test/Project.toml`
(test) pkg> st
Status `~/test/Project.toml`
  [28b8d3ca] GR v0.50.0
julia> using GR
julia> versioninfo()
Julia Version 1.6.0-DEV.129
Commit 38a373af12* (2020-05-29 17:04 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 7 2700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
Environment:
  JULIA_EXCLUSIVE = 1
  JULIA_NUM_THREADS = 16

julia> @time (x = LinRange(0, 10, 500); y = sin.(x.^2) .* exp.(-x); plot(x, y))
(indefinite hang without any cpu usage)

Attaching to the julia process with gdb yields the following:

Attaching to process 552730
[New LWP 552731]
[New LWP 552732]
[New LWP 552733]
[New LWP 552734]
[New LWP 552735]
[New LWP 552736]
[New LWP 552737]
[New LWP 552738]
[New LWP 552739]
[New LWP 552740]
[New LWP 552741]
[New LWP 552742]
[New LWP 552743]
[New LWP 552744]
[New LWP 552745]
[New LWP 552746]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=4, buf=0x7ffc643f7e1c, fd=19)
    at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
(gdb) where
#0  __libc_recv (flags=<optimized out>, len=4, buf=0x7ffc643f7e1c, fd=19)
    at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=19, buf=0x7ffc643f7e1c, len=4, flags=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00007f71f3ab4b68 in gks_drv_socket ()
   from /home/sgaure/.julia/packages/GR/QXeZO/src/../deps/gr/lib/libGR.so
#3  0x00007f71f3a8bb6d in gks_ddlk.isra ()
   from /home/sgaure/.julia/packages/GR/QXeZO/src/../deps/gr/lib/libGR.so
#4  0x00007f71f3a8c83d in gks_open_ws ()
   from /home/sgaure/.julia/packages/GR/QXeZO/src/../deps/gr/lib/libGR.so
#5  0x00007f71f3a19399 in initgks ()
   from /home/sgaure/.julia/packages/GR/QXeZO/src/../deps/gr/lib/libGR.so
#6  0x00007f71f3a1dfb5 in gr_clearws ()
   from /home/sgaure/.julia/packages/GR/QXeZO/src/../deps/gr/lib/libGR.so
#7  0x00007f722422a640 in ?? ()
#8  0x00007f7200000000 in ?? ()
#9  0x0000000000000000 in ?? ()
(gdb) 

Fileno 19 is (output from lsof on the process):

julia 552730 sgaure 19u IPv4 7825139 0t0 TCP localhost:57114->localhost:8410 (ESTABLISHED)

The other threads are as follows:

Thread 2:


(gdb) thr 2
[Switching to thread 2 (Thread 0x7f725d3e1700 (LWP 552731))]
#0  0x00007f7262dd5322 in __GI___sigtimedwait (set=set@entry=0x7f725d3d9210, info=info@entry=0x7f725d3d9290, 
    timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:29
29      ../sysdeps/unix/sysv/linux/sigtimedwait.c: No such file or directory.
(gdb) where
#0  0x00007f7262dd5322 in __GI___sigtimedwait (set=set@entry=0x7f725d3d9210, info=info@entry=0x7f725d3d9290, 
    timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:29
#1  0x00007f7262dd538b in __GI___sigwaitinfo (set=set@entry=0x7f725d3d9210, info=info@entry=0x7f725d3d9290)
    at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:25
#2  0x00007f72630f8270 in signal_listener (arg=<optimized out>) at /src/julia/src/signals-unix.c:606
#3  0x00007f7262f89609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007f7262eb0103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) thr 3
[Switching to thread 3 (Thread 0x7f724e38e700 (LWP 552732))]
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f724e38c358)
    at ../sysdeps/nptl/futex-internal.h:183
183     ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) where
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f724e38c358)
    at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f724e38c308, cond=0x7f724e38c330)
    at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x7f724e38c330, mutex=mutex@entry=0x7f724e38c308) at pthread_cond_wait.c:638
#3  0x00007f72631cd39d in uv_cond_wait (cond=cond@entry=0x7f724e38c330, mutex=mutex@entry=0x7f724e38c308)
    at /src/julia/deps/srccache/libuv-4726f5ef94fe5e11335a6d6ef048b00a72110135/src/unix/thread.c:827
#4  0x00007f72630e574b in jl_task_get_next (trypoptask=0x7f7255146ae0 <jl_system_image_data+20801952>, 
    q=0x7f724f678010) at /src/julia/src/partr.c:510
#5  0x00007f7253aeb756 in julia_poptask_47495 () at task.jl:706
#6  0x00007f7253803ec0 in wait () at task.jl:714
#7  japi1_task_done_hook_51999 () at task.jl:444
#8  0x00007f72630bbe14 in jl_apply (nargs=2, args=0x7f724e386280) at /src/julia/src/julia.h:1700
#9  jl_finish_task (t=0x7f724f624010, resultval=<optimized out>) at /src/julia/src/task.c:198
#10 0x00007f72630e50e8 in jl_threadfun (arg=0x5637c5dd0870) at /src/julia/src/partr.c:265
#11 0x00007f7262f89609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f7262eb0103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The rest of the threads also wait for a futex in libuv.

@sgaure
Copy link

sgaure commented May 29, 2020

And, by the way, the other end of tcp connection is connected to a process running gksqt. I can't find its symbols, so gdb is less helpful.

~/.julia/packages/GR/yMV3y/src/../deps/gr/bin/gksqt -session 10cc617572000158585546300000016770170_1589098795_763985

Attaching to process 308020
[New LWP 308023]
[New LWP 308026]
warning: Could not load vsyscall page because no executable was specified
0x00007fd0c15fb96f in ?? ()
(gdb) where
#0  0x00007fd0c15fb96f in ?? ()
#1  0x00000000010aab60 in ?? ()
#2  0x00000000010aab60 in ?? ()
#3  0x0000000000000004 in ?? ()
#4  0xffffffff00000001 in ?? ()
#5  0x00007fd0b8005010 in ?? ()
#6  0x00007fd0c0ca61ae in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) thr 2
[Switching to thread 2 (LWP 308023)]
#0  0x00007fd0c15fb96f in ?? ()
(gdb) where
#0  0x00007fd0c15fb96f in ?? ()
#1  0x0000000000000000 in ?? ()

@Keno
Copy link
Member

Keno commented Jun 6, 2020

Getting an rr trace could help in debugging this (julia --bug-report=rr - see https://julialang.org/blog/2020/05/rr/).

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jun 6, 2020

I tried --bug-report=rr, but got "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data"/"Internal Server Error". Otherwise, simple to do so far (without getting a change to run the problematic line[s]), while I did get, and may still have uploaded?:

rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 3.
Change it to 1, or use 'rr record -n' (slow).
Consider putting 'kernel.perf_event_paranoid = 1' in /etc/sysctl.conf
[..]
[ Info: Uploading Trace directory
Uploaded to https://s3.amazonaws.com/julialang-dumps/reports/2020-06-06T18-17-47-PallHaraldsson.tar.zst

@Keno
Copy link
Member

Keno commented Jun 6, 2020

Ah, you'll want to set that sysctl first:

echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid

@tkf proposed a better error message (JuliaLang/BugReporting.jl#21), which should help in the future

@tkf
Copy link
Member

tkf commented Jun 6, 2020

Is 0 better? I use 1 (as the error message says so)

@Keno
Copy link
Member

Keno commented Jun 6, 2020

1 is fine

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jun 7, 2020

Should I manually (archive 327M directory, and) upload (and where?) after using GR (probably useless, but if I exit then it seems it could automatically), or while hanging on the plot command, as otherwise directory gets deleted:

$ echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid

$ ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no --bug-report=rr
[ Info: Loading BugReporting package...
[ Info: Package `BugReporting` not found - attempting temporary installation
[..]
rr: Saving execution to trace directory `/home/pharaldsson_sym/.julia/artifacts/jl_d5ZTE6/julia-0'.

[..]

julia> using GR

julia> @time (x = LinRange(0, 10, 500); y = sin.(x.^2) .* exp.(-x); plot(x, y))
^CERROR: InterruptException:
Stacktrace:
 [1] poptask(::Base.InvasiveLinkedListSynchronized{Task}) at ./task.jl:706
 [2] wait at ./task.jl:714 [inlined]
 [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106
 [4] wait(::Base.Process) at ./process.jl:622
 [5] success at ./process.jl:483 [inlined]
 [6] run(::Cmd; wait::Bool) at ./process.jl:440
 [7] run at ./process.jl:438 [inlined]
 [8] (::BugReporting.var"#4#5"{String,Tuple{Cmd,Array{Any,1}},Array{SubString{String},1}})(::String) at /home/pharaldsson_sym/.julia/packages/BugReporting/DYZaI/src/BugReporting.jl:84
 [9] (::rr_jll.var"#8#9"{BugReporting.var"#4#5"{String,Tuple{Cmd,Array{Any,1}},Array{SubString{String},1}}})() at /home/pharaldsson_sym/.julia/packages/rr_jll/B8mjJ/src/wrappers/x86_64-linux-gnu-cxx11.jl:36
 [10] withenv(::rr_jll.var"#8#9"{BugReporting.var"#4#5"{String,Tuple{Cmd,Array{Any,1}},Array{SubString{String},1}}}, ::Pair{String,String}, ::Vararg{Pair{String,String},N} where N) at ./env.jl:161
 [11] rr(::BugReporting.var"#4#5"{String,Tuple{Cmd,Array{Any,1}},Array{SubString{String},1}}; adjust_PATH::Bool, adjust_LIBPATH::Bool) at /home/pharaldsson_sym/.julia/packages/rr_jll/B8mjJ/src/wrappers/x86_64-linux-gnu-cxx11.jl:35
 [12] rr(::Function) at /home/pharaldsson_sym/.julia/packages/rr_jll/B8mjJ/src/wrappers/x86_64-linux-gnu-cxx11.jl:19
 [13] rr_record(::Cmd, ::Vararg{Any,N} where N; trace_dir::String) at /home/pharaldsson_sym/.julia/packages/BugReporting/DYZaI/src/BugReporting.jl:74
 [14] (::BugReporting.var"#15#16"{Array{Any,1},Cmd})(::String) at /home/pharaldsson_sym/.julia/packages/BugReporting/DYZaI/src/BugReporting.jl:130
 [15] create_artifact(::BugReporting.var"#15#16"{Array{Any,1},Cmd}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Artifacts.jl:214
 [16] make_interactive_report(::String, ::Array{Any,1}) at /home/pharaldsson_sym/.julia/packages/BugReporting/DYZaI/src/BugReporting.jl:129
 [17] make_interactive_report(::String) at /home/pharaldsson_sym/.julia/packages/BugReporting/DYZaI/src/BugReporting.jl:123
 [18] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [19] invokelatest at ./essentials.jl:709 [inlined]
 [20] report_bug(::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/InteractiveUtils/src/InteractiveUtils.jl:379
 [21] exec_options(::Base.JLOptions) at ./client.jl:232
 [22] _start() at ./client.jl:485
$

At least I could exit, without rr, I couldn't with CTRL-C.

@Keno
Copy link
Member

Keno commented Jun 7, 2020

It would be best to killall -15 rr from a separate terminal once it's hung to tell rr to stop and clean up the recording.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jul 9, 2020

FYI: JuliaPlots/Plots.jl#2819 (comment)

It appears that downgrading Mesa 19.3 to Mesa 19.2.8 (=default on Ubuntu 18.04) fixed this issue.

It might likely be related (or not), while I didn't do any upgrade (or downgrade), knowingly, except to later Julia (and I have 18.04).

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Aug 7, 2020

A. The good news is this just works in latest Julia 1.6 nightly, just not in Julia 1.5 (but note I installed musl variant of it).

If someone (e.g. @jheinen or me later) can confirm Julia 1.5 regular version works, then this issue can be closed. Unless you want it open for musl.

(@v1.5) pkg> st GR  # didn't work on this old version, nor on master:
Status `~/.julia/environments/v1.5/Project.toml`
  [28b8d3ca] GR v0.48.0

(@v1.5) pkg> st GR
Status `~/.julia/environments/v1.5/Project.toml`
  [28b8d3ca] GR v0.51.0 `https://github.com/jheinen/GR.jl.git#master`

julia> @time (x = LinRange(0, 10, 500); y = sin.(x.^2) .* exp.(-x); plot(x, y))

signal (11): Segmentation fault
in expression starting at REPL[3]:1
unknown function (ip: 0x2eb36)
Allocations: 8357858 (Pool: 8355686; Big: 2172); GC: 10
Segmentation fault (core dumped)

B. I would upload a bug-report like this, except I can't (again, probably because of musl, open a new issue on that? This works in 1.6, and the GR example completes so I guess useless to upload it):

$ ~/julia-1.5.0/bin/julia --bug-report=rr
[ Info: Loading BugReporting package...
ERROR: LoadError: InitError: could not load library "/home/pharaldsson_sym/.julia/artifacts/374c8b34b6f9a9f921258b1ef93d1b8856f00702/lib/libxml2.so"
Error loading shared library libiconv.so.2: No such file or directory (needed by /home/pharaldsson_sym/.julia/artifacts/374c8b34b6f9a9f921258b1ef93d1b8856f00702/lib/libxml2.so)
Stacktrace:
 [1] #dlopen#3 at /buildworker/worker/package_musl64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /buildworker/worker/package_musl64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
 [3] __init__() at /home/pharaldsson_sym/.julia/packages/XML2_jll/QkZTc/src/wrappers/x86_64-linux-musl.jl:105
 [4] _include_from_serialized at ./loading.jl:697
 [5] _require_search_from_serialized at ./loading.jl:782
 [6] _require at ./loading.jl:1007
 [7] require at ./loading.jl:928
 [8] require at ./loading.jl:923
 [9] include at ./Base.jl:380
 [10] include at ./Base.jl:368
 [11] top-level scope at none:2
 [12] eval at ./boot.jl:331 [inlined]
 [13] eval(::Expr) at ./client.jl:467
 [14] top-level scope at ./none:3
during initialization of module XML2_jll
in expression starting at /home/pharaldsson_sym/.julia/packages/EzXML/ZNwhK/src/EzXML.jl:106
ERROR: LoadError: Failed to precompile EzXML [8f5d6c58-4d21-5cfd-889c-e3ad7ee6a615] to /home/pharaldsson_sym/.julia/compiled/v1.5/EzXML/Hh8vo_glT1O.ji.
Stacktrace:
 [1] error at ./error.jl:33
 [2] compilecache at ./loading.jl:1290

[..]

@PallHaraldsson
Copy link
Contributor Author

FYI to people who seemingly reproduced, e.g. @sgaure, and those that didn't. In my case at least this hang is likely about defunct Julia processes. See:

jheinen/GR.jl#340 (comment)

I'm not sure Julia is the root cause (not can rule out), any version. I wanted to notify people here (while I remember this), in case anyone scratching their head. This may have happened by accident in later versions of Julia, while possibly could have happened to any (then a problem with GR, only, possibly)? "defunct" is an annoying problem in Linux, and I only know how to "solve" with a restart, so will have to wait. Can anyone without such problems confirm GR works in Julia 1.5.1 and 1.6?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants