Possible LLVM segfault #17495

Closed
chipkent opened this issue Jul 19, 2016 · 5 comments

@chipkent

chipkent commented Jul 19, 2016

 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value
 ** On entry to DGEBAK parameter number  4 had an illegal value
signal (6): Aborted
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f87eedb22a4)
unknown function (ip: 0x7f87eedbe55e)
jl_gc_collect at /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia.so (unknown line)
jl_gc_alloc_2w at /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia.so (unknown line)
jl_alloc_svec_uninit at /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia.so (unknown line)
jl_alloc_svec at /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f87f0090a90)
unknown function (ip: 0x7f87f0091d77)
unknown function (ip: 0x7f87f00920f6)
unknown function (ip: 0x7f87f00928e1)
unknown function (ip: 0x7f87f00931ef)
unknown function (ip: 0x7f87f009736b)
jl_matching_methods at /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia.so (unknown line)
Julia has stopped: null, SIGABRT

Reproducer code can be found here:
https://gist.github.com/chipkent/bdf96e592b26297eb0b37684c4fb70d6

Along with the code, two serialized input files are needed. I know the serialized files are not portable between versions, but that was the easiest way to dump them.

covariance_bug_data.tar.gz

julia> versioninfo()
Julia Version 0.4.5
Commit 2ac304d (2016-03-18 00:58 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Running under gdb:

julia> include("scripts/debug/covariance_reproducer.jl")
INFO: Recompiling stale cache file /cc/home/chip/.julia/lib/v0.4/Compat.ji for module Compat.
INFO: Recompiling stale cache file /cc/home/chip/.julia/lib/v0.4/NamedArrays.ji for module NamedArrays.
 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value
 ** On entry to DGEBAK parameter number  4 had an illegal value
*** Error in `/cc/home/chip/software/julia-2ac304dfba/bin/julia-debug': free(): invalid next size (normal): 0x000000000274ac80 ***

Program received signal SIGABRT, Aborted.
0x00007ffff5a82c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
(gdb) bt
#0  0x00007ffff5a82c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff5a86028 in __GI_abort () at abort.c:89
#2  0x00007ffff5abf2a4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff5bcd6b0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff5acb55e in malloc_printerr (ptr=<optimized out>, str=0x7ffff5bcd828 "free(): invalid next size (normal)", action=1) at malloc.c:4996
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x00007ffff707b17c in (anonymous namespace)::ScheduleDAGRRList::~ScheduleDAGRRList() () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#6  0x00007ffff702b5f7 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#7  0x00007ffff702cf41 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#8  0x00007ffff702e4fc in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#9  0x00007ffff770266f in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#10 0x00007ffff770275e in llvm::FunctionPassManagerImpl::run(llvm::Function&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#11 0x00007ffff7702883 in llvm::FunctionPassManager::run(llvm::Function&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#12 0x00007ffff7196334 in llvm::JIT::jitTheFunction(llvm::Function*, llvm::MutexGuard const&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#13 0x00007ffff719691a in llvm::JIT::runJITOnFunctionUnlocked(llvm::Function*, llvm::MutexGuard const&) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#14 0x00007ffff7196a30 in llvm::JIT::getPointerToFunction(llvm::Function*) () from /cc/home/chip/software/julia-2ac304dfba/bin/../lib/julia/libjulia-debug.so
#15 0x00007ffff6dc5e75 in jl_generate_fptr (f=0x7ffdf6261bd0) at codegen.cpp:765
#16 0x00007ffff6db4db8 in jl_trampoline_compile_function (f=0x7ffdf6261bd0, always_infer=0, sig=0x7ffdf68b1890) at builtins.c:1019
#17 0x00007ffff6db4ef7 in jl_trampoline (F=0x7ffdf6261bd0, args=0x7fffffffd0d8, nargs=3) at builtins.c:1030
#18 0x00007ffff6da3751 in jl_apply (f=0x7ffdf6261bd0, args=0x7fffffffd0d8, nargs=3) at julia.h:1331
#19 0x00007ffff6da9a7e in jl_apply_generic (F=0x7ffdf21bcad0, args=0x7fffffffd0d8, nargs=3) at gf.c:1708
#20 0x00007ffff7ea8120 in ?? ()
#21 0x00000000026040a0 in ?? ()
#22 0x0000000000000012 in ?? ()
#23 0x00007fffffffd320 in ?? ()
#24 0x00007ffdf60f9d50 in ?? ()
#25 0x00007ffdf6ca6870 in ?? ()
#26 0x0000000000000000 in ?? ()
@Keno
Member

Keno commented Jul 19, 2016

The memory corruption is probably unrelated to LLVM; it just happens to be detected there. The simplest next step is probably to run the reproducer under valgrind and see what it reports.

@tkelman
Contributor

tkelman commented Jul 19, 2016

The "parameter had an illegal value" messages also make me suspect OpenBLAS: 0.4.5 was not using the latest version, but 0.4.6 is (96614fd). Is this a source build (if so, any customizations?) or a binary?

@chipkent
Author

Linux binary right off the website.

@tkelman
Contributor

tkelman commented Jul 19, 2016

Can you try with 0.4.6? This could easily be processor-specific.

@chipkent
Author

I ran this with 0.4.6 and got a completely different error (below). I confirmed that this error is caused by my own code. Possibly a fix was introduced between 0.4.5 and 0.4.6. From my perspective this issue is fixed, but I don't want to close it until someone on the Julia side confirms that it should be closed.

ERROR: LoadError: ArgumentError: matrix contains NaNs
in checkfinite at linalg/lapack.jl:58
in geevx! at linalg/lapack.jl:1834
in eigfact! at linalg/eigen.jl:32
in eigfact at linalg/eigen.jl:57
in eig at linalg/eigen.jl:66
in nearPSD at /cc/home/chip/source/github/Cecropia/scripts/debug/covariance_reproducer.jl:17
in make_positive_definite at /cc/home/chip/source/github/Cecropia/scripts/debug/covariance_reproducer.jl:8
in compute_covariance at /cc/home/chip/source/github/Cecropia/scripts/debug/covariance_reproducer.jl:5
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:320
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading /cc/home/chip/source/github/Cecropia/scripts/debug/covariance_reproducer.jl, in expression starting on line 30
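
The trace above shows the NaNs being rejected by `checkfinite` only once the matrix reaches `geevx!`. One way to surface such inputs earlier, in the caller's own code, is an explicit finiteness check before calling `eig`. This is a minimal sketch only: the helper `require_finite` and its `name` argument are hypothetical and are not part of the reproducer or of Julia's Base.

```julia
# Hypothetical helper (an assumption for illustration, not from the reproducer):
# reject arrays with NaN or Inf entries before passing them to eig/eigfact.
function require_finite(A, name)
    # isfinite is false for NaN, Inf and -Inf
    bad = count(x -> !isfinite(x), A)
    if bad > 0
        throw(ArgumentError(string(name, " contains ", bad, " non-finite entries")))
    end
    return A
end
```

Calling `require_finite(C, "covariance")` just before the eigendecomposition would then fail with an `ArgumentError` naming the offending matrix, instead of relying on what the LAPACK wrapper does with it.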

@tkelman tkelman closed this as completed Jul 20, 2016