svd/svdvals slow on Mac #501

Closed
simonbyrne opened this issue Feb 22, 2018 · 6 comments
Labels
performance (Must go faster) · system:mac (Affects only macOS) · upstream (The issue is with an upstream dependency, e.g. LLVM)

Comments

@simonbyrne
Contributor

simonbyrne commented Feb 22, 2018

Not sure what is going on here, but

B = randn(100,100)
@time svdvals(B)

takes about 20ms on a Mac, vs about 1.5ms inside a Linux VM on the same machine.
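A single `@time` call can include compilation time, so a warmed-up, repeated measurement makes the comparison between platforms more reliable. The following is a minimal sketch, assuming Julia 1.0 or later (where `svdvals` lives in the `LinearAlgebra` standard library rather than in `Base` as on 0.6); `min_elapsed` is a hypothetical helper name, not a library function.

```julia
# Assumes Julia >= 1.0; on 0.6 (as in this report) svdvals was exported from Base.
using LinearAlgebra

# Hypothetical helper: minimum wall-clock time of `f()` over `n` runs,
# after one warm-up call so that JIT compilation is not measured.
function min_elapsed(f; n::Integer = 20)
    f()  # warm-up
    return minimum([(@elapsed f()) for _ in 1:n])
end

B = randn(100, 100)
t = min_elapsed(() -> svdvals(B))
println("svdvals on 100x100: ", round(t * 1e3, digits = 3), " ms")
```

Taking the minimum rather than the mean reduces the influence of scheduling noise, which matters when the quantity of interest is in the low-millisecond range.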

Mac versioninfo:

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

(from https://discourse.julialang.org/t/svdvals-is-alarmingly-slow/9259)

@davidssmith

I've found that the macOS system Python is faster at matrix ops than even Anaconda with MKL. I'm guessing it uses the Accelerate framework. Would it be worth linking Julia against that library and testing?
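Before comparing against another BLAS, it helps to confirm which backend a given Julia session is actually using. A small sketch, assuming a Julia version with the `LinearAlgebra` standard library: `BLAS.get_config()` is available on Julia 1.7 and later, while older releases expose `BLAS.vendor()`. Swapping in Accelerate itself is outside the scope of this sketch.

```julia
using LinearAlgebra

# BLAS.get_config() exists on Julia >= 1.7; fall back to BLAS.vendor() otherwise.
backend = isdefined(BLAS, :get_config) ? BLAS.get_config() : BLAS.vendor()
println(backend)
```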

@andreasnoack
Member

So all of the slowdown happens during the reduction to bidiagonal form (LAPACK's dgebrd). More specifically, it looks like it happens in daxpy, reached through dlarf and dger. On my Mac I get

...
187 .../julia/libopenblas64_.dylib:?; dlarf_64_
 5   .../julia/libopenblas64_.dylib:?; dgemv_64_
  1 .../julia/libopenblas64_.dylib:?; dgemv_n_HASWELL
   1 .../julia/libopenblas64_.dylib:?; dgemv_kernel_4x4
  1 .../julia/libopenblas64_.dylib:?; dgemv_t_HASWELL
   1 .../julia/libopenblas64_.dylib:?; dgemv_kernel_4x4
  1 .../julia/libopenblas64_.dylib:?; dgemv_thread_n
   1 .../julia/libopenblas64_.dylib:?; exec_blas
  2 .../julia/libopenblas64_.dylib:?; dscal_k_HASWELL
   2 .../julia/libopenblas64_.dylib:?; dscal_kernel_8_zero
 181 .../julia/libopenblas64_.dylib:?; dger_64_
  171 ...julia/libopenblas64_.dylib:?; dger_k_HASWELL
   170 ...julia/libopenblas64_.dylib:?; daxpy_k_HASWELL
    169 ...ulia/libopenblas64_.dylib:?; daxpy_kernel_8
  10  ...julia/libopenblas64_.dylib:?; dger_thread
   10 ...julia/libopenblas64_.dylib:?; exec_blas
    9 ...julia/libopenblas64_.dylib:?; ger_kernel
     9 ...julia/libopenblas64_.dylib:?; daxpy_k_HASWELL
      9 ...ulia/libopenblas64_.dylib:?; daxpy_kernel_8

but a similar computation on Linux gives

...
24 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dlarf_64_
 16 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_64_
  5 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_n_HASWELL
   5 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
  6 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_t_HASWELL
   5 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
  2 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_thread_n
   2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; exec_blas
    2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
     2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_n_HASWELL
      2 ...ulia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
  1 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dgemv_thread_t
   1 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; exec_blas
    1 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; exec_blas_async_wait
  1 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dscal_k_HASWELL
 8  .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dger_64_
  5 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dger_k_HASWELL
   5 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; daxpy_k_HASWELL
    2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
  3 .../julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; dger_thread
   3 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; exec_blas
    2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?;
     2 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; daxpy_k_HASWELL
    1 ...julia-d386e40c17/bin/../lib/julia/libopenblas64_.so:?; exec_blas_async_wait
     1 ...libc-2.26/posix/../sysdeps/unix/syscall-template.S:84;
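Profiles like the two above, which include frames from inside libopenblas, can be collected with the `Profile` standard library. This is a sketch assuming Julia 1.0 or later; the repeat count of 200 and the `mincount` threshold are arbitrary choices for illustration, not the settings used for the output above.

```julia
using LinearAlgebra
using Profile

B = randn(100, 100)
svdvals(B)                 # warm-up so compilation does not dominate the samples

Profile.clear()
@profile for _ in 1:200    # repeat: a single call is too short to sample well
    svdvals(B)
end

# C = true includes frames from C/Fortran libraries such as libopenblas;
# mincount hides lines with fewer than 5 samples.
Profile.print(format = :tree, C = true, mincount = 5)
```

Comparing the sample counts under `dger_64_` between the two platforms is what isolates `daxpy` as the slow kernel here.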

@ararslan added the performance, system:mac, and linear algebra labels Feb 22, 2018
@simonbyrne
Contributor Author

Yup, that's it.

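# Julia 0.6 syntax (Base.LinAlg, Void); later versions use LinearAlgebra and Cvoid.
import Base.LinAlg.BLAS: libblas, BlasInt

# Call OpenBLAS's daxpy directly: y <- a*x + y, with unit strides.
# The `_64_` suffix is the ILP64 symbol name in Julia's bundled libopenblas64_.
function axpy!(a::Float64, x::Vector{Float64}, y::Vector{Float64})
    n = length(x)
    if n != length(y)
        throw(DimensionMismatch("x and y don't match"))
    end
    # Fortran calling convention: every argument is passed by reference.
    ccall((:daxpy_64_, libblas), Void,
          (Ref{BlasInt}, Ref{Float64}, Ref{Float64}, Ref{BlasInt},
           Ref{Float64}, Ref{BlasInt}),
          n, a, x, 1, y, 1)
    y
end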

X = ones(1_000)
Y = ones(1_000)

using BenchmarkTools
@btime axpy!(1.2,$X,$Y);

Gives:

  • macOS: 1.418 μs (6 allocations: 128 bytes)
  • Linux VM: 166.000 ns (6 allocations: 128 bytes)
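On current Julia versions the same kernel can be exercised without a hand-written `ccall`, through the `BLAS.axpy!` wrapper, and compared against a plain Julia loop as a baseline. The sketch below assumes Julia 1.0 or later; `axpy_loop!` and `per_call` are hypothetical helper names introduced for illustration, and the timings it prints depend on the machine and BLAS build.

```julia
# Assumes Julia >= 1.0 (LinearAlgebra stdlib).
using LinearAlgebra

# Plain-Julia reference for y <- a*x + y, to compare against the BLAS kernel.
function axpy_loop!(a::Float64, x::Vector{Float64}, y::Vector{Float64})
    length(x) == length(y) || throw(DimensionMismatch("x and y don't match"))
    @inbounds @simd for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Average seconds per call over `reps` calls, after one warm-up call.
function per_call(f, a, x, y; reps::Integer = 10_000)
    f(a, x, y)  # warm-up
    t = @elapsed for _ in 1:reps
        f(a, x, y)
    end
    return t / reps
end

X = ones(1_000)
t_blas = per_call(BLAS.axpy!, 1.2, X, ones(1_000))
t_loop = per_call(axpy_loop!, 1.2, X, ones(1_000))
println("BLAS.axpy!: ", round(t_blas * 1e9, digits = 1), " ns per call")
println("Julia loop: ", round(t_loop * 1e9, digits = 1), " ns per call")
```

If the BLAS figure is several times slower than the loop for a vector this small, that points at the library kernel rather than at Julia's calling overhead, which is the pattern the macOS numbers above show relative to Linux.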

@ViralBShah
Member

Should we open an OpenBLAS issue upstream?

@simonbyrne
Contributor Author

done: OpenMathLib/OpenBLAS#1470

@simonbyrne added the upstream label Feb 22, 2018
@ViralBShah
Member

Verified that this is no longer slow on macOS:


julia> versioninfo()
Julia Version 0.7.0-DEV.5258
Commit f653b14740 (2018-05-30 00:40 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.5.0)
  CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

julia> B = randn(100,100);

julia> @time svdvals(B);
  0.001168 seconds (13 allocations: 138.328 KiB)

@KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024