Commit
Scaling loop instead of broadcasting in strided matrix exp (#56463)
Firstly, this is easier to read. Secondly, this merges the two loops into one. Thirdly, this avoids the broadcasting latency.

```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time LinearAlgebra.exp!(A);
  0.952597 seconds (2.35 M allocations: 116.574 MiB, 2.67% gc time, 99.01% compilation time) # master
  0.877404 seconds (2.17 M allocations: 106.293 MiB, 2.65% gc time, 99.99% compilation time) # this PR
```

The performance also improves, as there are fewer allocations in the first branch (`opnorm(A, 1) <= 2.1`):

```julia
julia> using BenchmarkTools

julia> B = diagm(0=>im.*(float.(1:200))./200, 1=>(1:199)./400, -1=>(1:199)./400);

julia> opnorm(B,1)
1.9875

julia> @btime exp($B);
  5.066 ms (30 allocations: 4.89 MiB) # nightly v"1.12.0-DEV.1581"
  4.926 ms (27 allocations: 4.28 MiB) # this PR
```
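The general idea of replacing broadcasted updates with a single explicit loop can be sketched as follows. This is only an illustration, not the code changed in the PR: the function names `accumulate_broadcast!` and `accumulate_loop!`, the arguments `cu` and `cv`, and the array sizes are all hypothetical.

```julia
# Hypothetical helpers for illustration; they are not taken from LinearAlgebra.

# Two broadcasted updates: each statement traverses the arrays separately.
function accumulate_broadcast!(U, V, P, cu, cv)
    U .+= cu .* P
    V .+= cv .* P
    return U, V
end

# One explicit loop that performs both updates in a single pass.
function accumulate_loop!(U, V, P, cu, cv)
    for ind in eachindex(U, V, P)
        U[ind] += cu * P[ind]
        V[ind] += cv * P[ind]
    end
    return U, V
end

P = rand(4, 4)
U1, V1 = zeros(4, 4), zeros(4, 4)
U2, V2 = zeros(4, 4), zeros(4, 4)

accumulate_broadcast!(U1, V1, P, 2.0, 3.0)
accumulate_loop!(U2, V2, P, 2.0, 3.0)

# Both variants compute the same result.
@assert U1 ≈ U2 && V1 ≈ V2
```

Both functions update `U` and `V` in place. The loop version reads each element of `P` once for both updates and does not go through the broadcasting machinery, which is the kind of saving the commit message describes.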