Alternative interface to expv! #6

jagot · 2018-10-02T10:10:28Z

As discussed in #1, I've implemented an alternative interface to exvp! for the Hermitian case, where the Krylov subspace is successively built up, but the iteration is terminated, if a stopping criterion based on an exponentiation error estimate is fulfilled. For time-stepping, where the error of the exponentiation, and not the condition number of the Krylov subspace, is important, I have found this to be a more reliable indicator of when to terminate the iteration.

This PR follows Diagonalize before multiplying by time-step in the case of Hermitian operators #5, but does not really depend on it, so before this is merged, it should be rebased on whatever state master is in at that time.
The Krylov iterations previously used a separate cache vector, which I skipped. Instead I use the "next" vector in the subspace, i.e. V[:,j+1], where cache after being orthogonalized and scaled would end up anyway. I have not, however, removed it from the call signature, since I still need to clean everywhere in krylov_phiv_adaptive.jl, which was not as fun to do.
I've seen different broadcast mechanisms being used, but I am not sure about the subtler implications, i.e. which one is best where. Please check and tell me!
- @. V[:, 1] = 1 / beta
- lmul!(1/beta, @view(V[:, 1])
- For loop:
```
 @inbounds for i = 1:size(V,1)
     V[i,1] = 1/beta
 end
```
OK to keep verbose flag? I think it is nice to have when debugging time-stepping algorithms.
As can be seen in the benchmarks below, there are still some allocations going on, which I'd like to get rid of, or at least minimize. Pointers would be great!
It is not fantastically nice to have a LAPACK wrapper directly in the code, but I am not sure what else can be done at the moment, without a giant overhaul of LinearAlgebra.
Something similar should be implemented for Arnoldi.
Are there better unit tests that can be done?

As an example of usage and its performance, see the following code (which also doubles as the test):

n = 3000
m = 30
A = Hermitian(rand(n,n))
Ks = KrylovSubspace{ComplexF64,Float64}(n,m)
sc = StegrCache(ComplexF64, n)

b = rand(ComplexF64, n)
w = similar(b)
w′ = similar(b)

dt = 0.1

atol=1e-10
rtol=1e-10
expv!(w, -im, dt*A, b, Ks, sc, atol=atol, rtol=rtol, verbose=true)

println("Benchmarking Lanczos expv!")
@btime expv!(w, -im, dt*A, b, Ks, sc, atol=atol, rtol=rtol, verbose=false)

function fullexp!(w, A, v)
    eA = exp(A)
    mul!(w, eA, v)
    w
end
println("Benchmarking full exponentiation")
@btime fullexp!(w′, -im*dt*A, b)

norm(w-w′)

n=300

Initial norm: β₀ 1.380667e+01, stopping threshold: 1.480667e-09
iter 1, α[1] 1.087384e+01, β[1] 6.692534e+00, σ 9.240161e+01
iter 2, α[2] 4.099573e+00, β[2] 5.765774e-01, σ 6.664466e+00
iter 3, α[3] 1.026875e-02, β[3] 4.875859e-01, σ 1.600913e+00
iter 4, α[4] -3.137509e-04, β[4] 5.099652e-01, σ 4.215033e-01
iter 5, α[5] -2.819237e-02, β[5] 5.078813e-01, σ 7.169712e-02
iter 6, α[6] 1.549987e-03, β[6] 4.963334e-01, σ 8.827376e-03
iter 7, α[7] -2.843072e-03, β[7] 5.057275e-01, σ 8.815623e-04
iter 8, α[8] 3.516485e-02, β[8] 5.074857e-01, σ 7.329762e-05
iter 9, α[9] -1.400483e-02, β[9] 5.057665e-01, σ 5.180701e-06
iter 10, α[10] -1.756147e-02, β[10] 4.907045e-01, σ 3.096524e-07
iter 11, α[11] 2.551781e-02, β[11] 4.859925e-01, σ 1.624763e-08
iter 12, α[12] 1.383772e-02, β[12] 6.395702e-01, σ 1.007198e-09
Krylov subspace size: 12
Benchmarking Lanczos expv!
  1.089 ms (1359 allocations: 765.83 KiB)
Benchmarking full exponentiation
  29.685 ms (1320 allocations: 44.01 MiB)
7.58943925351473e-11

n=3000

Initial norm: β₀ 4.522136e+01, stopping threshold: 4.622136e-09
iter 1, α[1] 1.138220e+02, β[1] 6.424715e+01, σ 2.905344e+03
iter 2, α[2] 3.624997e+01, β[2] 1.814199e+00, σ 2.416432e+01
iter 3, α[3] -1.562011e-02, β[3] 1.595978e+00, σ 3.571388e+01
iter 4, α[4] -4.007526e-03, β[4] 1.594921e+00, σ 2.880494e+01
iter 5, α[5] 7.710502e-03, β[5] 1.589851e+00, σ 1.602004e+01
iter 6, α[6] 1.563939e-02, β[6] 1.567023e+00, σ 6.572510e+00
iter 7, α[7] 6.127848e-03, β[7] 1.601089e+00, σ 2.190977e+00
iter 8, α[8] 1.206389e-02, β[8] 1.598282e+00, σ 6.015621e-01
iter 9, α[9] 7.070683e-02, β[9] 1.622612e+00, σ 1.430245e-01
iter 10, α[10] 1.103534e+01, β[10] 3.919011e+01, σ 4.440673e-01
iter 11, α[11] 1.388096e+02, β[11] 5.896185e+00, σ 2.817613e-02
iter 12, α[12] 2.579454e-01, β[12] 1.598403e+00, σ 5.083859e-03
iter 13, α[13] -1.817458e-02, β[13] 1.567401e+00, σ 8.086115e-04
iter 14, α[14] -2.814222e-02, β[14] 1.568095e+00, σ 1.167917e-04
iter 15, α[15] -1.184318e-02, β[15] 1.563882e+00, σ 1.539143e-05
iter 16, α[16] 2.292472e-02, β[16] 1.547416e+00, σ 1.849834e-06
iter 17, α[17] -2.205499e-02, β[17] 1.583346e+00, σ 2.109877e-07
iter 18, α[18] 1.480475e-02, β[18] 1.672977e+00, σ 2.369455e-08
iter 19, α[19] 1.511450e+01, β[19] 4.513727e+01, σ 5.071813e-08
iter 20, α[20] 1.348318e+02, β[20] 4.894348e+00, σ 2.187540e-09
Krylov subspace size: 20
Benchmarking Lanczos expv!
  173.878 ms (12255 allocations: 69.22 MiB)
Benchmarking full exponentiation
  25.138 s (12128 allocations: 4.69 GiB)
1.293094021783796e-10

codecov · 2018-10-02T10:17:11Z

Codecov Report

Merging #6 into master will decrease coverage by 0.01%.
The diff coverage is 96.42%.

@@            Coverage Diff             @@
##           master       #6      +/-   ##
==========================================
- Coverage   96.87%   96.86%   -0.02%     
==========================================
  Files           4        6       +2     
  Lines         192      223      +31     
==========================================
+ Hits          186      216      +30     
- Misses          6        7       +1

Impacted Files	Coverage Δ
src/krylov_phiv_adaptive.jl	`98.73% <100%> (ø)`	⬆️
src/arnoldi.jl	`95.55% <100%> (+1.8%)`	⬆️
src/stegr_cache.jl	`100% <100%> (ø)`
src/krylov_expv.jl	`90% <90%> (ø)`
src/krylov_phiv.jl	`92.1% <90%> (-1.23%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d40ff06...e340a4b. Read the comment docs.

ChrisRackauckas · 2018-10-05T19:14:57Z

Summary

basically, this alternative method sounds interesting, we should definitely have it in ExponentialUtilities, it might not work with the higher order semilinear and first-order ODE exponential integrators, but it might be something we want to interface with in the splitting methods, so we should try to keep the interface the same as much as possible, but that might not be possible since it's adapting the Krylov subspace size

MSeeker1340 · 2018-10-06T01:56:29Z

src/krylov_expv.jl

+    if m > Ks.maxiter
+        resize!(Ks, m)
+    else
+        Ks.m = m # might change if happy-breakdown occurs


happy-breakdown occurs -> convergence criterion met

MSeeker1340 · 2018-10-06T01:59:53Z

but that might not be possible since it's adapting the Krylov subspace size

To be more specific, the issue lies with extending this alternative method to phiv_timestep because of the adaptation. On the other hand, for phiv! we can probably apply similar changes as the error estimate for exp should more or less still work for higher order phis, especially if the Krylov dimension is large compared to the phi order.

MSeeker1340 · 2018-10-06T02:07:31Z

Nicely done. To complete things, can you add a keyword argument to expv in krylov_phiv.jl that allows it to use the new version (this should also make your testing script simpler) and add a one-liner comment to krylov_expv.jl describing its purpose?

jagot · 2018-10-06T09:25:48Z

I have addressed the comments above (I believe). There still remains the issue of allocation, but it is not huge, so can maybe rest for now. A similar approach for Arnoldi is possible using hseqr, which computes the Schur factorization of a Hessenberg matrix. What then remains, is the computation of the matrix exponential of an upper triangular matrix.

jagot · 2018-10-06T09:29:12Z

References for upper triangular matrix exponentials:

jagot mentioned this pull request Oct 2, 2018

WIP: Some improvements for Hermitian operators. #1

Closed

6 tasks

jagot added 4 commits October 4, 2018 23:48

Split out the Krylov iteration steps into subroutines

1dac513

Vector views for alphas and betas in Lanczos iteration

3e05e22

Removed unnecessary cache vector in Krylov iteration

3d06fe0

Implemented alternative interface to expv! for Hermitian matrices

1ca5c67

jagot force-pushed the krylov-steps branch from 0345e3f to 1ca5c67 Compare October 4, 2018 21:56

BlasInt may be an Int32 on some machines

115ca25

MSeeker1340 reviewed Oct 6, 2018

View reviewed changes

jagot added 3 commits October 6, 2018 09:10

Comments/docstrings

fef166e

Driver variant of the alternative interface for expv!

a37ecff

Removed remnants of unused cache vector

e340a4b

MSeeker1340 changed the title ~~WIP: Alternative interface to expv!~~ Alternative interface to expv! Oct 6, 2018

MSeeker1340 merged commit 16ed363 into SciML:master Oct 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative interface to expv! #6

Alternative interface to expv! #6

jagot commented Oct 2, 2018

codecov bot commented Oct 2, 2018 •

edited

Loading

ChrisRackauckas commented Oct 5, 2018

MSeeker1340 Oct 6, 2018

MSeeker1340 commented Oct 6, 2018

MSeeker1340 commented Oct 6, 2018

jagot commented Oct 6, 2018

jagot commented Oct 6, 2018

Alternative interface to expv! #6

Alternative interface to expv! #6

Conversation

jagot commented Oct 2, 2018

codecov bot commented Oct 2, 2018 • edited Loading

Codecov Report

ChrisRackauckas commented Oct 5, 2018

MSeeker1340 Oct 6, 2018

Choose a reason for hiding this comment

MSeeker1340 commented Oct 6, 2018

MSeeker1340 commented Oct 6, 2018

jagot commented Oct 6, 2018

jagot commented Oct 6, 2018

codecov bot commented Oct 2, 2018 •

edited

Loading