
Adaptive explicit extrapolation #714

Merged
merged 31 commits into JuliaDiffEq:master from AlthausKonstantin:explicit_extrapolation on Apr 27, 2019

Conversation

AlthausKonstantin

These are two implementations of adaptive extrapolation of the explicit midpoint rule (similar to ODEX and DIFEX).
I developed the original code for my bachelor's thesis and would like to add it now to the JuliaDiffEq ecosystem.

This is the first working version of the algorithms and I would love to get some feedback:
Are my additions ok? Is there something I have to change? Are there some essential options from the common interface I should implement?

Since this is the first time I am contributing to an open source project, I would also be grateful for tips on which tests I should implement.

Thanks!

src/alg_utils.jl
@@ -339,16 +346,22 @@ beta2_default(alg::FunctionMap) = 0
beta2_default(alg::DP8) = 0//1
beta2_default(alg::DP5) = 4//100
beta2_default(alg::DP5Threaded) = 4//100
beta2_default(alg::ExtrapolationMidpointDeuflhard) = 0//1
Member

Is there an issue with PI adaptivity here?

subdividing_sequence::Array{BigInt,1} # subdividing_sequence[n] is used for the (n -1)th internal discretisation
stage_number::Array{Int,1} # Stage_number[n] contains information for extrapolation order (n + alg.n_min - 1)
# Weights and Scaling factors for extrapolation operators
extrapolation_weights::Array{Rational{BigInt},2}
@ChrisRackauckas (Member) Apr 6, 2019

Do these need to be Rational? Maybe it doesn't matter all that much, but Bigs are allocating, so it would be worthwhile to profile whether the calculations on these actually end up being costly, and whether just having them as Float64 may be numerically stable enough to get the job done. If Float64 is enough, then typing these to match t or u would make sense.

Author

I haven't played with Floats here so far.

The good thing about the BigInts is that the weights are converted in perform_step to the type of u. That makes it possible to adapt to the precision of the problem.
I'd suspect that if one uses Float64 weights and, say, a BigFloat representation for the problem, the approximation error will not get any better than eps(Float64).
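
For illustration, a tiny REPL sketch of the precision point (hypothetical snippet, not from the PR):

w = 1 // BigInt(3)     # an exact rational extrapolation weight
Float64(w)             # 0.3333333333333333, accurate only to about eps(Float64)
BigFloat(w)            # accurate to about eps(BigFloat), many more digits
# If the weights were stored as Float64 while u is BigFloat, converting them
# back to BigFloat could not recover the lost digits, so the approximation
# error would stall near eps(Float64).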

Member

What about just converting to u's type here?

Author

Ah I see - the caches are recreated for every call of solve, right? Yes, then the type of u should be an option.
I guess I will set up an experiment to check the stability of the weights!

BTW: Would it be possible to move the weights into the algorithm's struct? That way the computation of the tables on every solve call would be bypassed.

function alg_cache(alg::ExtrapolationMidpointDeuflhard,u,rate_prototype,uEltypeNoUnits,uBottomEltypeNoUnits,tTypeNoUnits,uprev,uprev2,f,t,dt,reltol,p,calck,::Type{Val{false}})
@unpack n_min, n_init, n_max = alg

QType = tTypeNoUnits <: Integer ? typeof(qmin_default(alg)) : tTypeNoUnits # Cf. DiffEqBase.__init in solve.jl
Member

peculiar that the type of this is needed. I'll see where this goes...


# Compute scaling factors for extrapolation operators
extrapolation_scalars = -nodes[1] * [BigInt(1); extrapolation_scalars_2]

Member

Because the amount of pre-computation is based on n_max, it actually matters in small enough settings. That is something that may be worth remembering to document, or at least test a little bit. BTW, at which n_max does overflow start to occur in the sequences?

Author

Ok, I will mention this in the documentation.

As far as I've tested the constructor, I have not been able to produce an overflow (for any sequence) when using BigInt. When I replaced BigInt by Int, I got an overflow for the Romberg sequence at n_max = 6.
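
A rough illustration of that overflow (hypothetical snippet, not the constructor's code; it assumes the tables involve products over squared step numbers):

romberg = [2^j for j in 1:8]       # first Romberg step numbers as Int64
prod(romberg .^ 2)                 # 2^72 does not fit in Int64 and silently wraps to 0
prod(BigInt.(romberg) .^ 2)        # exact with BigInt: 4722366482869645213696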

extrapolation_scalars_2::Array{Rational{BigInt},1}
end

function alg_cache(alg::ExtrapolationMidpointHairerWanner,u,rate_prototype,uEltypeNoUnits,uBottomEltypeNoUnits,tTypeNoUnits,uprev,uprev2,f,t,dt,reltol,p,calck,::Type{Val{false}})
Member

I feel like some refactoring can make this re-use a lot of code from the previous cache function. That's a second order issue that we can handle later though.

@AlthausKonstantin (Author) Apr 6, 2019

That's true. Am I "allowed" to define a function producing the weights etc. somewhere? If so where?

Member

Just do it in another part of this file, and then call it from here.

Author

Ok I will change that!

cc = alg_cache(alg,u,rate_prototype,uEltypeNoUnits,uBottomEltypeNoUnits,tTypeNoUnits,uprev,uprev2,f,t,dt,reltol,p,calck,Val{false})

ExtrapolationMidpointDeuflhardCache(u_tilde, u_temp1, u_temp2, tmp, T, res, fsalfirst, k,
cc.Q, cc.n_curr, cc.n_old, cc.subdividing_sequence, cc.stage_number, cc.extrapolation_weights, cc.extrapolation_scalars, cc.extrapolation_weights_2, cc.extrapolation_scalars_2)
Member

It might be easier to just stick the cc in here, using composition to improve code reuse. But as with the other comment, optimizing code simplicity and reuse can come in a later PR.

Author

Here I decided against that to make the call @unpack subdividing_sequence = cache and such possible for both caches.

Would you prefer an if-else statement before calling @unpack in order to check whether a cache is mutable or not?

integrator.opts.beta1 = typeof(integrator.opts.beta1)(1 // (2integrator.cache.n_curr + 1))
integrator.opts.gamma = typeof(integrator.opts.gamma)(1 // 4)^integrator.opts.beta1
# Compute new stepsize scaling
qtmp = integrator.EEst^integrator.opts.beta1 / integrator.opts.gamma
Member

I see why you got rid of beta2. Something that would be worthwhile later would be to try and make this use PI instead of P, since step stabilization will help a method based on explicit RKs take larger steps.
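
For reference, a hedged sketch of the textbook PI ("Lund-stabilized") update in DOPRI5 style (cf. Hairer, Nørsett and Wanner, Section II.4); the names are illustrative and this is not the controller implemented here:

# err, err_prev : scaled error estimates of the current and previous step
# k             : exponent order; the P controller above uses 1/k with k = 2n_curr + 1
beta  = 0.04                         # integral gain; cf. the DP5 default 4//100 in this diff
alpha = 1/k - 0.75 * beta            # proportional exponent
fac   = err^alpha / err_prev^beta    # step-size scaling: dt_new = dt / fac after safety clipping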

Member

But hmm, yes anything based on a work estimate is hard to stabilize.

else
# Initialize
@unpack t,EEst = integrator
tol = integrator.opts.internalnorm(integrator.opts.reltol,t) # Deuflhard's approach relies on EEstD ≈ ||relTol||
Member

This might have weird interactions with element-wise tolerances, but does work.

Author

That is a problem I have not figured out yet. The thing is that in Deuflhard's approach some notion of a relative tolerance (< 1!) is needed for the convergence monitor and stepsize prediction.

I guess this forces the user, in the case of vector-valued relative and absolute tolerances, to think of these vectors as scaling vectors multiplied by tol.

So far I haven't played with element-wise tolerances and don't know what the effects are...

@ChrisRackauckas (Member)

What's the reference you used for the stepping behaviors?

@AlthausKonstantin (Author)

What's the reference you used for the stepping behaviors?

What do you mean exactly - the literature for the adaptivity?

@ChrisRackauckas (Member)

What's the literature you used for the implementation? That's what I meant.

@AlthausKonstantin (Author)

What's the literature you used for the implementation? That's what I meant.

My two main sources have been the book by Hairer, Wanner and Nørsett (Solving Ordinary Differential Equations I)
and Deuflhard's paper (Order and stepsize control in extrapolation methods).
I also sketched their reasoning, as I understood it and as my code is based on it, in my thesis (I added it to my repository).
Does that help?

@AlthausKonstantin (Author) commented Apr 22, 2019

Switch broadcasts to @.. #716

Does this also apply to expressions like broadcast(+, A, B) or only to @.?

@AlthausKonstantin (Author)

Use a version of the problem where u0 is BigFloat so that way you are using arbitrary precision arithmetic, and test convergence of high orders with that. That is also a good test that the implementation isn't relying on Float64 anywhere.

Thank you!
I adapted the tests. But curiously enough, it seems to be vital for my methods that the time span is also given by two BigFloats if the subdividing sequence happens to be different from the Romberg sequence (see the convergence plots in the updated notebook).

Why this behavior depends on the chosen subdividing sequence is not really clear to me...
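
For context, a minimal sketch of such a test setup (hypothetical problem and options following the common interface; the actual test file differs):

using OrdinaryDiffEq
f(u, p, t) = -u                                    # linear test problem with known solution
u0 = BigFloat(1)
tspan = (BigFloat(0), BigFloat(1))                 # time span given in BigFloat as well
prob = ODEProblem(f, u0, tspan)
sol = solve(prob, ExtrapolationMidpointDeuflhard(), adaptive = false, dt = BigFloat(1) / 16)
abs(sol[end] - exp(-tspan[2]))                     # error should drop far below eps(Float64)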

@ChrisRackauckas (Member)

Does this also apply to expressions like broadcast(+, A, B) or only to @.?

All if possible. @.. is just a faster form of broadcast since it removes some aliasing assumptions when the broadcast is supposed to match a map.
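
For illustration, the in-place pattern that replaces an allocating broadcast (a sketch; @.. is provided by the package's internals):

# before: allocates a new array on every call
C = broadcast(+, A, B)
# after: fused update into a preallocated cache array
@.. C = A + B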

@ChrisRackauckas (Member)

I handled the merge conflicts. We may want to just merge the PR once I check that the convergence tests are all good, and then future updates can take place in follow-up PRs. It's much easier to keep doing small PRs than it is to keep a branch from going stale.

@codecov (bot) commented Apr 27, 2019

Codecov Report

Merging #714 into master will decrease coverage by 0.38%.
The diff coverage is 46.48%.


@@            Coverage Diff             @@
##           master     #714      +/-   ##
==========================================
- Coverage   72.01%   71.63%   -0.39%     
==========================================
  Files          93       93              
  Lines       28806    29247     +441     
==========================================
+ Hits        20746    20952     +206     
- Misses       8060     8295     +235
Impacted Files Coverage Δ
src/OrdinaryDiffEq.jl 100% <ø> (ø) ⬆️
src/integrators/controllers.jl 40.38% <0%> (-50.92%) ⬇️
src/caches/extrapolation_caches.jl 95.5% <100%> (+14.55%) ⬆️
src/perform_step/extrapolation_perform_step.jl 61.72% <45.66%> (-34.86%) ⬇️
src/alg_utils.jl 32.93% <50%> (+0.49%) ⬆️
src/algorithms.jl 96.41% <75%> (-1.86%) ⬇️
src/initdt.jl 94.52% <0%> (+1.36%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3b65159...ec087d7.

@coveralls commented Apr 27, 2019

Coverage Status

Coverage increased (+5.5%) to 75.053% when pulling ec087d7 on AlthausKonstantin:explicit_extrapolation into 3b65159 on JuliaDiffEq:master.

@ChrisRackauckas (Member)

My guess is that something odd happens in a spot like j_int = 2Int64(subdividing_sequence[i+1]), but yeah, that BigFloat subdividing sequence issue is a bit hard to track down. I'll open an issue on it. It's not the worst thing in the world since the majority of users use Float64, but it would be nice to figure it out sometime in the future.

The expression sum( broadcast(*, T[1:(i+1)], eltype(uprev).(extrapolation_weights[1:(i+1), (i+1)])) ) allocates a bit and should be changed to use generators instead of broadcast, but that's probably not a big deal, so it can be handled in a future PR.
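
For illustration, a generator-based form of that reduction (a sketch derived from the quoted expression):

# avoids the temporary arrays created by the two slices and the broadcast
sum(T[j] * eltype(uprev)(extrapolation_weights[j, i + 1]) for j in 1:(i + 1))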

I'll merge if tests pass and we can continue from there. Thanks! This is quite a nice PR, and we will want to continue down the path of https://arxiv.org/abs/1305.6165 . With Julia's new multithreading system this will be pretty exciting.

One contribution you may want to think about is the implicit methods. Since you have the sequences now, swapping out for the implicit parts shouldn't be too bad: just use the nonlinear solvers already in OrdinaryDiffEq. Doing this with parallelism was attempted in Python, but of course Python got in the way: numerical-mathematics/extrapolation#27 . @YingboMa and I have interesting test cases where BLAS multithreading doesn't speed up the calculation (and in fact slows it down), so taking better control of the parallelism can help on some very implicit systems of around 50 ODEs.

@ChrisRackauckas (Member)

This commit introduced a binary and needs to be deleted somehow: 6ddcfcb, otherwise it will sit in and clog the repo history.

@ChrisRackauckas force-pushed the explicit_extrapolation branch from f0a3cbf to e080e64 on April 27, 2019 14:29
@ChrisRackauckas (Member)

I cleaned it up and removed the binary from the history. If all went well, then tests will pass again and I'll merge.

@AlthausKonstantin (Author)

Okay, thank you so much for all the help!
After this merges I'd also like to contribute to the documentation. Should I just open another pull request in the DiffEqDocs with a reference to this PR?
Same for the benchmarking: should I add these algorithms to the NonStiffODE notebooks?

@ChrisRackauckas (Member)

After this merges I'd also like to contribute to the documentation. Should I just open another pull request in the DiffEqDocs with a reference to this PR?

Yes

Same for the Benchmarking, should I add this algorithms to the NonStiffODE notebooks?

Everything is moving away from notebooks. We're in the middle of a benchmark overhaul. See SciML/SciMLBenchmarks.jl#32

@ChrisRackauckas (Member)

ExtrapolationMidpointDeuflhard and ExtrapolationMidpointHairerWanner

ExtrapolationMidpointDeuflhard and ExtrapolationMidpointHairerWanner adapt order
and stepsize. Both extrapolate the explicit midpoint rule and differ only in the
implementation of the adaptive behavior.

Arguments for Initializing the Algorithms

Both methods accept the same set of arguments:

  • min_extrapolation_order and max_extrapolation_order
    restrict the range of values the adaptively controlled extrapolation
    order can attain.
    Since the order control mechanism relies on information from the previous time
    step, the order of extrapolation for the very first timestep is provided externally
    by init_extrapolation_order.

    • ExtrapolationMidpointDeuflhard uses by default an order range of 1:10 and
      starts with init_extrapolation_order = 5.
    • ExtrapolationMidpointHairerWanner works similarly but uses a default range of
      2:10 since it requires
      2 <= min_extrapolation_order <= init_extrapolation_order < max_extrapolation_order.
  • sequence_symbol specifies the step-number sequence, also called the subdividing
    sequence, used by both algorithms.
    Possible values for this flag are :harmonic, :romberg or :bulirsch.

To override the default values, use keyword arguments as shown above for the
AitkenNeville algorithm.

If you specify values that do not make sense, e.g.
init_extrapolation_order = 3, min_extrapolation_order = 4 or
sequence_symbol = :foo, the constructor will correct the input and warn you
it did so.
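
For example, a hedged usage sketch with the keyword names listed above (the exact constructor signature and defaults live in src/algorithms.jl):

alg = ExtrapolationMidpointDeuflhard(min_extrapolation_order = 2,
                                     init_extrapolation_order = 4,
                                     max_extrapolation_order = 8,
                                     sequence_symbol = :bulirsch)
sol = solve(prob, alg)   # prob is any ODEProblem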

Finally, note that the extrapolation order and the order of the method do not
coincide. For both algorithms an extrapolation order of n translates to an error
of order 2(n+1).

Details on the Implementation

As mentioned above, the explicit midpoint rule is the internal method which is
extrapolated. Since it is not a one-step method, it is started with one explicit
Euler step.
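
A textbook sketch of one such internal discretization with n_j substeps (Gragg's scheme; illustrative names, not the PR's perform_step! code):

function internal_discretization(f, u0, t0, dt, n_j, p)
    h = dt / n_j
    u_prev2 = u0
    u_prev1 = u0 + h * f(u0, p, t0)                             # one explicit Euler step to start
    for k in 2:n_j
        u_next = u_prev2 + 2h * f(u_prev1, p, t0 + (k - 1) * h) # explicit midpoint rule
        u_prev2, u_prev1 = u_prev1, u_next
    end
    return u_prev1                                              # approximation of u(t0 + dt)
end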

The actual extrapolation is not achieved by the algorithm of Aitken and Neville
but by use of the first barycentric formula (a short overview of barycentric
interpolation is given in the paper of Berrut and Trefethen).
This means that the scalars and weights necessary for carrying out the extrapolation are
precomputed and tabulated once at the beginning of the call to solve. From there, the
actual extrapolation performed in each timestep amounts to computing a linear
combination of the internal discretizations obtained by the explicit midpoint rule.
Thus, initializing the precomputed values requires time and storage of order
max_extrapolation_order^2, but computing the extrapolation in a given timestep
scales only linearly in max_extrapolation_order.
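
Schematically, the per-step extrapolation is then a weighted sum of this form (a sketch; field names follow the cache definition quoted earlier, the exact index conventions may differ):

# T[j] is the j-th internal discretization of the current step
u_extrapolated = extrapolation_scalars[n] * sum(extrapolation_weights[j, n] * T[j] for j in 1:n)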

Since the algorithms do not use the Aitken-Neville scheme for extrapolation, they
also use a different embedded approximation than other extrapolation methods, e.g.
ODEX, do.
Using notation common for the Aitken-Neville algorithm, let the internal
discretizations be given by T[1:n, 1]. Then ODEX would use the entry T[n,n] as
the new solution and estimate the local error with the approximation given by T[n,n-1]
(cf. equation (9.23) in
Solving Ordinary Differential Equations I
by Hairer, Wanner and Nørsett).
Since for the two algorithms presented here only the entries T[i,i] for i = 1:n
are accessible, they use the value T[n-1,n-1] for error estimation instead.
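
In code-like notation (illustrative only):

u_new      = T[n, n]           # accepted solution, extrapolation order n
u_embedded = T[n - 1, n - 1]   # one extrapolation order lower, used for the error estimate
# the local error estimate is built from u_new - u_embedded, scaled by the tolerances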
