This repository has been archived by the owner on Sep 9, 2024. It is now read-only.

WIP: Generalize oderkf to eventually replace ode23 #16

Merged · 7 commits · May 6, 2014

Conversation

@acroy (Contributor) commented Mar 2, 2014

The function oderkf is extended to support embedded RK methods with order!=5. Using the Butcher tableau for the Bogacki–Shampine method we can eventually replace the current ode23. This is WIP since I would like to discuss some optimizations and also wait for #13 to be merged.

To monitor potential performance regressions/improvements I have added a very basic "performance test" in test/perf_test.jl, which currently gives

testing method ode23
elapsed time: 0.173369477 seconds (28322912 bytes allocated)
*  number of steps: 307
*  minimal step: 0.017235477520255074
*  maximal step: 0.13251841618852822
*  maximal deviation from known solution: 0.00024150339669093412

testing method ode23_bs
elapsed time: 0.143952278 seconds (25093360 bytes allocated)
*  number of steps: 284
*  minimal step: 0.04058097707789443
*  maximal step: 0.16972148358466033
*  maximal deviation from known solution: 0.00029454047976829045

testing method ode45_dp
elapsed time: 0.02042501 seconds (5396480 bytes allocated)
*  number of steps: 50
*  minimal step: 0.2
*  maximal step: 0.6237898397680723
*  maximal deviation from known solution: 1.521321931230446e-5

testing method ode45_fb
elapsed time: 0.023933644 seconds (5487280 bytes allocated)
*  number of steps: 56
*  minimal step: 0.2
*  maximal step: 0.518603647973384
*  maximal deviation from known solution: 0.00035154683318205926

testing method ode45_ck
elapsed time: 0.01847891 seconds (4496800 bytes allocated)
*  number of steps: 48
*  minimal step: 0.2
*  maximal step: 0.6213739278959354
*  maximal deviation from known solution: 0.00013938517428924158

Note that ode23 and ode23_bs (based on oderkf) currently cannot be compared directly, because they use different tolerances.

Comments and/or suggestions are most welcome.

@acroy (Contributor, Author) commented Mar 4, 2014

With fdff7e1 the tolerances in ode23 and oderkf are the same. The performance test gives

testing method ode23
elapsed time: 0.163701603 seconds (28323256 bytes allocated)
*  number of steps: 307
*  minimal step: 0.017235477520255074
*  maximal step: 0.13251841618852822
*  maximal deviation from known solution: 0.00024150339669093412

testing method ode23_bs
elapsed time: 0.14022683 seconds (27939032 bytes allocated)
*  number of steps: 309
*  minimal step: 0.030318943000676057
*  maximal step: 0.1333486562373487
*  maximal deviation from known solution: 0.00024434914654758444

testing method ode45_dp
elapsed time: 0.020355692 seconds (5402632 bytes allocated)
*  number of steps: 52
*  minimal step: 0.14907708038021283
*  maximal step: 0.5277942154452617
*  maximal deviation from known solution: 1.4358013593174235e-5

testing method ode45_fb
elapsed time: 0.077006776 seconds (5474392 bytes allocated)
*  number of steps: 57
*  minimal step: 0.2
*  maximal step: 0.6019190655037896
*  maximal deviation from known solution: 0.0002961599022262007

testing method ode45_ck
elapsed time: 0.018243295 seconds (4511032 bytes allocated)
*  number of steps: 50
*  minimal step: 0.2
*  maximal step: 0.6412708946202219
*  maximal deviation from known solution: 0.00015482199540262087

The remaining difference between ode23 and ode23_bs, 307 vs 309 steps, is probably due to the different setting/handling of hmax, hmin and hinitial.

@tomasaschan (Contributor):

I don't really see a reason not to merge this. Are we waiting for anything else before we want this in? (Perhaps tagging v0.1?)

@acroy (Contributor, Author) commented Mar 4, 2014

The prospect of having only one function for all embedded RK pairs is indeed really promising and will probably make our lives easier. We should discuss whether we want to remove the current ode23 in this PR and, if so, which implementation details we want to keep for oderkf (e.g. the choice of initial step size). Another question is whether there are any optimizations we can/want to make now. We might also want to wait for JuliaLang/julia#14?

@acroy (Contributor, Author) commented Mar 4, 2014

Regarding the optimizations, I was wondering to what extent the JIT takes into account that the Butcher tableau is const, or whether some code generation might give better performance?

@acroy (Contributor, Author) commented Mar 12, 2014

Ok, after merging JuliaLang/julia#14 the performance test gives

testing method ode23
elapsed time: 0.097011647 seconds (13009712 bytes allocated)
*  number of steps: 307
*  minimal step: 0.017235477520255074
*  maximal step: 0.13251841618852822
*  maximal deviation from known solution: 0.00024150339669093412

testing method ode23_bs
elapsed time: 0.065618373 seconds (14071760 bytes allocated)
*  number of steps: 309
*  minimal step: 0.030318943000676057
*  maximal step: 0.1333486562373487
*  maximal deviation from known solution: 0.00024434914654758444

testing method ode45_dp
elapsed time: 0.016598553 seconds (2584720 bytes allocated)
*  number of steps: 52
*  minimal step: 0.14907708038021283
*  maximal step: 0.5277942154452617
*  maximal deviation from known solution: 1.4358013593174235e-5

testing method ode45_fb
elapsed time: 0.013900736 seconds (2633600 bytes allocated)
*  number of steps: 57
*  minimal step: 0.2
*  maximal step: 0.6019190655037896
*  maximal deviation from known solution: 0.0002961599022262007

testing method ode45_ck
elapsed time: 0.047821993 seconds (2200720 bytes allocated)
*  number of steps: 50
*  minimal step: 0.2
*  maximal step: 0.6412708946202219
*  maximal deviation from known solution: 0.00015482199540262087

which is Ok. However, using a different test, which involves a system of 3 equations (problem B5 in DETEST), I get

testing method ode23
elapsed time: 0.077103019 seconds (17847680 bytes allocated)
*  number of steps: 238
*  minimal step: 1.7235477520255075e-5
*  maximal step: 0.13058852624073225

testing method ode23_bs
elapsed time: 7.753311878 seconds (5651004400 bytes allocated)
*  number of steps: 172
*  minimal step: 0.00078395904880324
*  maximal step: 0.18195874962795777

(the solutions seem to be ok, e.g. norm(y23[end]-y23_bs[end])=4.3381957391576563e-5).

@acroy (Contributor, Author) commented Mar 12, 2014

This is better. I replaced the generic_matvecmul in oderkf, which avoids JuliaLang/LinearAlgebra.jl#94.
The second test now gives

== problem B5 in DETEST ==

testing method ode23
elapsed time: 0.075416864 seconds (17847680 bytes allocated)
*  number of steps: 238
*  minimal step: 1.7235477520255075e-5
*  maximal step: 0.13058852624073225

testing method ode23_bs
elapsed time: 0.034935726 seconds (11513200 bytes allocated)
*  number of steps: 172
*  minimal step: 0.0007839590369780325
*  maximal step: 0.18195874964159842

@acroy (Contributor, Author) commented Mar 13, 2014

Deprecation warnings no more ...

@acroy (Contributor, Author) commented Mar 14, 2014

I think it is almost time for ode23 to retire. More optimizations could be done separately. It would be good if someone could review this PR, though. Then we have to decide whether we want to remove ode23 completely now (and use ode23 = ode23_bs instead) and whether we want to keep the "performance test".

@tomasaschan (Contributor):

This looks good to me. And since ode23_bs even seems to outperform ode23, I think we're ready to retire the old implementation.

Re. perf tests, there has been some discussion before, but I don't know if any real conclusions have been reached on how to handle perf testing of packages. I think they have value, at the very least as some kind of safeguard against performance regressions, but that also means we need a way to automate not only running them but also analyzing the results. I have no idea if there is any infrastructure we could hook into there, or if we have to invent something. However, I think these perf tests do test the right things (albeit maybe not as extensively as we might want in the long run), so they're a good start and I don't think we should throw them away.

@tomasaschan (Contributor):

By the way, this takes care of JuliaLang/julia#2 as well, right?

@pao (Contributor) commented Mar 15, 2014

I have no idea if there is any infrastructure we could hook into there, or if we have to invent something.

You might check with @staticfloat to see if you can use the http://speed.julialang.org/ infrastructure (n.b., site is down right now, I've notified him.)

@acroy (Contributor, Author) commented Mar 15, 2014

By the way, this takes care of JuliaLang/julia#2 as well, right?

Partly; support for intermediate points in tspan is still missing. I think we can close JuliaLang/julia#2 anyhow, since this is now included in the API specification.

The performance test is IMO kind of crude, but it should at least give a rough idea about performance regressions. We can easily add other test cases from DETEST as well. I also have a tweaked version which saves the results to an HDF5 file. One can then generate plots comparing different methods... In the long run this needs someone who makes the tests more reliable and extends them in a sensible way.

@tomasaschan (Contributor):

@acroy Take a look at JuliaLang/julia#3 for a (very crude) idea on how to design a slightly more robust test framework for this. It's just a start, and it's pretty old, so I doubt that much code from there will be usable (if any at all) but at least it can probably serve up some ideas on how to improve testing.

@tomasaschan tomasaschan mentioned this pull request Mar 17, 2014
@acroy (Contributor, Author) commented Mar 17, 2014

@tlycken : Actually, I had seen JuliaLang/julia#3 and I think using types for different kinds of test problems is the way to go. Here I just wanted some means to detect performance regressions, so I didn't investigate this further. My tests have the problem that the timings still fluctuate somewhat from run to run. I guess I have to make sure that the solver is called sufficiently often to stabilize the timing...

@tomasaschan (Contributor):

To stabilize timing, maybe just run the tests ~10 times per problem/solver combination? This will of course make the test run take (much) longer, but I don't think the perf tests are supposed to be run in every test context anyway.

@acroy (Contributor, Author) commented Mar 17, 2014

That is what I am doing now. Each combination of problem and solver runs 10 times, plus 1 warm-up run for JITing. I should probably put the innermost part in a function, though.
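The warm-up-plus-repeat pattern described above, sketched in Python for concreteness (in Julia the warm-up run is what absorbs JIT compilation time; the `benchmark` helper and its signature are made up for this sketch):

```python
import time

def benchmark(solve, n_runs=10):
    """Time `solve` over n_runs repetitions, after one untimed warm-up
    run (which, in Julia, would absorb JIT compilation)."""
    solve()  # warm-up: excluded from the reported timings
    timings = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        solve()
        timings.append(time.perf_counter() - t0)
    # The minimum is a less noisy summary than the mean when
    # run-to-run fluctuations come from GC or system load.
    return min(timings)
```

Reporting the minimum rather than the mean tends to damp the run-to-run fluctuations mentioned above.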

@ivarne (Contributor) commented Mar 17, 2014

Shouldn't the performance tests just wrap the normal tests with proper timing procedures? The only difference is that we want to do multiple iterations and try to do something intelligent/conscious about GC and JIT time consumption.

@acroy (Contributor, Author) commented Mar 17, 2014

Well, our current tests were rather ad-hoc additions (see JuliaLang/julia#1). Of course they can tell us if our implementation is basically OK, but they don't probe all aspects of a solver. For example, all our tests are linear ODEs, which are in a way too simple. We also need test problems for which step-size estimation fails in some cases, and so on. In order to compare different methods or runs, we IMO need additional information like the number of integration steps, the smallest and largest time-step, ... This is a (GSoC) project on its own, but for now we could certainly merge tests.jl and perf_test.jl?

Anyways, "something intelligent/conscious about GC and JIT time consumption" would be good :-)

@tomasaschan (Contributor):

I am actually against merging tests.jl and perf_test.jl - it'll be a lot easier to decide to run the tests in different contexts, if/when we have the infrastructure to automate performance testing, if the code is kept separate. Also, if Pkg starts running tests on install, we don't want the user to have to wait for our performance tests...

@ivarne (Contributor) commented Mar 18, 2014

My point was not to merge the performance and correctness tests. My suggestion was to somehow make the normal tests run 1 iteration of the performance tests (or to make the performance tests run multiple iterations of some of the examples in the correctness tests).

@acroy (Contributor, Author) commented Mar 18, 2014

This is basically what I meant by "merging". For the correctness tests we probably only want problems that have a known solution and don't run forever, i.e. a subset of all tests. The problems are now in a tuple, which could be filtered according to the hassol flag, or one could just make two lists. If the test-running in perf_test is put into a function with an argument indicating the number of iterations, we could use the same code for the normal tests and the simple performance tests.
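A sketch of that filtering-plus-shared-driver idea in Python (the `hassol` flag comes from the discussion; the problem names and the `run_suite` helper are hypothetical):

```python
# Problem records carry a `hassol` flag: True means a known exact
# solution exists, so the problem is usable for correctness checks.
problems = (
    {"name": "linear", "hassol": True},   # hypothetical entry
    {"name": "B5",     "hassol": False},  # hypothetical entry
)

# Correctness tests use only the subset with known solutions ...
correctness_problems = tuple(p for p in problems if p["hassol"])

def run_suite(problems, solver, n_iter=1):
    """Shared driver: n_iter=1 for the normal tests, n_iter > 1
    (plus external timing) for the simple performance tests."""
    results = []
    for prob in problems:
        for _ in range(n_iter):
            results.append(solver(prob))
    return results
```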

@tomasaschan (Contributor):

OK, I misunderstood you both. Running the same test problems when testing for correctness and for performance seems like a good idea.

@acroy acroy mentioned this pull request Mar 20, 2014
@tomasaschan tomasaschan mentioned this pull request Mar 21, 2014
@acroy (Contributor, Author) commented May 5, 2014

To make some progress I would suggest the following:

  • I have removed the "performance tests" for now. In case someone cares, the file is here. If I have time I will post them as a separate PR.
  • I left the old ode23 in place and added ode23_bs based on the new oderkf.

I think the testing should be done properly and deserves its own PR. In principle we could remove ode23 and continue with oderkf, but it might be useful to keep it for comparison (at least for a while). Anyways, this PR gets rid of the deprecation warnings associated with JuliaLang/LinearAlgebra.jl#94 and should be sufficient to close JuliaLang/julia#29. It would be great if someone could have a look ...

@tomasaschan (Contributor):

Great! Merging this will have implications for a bunch of other stuff as well, since it's API-breaking, at least for the internal API (and I'm saying this as a good thing - we know we want to do this, and once we've done it, it'll be easier to move forward with other stuff).

I haven't done more elaborate testing than running the test suite and having a staring contest with the code to see if anything looks suspicious, but I don't think there are any problems with this. One thing that I did notice and that we might want to consider (but I don't think it's necessary - just want to throw it out there so it's not overlooked): The old ode23 currently has more sophisticated guesses than oderkf both for minimal step size (here vs here) and initial step (here vs here). It might be insignificant, but we might want to "steal" the better version for oderkf eventually.

@tomasaschan (Contributor):

Also, I found a way to squeeze some more performance out of oderkf, but I can fix that in a separate PR once this is merged. Basically, it's about using tout = Array(typeof(t), 1) and push!(tout, t) instead of tout = t and tout = [tout; t].
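The same trade-off is easy to see outside Julia: rebuilding the output array by concatenation copies all previous results at every step (quadratic work overall), while appending to a growable container is amortized constant time per step. A Python illustration (function names are illustrative):

```python
import numpy as np

def collect_concat(n):
    """Grow output by concatenation, the analogue of `tout = [tout; t]`:
    every step copies the whole array, O(n^2) work in total."""
    tout = np.array([0.0])
    for t in range(1, n):
        tout = np.concatenate([tout, [float(t)]])
    return tout

def collect_push(n):
    """Append to a growable container, the analogue of `push!(tout, t)`:
    amortized O(1) per step, O(n) work in total."""
    tout = [0.0]
    for t in range(1, n):
        tout.append(float(t))
    return np.array(tout)
```

Both produce identical results; only the allocation behavior differs.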

@acroy (Contributor, Author) commented May 6, 2014

Re hmax, hmin and hinit: This is one of the reasons I wanted to keep ode23 for now. I think we planned to make them keywords anyways, which would be a good opportunity to use the best guess for oderkf.

I guess we could gain even more performance by changing F(t,x) to F!(k,t,x) like in JuliaLang/julia#33?
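The difference between the two calling conventions, sketched in Python with NumPy (the F!(k, t, x) spelling is Julia's convention for mutating functions; this sketch is not the PR's code):

```python
import numpy as np

def F(t, x):
    """Out-of-place right-hand side F(t, x): allocates a fresh
    result array on every call."""
    return -x

def F_inplace(k, t, x):
    """In-place variant, like F!(k, t, x): writes the derivative into
    the caller-supplied buffer `k`, so a solver can reuse one buffer
    per stage instead of allocating inside the inner loop."""
    np.negative(x, out=k)
    return k
```

With `x = np.ones(3)` and `k = np.empty_like(x)`, both `F(0.0, x)` and `F_inplace(k, 0.0, x)` yield the same values; only the first allocates per call.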

@tomasaschan (Contributor):

Possibly, but that would be quite a disruptive change in the external API, which would also probably be quite confusing for new users, since it differs from the way e.g. MATLAB defines its system function. Maybe it's worth investigating if it can be done using a keyword argument to the solver to indicate if it should use F or F!...

Anyway, I think this is good to merge as is.

tomasaschan pushed a commit that referenced this pull request May 6, 2014
WIP: Generalize oderkf to eventually replace ode23
@tomasaschan tomasaschan merged commit 15bc7a1 into SciML:master May 6, 2014