Precompilation in the SnoopPrecompile/Julia 1.9+ world #2226
Thanks! I would suggest a mix of the first and third option (but without the weakdeps): precompile the JuMP methods together with the MOI methods by creating

```julia
optimizer = MOI.instantiate(GLPK.Optimizer; with_bridge_type = Float64)
cache = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
moi_backend = MOI.Utilities.CachingOptimizer(cache, optimizer)
```

and precompiling the methods JuMP uses, e.g., by running some of the tests.
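A sketch of how that suggestion could be wrapped in a SnoopPrecompile workload (my illustration, not code from the thread; the `add_variable`/`add_constraint`/`optimize!` calls are stand-ins for "the methods JuMP uses"):

```julia
using SnoopPrecompile
import MathOptInterface as MOI
import GLPK

@precompile_all_calls begin
    optimizer = MOI.instantiate(GLPK.Optimizer; with_bridge_type = Float64)
    cache = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
    moi_backend = MOI.Utilities.CachingOptimizer(cache, optimizer)
    # Exercise a few of the methods JuMP calls on the backend, so their
    # compiled code lands in this package's precompile cache:
    x = MOI.add_variable(moi_backend)
    MOI.add_constraint(moi_backend, x, MOI.GreaterThan(0.0))
    MOI.optimize!(moi_backend)
end
```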
Thanks! I have a starter PR in jump-dev/JuMP.jl#3193. Feel free to add to it or suggest improvements.
My current question is "what else needs to be added after jump-dev/JuMP.jl#3193?" Below I pose some questions that I can't really answer myself; I'm hoping that others can. But more generally, since I won't have time or expertise to improve everything myself, it's probably best to show you how I go about this kind of analysis. So I do the following:

```julia
using JuMP, GLPK
using SnoopCompileCore
tinf = @snoopi_deep @eval begin
    whatever_example_I_want_to_be_fast
end;
using SnoopCompile
using ProfileView
ProfileView.view(flamegraph(tinf))
```

For this case (and again, already exploiting jump-dev/JuMP.jl#3193) I get a graph like this: (flame graph). This page describes the elements & interaction you can use, but briefly: …
Some interesting findings: …
There's a trade-off here that I played around with a while ago. For small problems, it doesn't matter. But we also want to support building problems with 10^6+ variables and constraints, and then it really pays to add the explicit …
Yeah, these ones could be changed.
The right place to add precompilation for these sorts of things is either in MOI or in the solvers.

My preference is not to add any weakdeps to the JuMP ecosystem just yet, until a few Julia releases have happened and we can assess how it works. Maintaining them in JuMP for a bunch of solvers seems like a bit of work, especially if we can get the TTFX down via other approaches.
Okay, question: how should I set up precompile statements for solvers that set a global constant in `__init__`? For example: … Is it okay to do something like this?

```julia
import SnoopPrecompile
SnoopPrecompile.@precompile_setup begin
    SnoopPrecompile.@precompile_all_calls begin
        __init__()
        model = MOI.instantiate(HiGHS.Optimizer; with_bridge_type = Float64)
    end
end
```
I have it working for HiGHS: jump-dev/HiGHS.jl#147. There's just one problem left:

```julia
julia> using JuMP, HiGHS

julia> using SnoopCompileCore

julia> tinf = @snoopi_deep @eval begin
           model = Model(HiGHS.Optimizer)
           @variable(model, x >= 0)
           @variable(model, 0 <= y <= 3)
           @objective(model, Min, 12x + 20y)
           @constraint(model, c1, 6x + 8y >= 100)
           @constraint(model, c2, 7x + 12y >= 120)
           optimize!(model)
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00

julia> using SnoopCompile

julia> using ProfileView

julia> ProfileView.view(flamegraph(tinf))
```

The red bars are because of `MathOptInterface.jl/src/Utilities/copy.jl`, lines 487 to 490 in 40b81c5, which calls `MathOptInterface.jl/src/Utilities/copy.jl`, lines 165 to 173 in 40b81c5, but because …, a compile gets attempted despite the fact that this method doesn't exist and won't get called at runtime. We also can't annotate the type, and adding mixtures of precompile directives to MOI and HiGHS didn't seem to fix the problem. Any ideas on how to resolve?
I'm about to add a PR to MOI that gets it down to: (flame graph)

Nice progress! It brings HiGHS down to:

```julia
julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.689926 seconds (465.37 k allocations: 31.314 MiB, 98.62% compilation time)
```

(Considering we started at ~6 seconds before, and even more before you made the change to JuMP.)
With jump-dev/JuMP.jl#3195, the graph is now: (flame graph)
The …, and it gets us to less than 0.5s:

```julia
julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.483362 seconds (309.34 k allocations: 20.749 MiB, 98.07% compilation time)
```

So I'm close to calling this a win. The left red tower is the …
Latest change to HiGHS drops it to:

```julia
julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.397762 seconds (259.76 k allocations: 17.405 MiB, 97.57% compilation time)
```
This is a very nice win. Thanks for all of the work you've put in. This should make a massive improvement to the ecosystem as it gets rolled out.

On currently released versions:

```julia
julia> @time using JuMP, HiGHS
  7.565047 seconds (8.91 M allocations: 581.336 MiB, 3.27% gc time, 0.20% compilation time)

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  9.958522 seconds (11.84 M allocations: 1.506 GiB, 5.49% gc time, 99.82% compilation time)
```

With master of JuMP, MOI, and HiGHS:

```julia
julia> @time using JuMP, HiGHS
  8.848867 seconds (10.13 M allocations: 655.850 MiB, 3.44% gc time, 0.20% compilation time)

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.412142 seconds (259.70 k allocations: 17.417 MiB, 97.80% compilation time)
```
Awesome!

Incredible!
Great work @odow! I am very grateful that you've taken the time to master these issues and combine them with your expertise in the JuMP ecosystem.
If there's something not reproducible (e.g., that global constant involves a pointer), then as I'm sure you know, you'd need to reset it when the package is loaded at runtime, i.e., in `__init__`. You probably know this already, but the right mental picture is that your package's source code (the `*.jl` files) effectively runs at precompile time, and what gets cached is the resulting state, so anything that isn't valid across sessions has to be rebuilt in `__init__`.
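To make that concrete, here is a minimal sketch (not from the thread; `ExampleSolver` and `LIB_HANDLE` are hypothetical names, and `Libc.malloc` stands in for acquiring a real library handle):

```julia
module ExampleSolver

# A global handle to external state. Any pointer created while the package
# is being precompiled would be stale in later sessions, so it must be
# re-created every time the package is loaded.
const LIB_HANDLE = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    # Runs on every `using ExampleSolver`, including after loading a
    # precompile cache, so the pointer is always fresh.
    LIB_HANDLE[] = Libc.malloc(8)
end

end # module
```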
I couldn't make the last red bar go away, even with explicit calls. Is the problem that there are still some invalidations? The gist of the problem is something like:

```julia
abstract type A end
struct B <: A end
foo(::Type{B}) = B()
struct C <: A end
foo(::Type{C}) = C()
bar() = Type[B, C]
baz() = Any[foo(T) for T in bar()]
```

In the loop, `T` is only inferred as `Type`, so each `foo(T)` call is a dynamic dispatch. Are there any techniques for dealing with these kinds of problems? If you run this script, the problem shows up in the flame graph:

```julia
using JuMP, HiGHS
using SnoopCompileCore
tinf = @snoopi_deep @eval begin
    model = Model(HiGHS.Optimizer)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= 3)
    @objective(model, Min, 12x + 20y)
    @constraint(model, c1, 6x + 8y >= 100)
    @constraint(model, c2, 7x + 12y >= 120)
    optimize!(model)
end;
using SnoopCompile
using ProfileView
ProfileView.view(flamegraph(tinf))
```
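One general workaround for the `foo(T)` pattern above (my sketch, building on the definitions in that snippet; not something proposed in the thread) is manual union-splitting: branch on the known concrete types so the compiler sees statically dispatched, precompilable calls:

```julia
# Manual union-splitting over the types returned by bar(). Each branch
# contains a static call that inference can resolve and precompile;
# only the fallback remains a dynamic dispatch.
function baz_split()
    out = Any[]
    for T in bar()
        if T === B
            push!(out, foo(B))
        elseif T === C
            push!(out, foo(C))
        else
            push!(out, foo(T))  # dynamic fallback for unknown types
        end
    end
    return out
end
```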
Is this the …? Recompilation of an abstract call does not seem terribly uncommon, and I don't yet know what to do about it. A couple of issues are: …
If you want to be able to check whether a MethodInstance appears in the cachefile, you can use https://github.com/timholy/PkgCacheInspector.jl and do something like this:

```julia
using PkgCacheInspector, MethodAnalysis
cf = info_cachefile("MathOptInterface")
mis = methodinstances(cf);
using MathOptInterface
using JuMP, HiGHS
# ... this is your workload above ...
```

and then click on a ProfileView bar, followed by

```julia
mi = ProfileView.clicked[].linfo  # get the MethodInstance corresponding to the most recently clicked bar
mi ∈ mis
```
So, using …, I get:

```julia
function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    Base.precompile(Tuple{typeof(setindex!),IndexMap,MathOptInterface.ConstraintIndex{MathOptInterface.VectorOfVariables, S},MathOptInterface.ConstraintIndex{MathOptInterface.VectorOfVariables, S}})   # time: 0.014820462
end
```

That corresponds to the top-right green bar in the flame graph. However, it's invalid code because `S` is a free type parameter that is never defined.
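One way to make a directive like that valid (my sketch; the particular vector sets are assumptions, chosen only to instantiate `S` with something concrete) is to loop over concrete set types and interpolate them into the signature:

```julia
import MathOptInterface as MOI

function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    # Instantiate the free typevar S with concrete sets so each
    # signature is fully concrete and can actually be precompiled.
    for S in (MOI.Nonnegatives, MOI.Nonpositives, MOI.Zeros)
        Base.precompile(
            Tuple{
                typeof(setindex!),
                MOI.Utilities.IndexMap,
                MOI.ConstraintIndex{MOI.VectorOfVariables,S},
                MOI.ConstraintIndex{MOI.VectorOfVariables,S},
            },
        )
    end
    return nothing
end
```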
Bug: …
I thought it'd be useful to put some concrete numbers on where things stand. Small increase in …

Now under 1 second for run-time, which feels a loooot snappier. We can continue to improve things (MutableArithmetics' …). The main downsides are that we need to maintain the precompile models in each solver (but that uses only the public API, so it's not a heavy lift going forward), and much longer precompile times. The latter is particularly annoying for me when developing, because I need to precompile all the time, but the win is worth it for the reduction in TTFX. No more "Let me just wait for that to compile..." in talks/tutorials etc. This is also orthogonal to the PkgCompiler stuff, so if you compile an image you can get rid of the …

Before: …

After: …
Code:

```julia
@time using JuMP, HiGHS
@time @eval begin
    let
        model = Model(HiGHS.Optimizer)
        set_silent(model)
        @variable(model, x >= 0)
        @variable(model, 0 <= y <= 3)
        @objective(model, Min, 12x + 20y)
        @constraint(model, c1, 6x + 8y >= 1)
        @constraint(model, c2, 7x + 12y <= 120)
        optimize!(model)
    end
end;

@time using JuMP, Ipopt
@time @eval begin
    let
        model = Model(Ipopt.Optimizer)
        set_silent(model)
        @variable(model, x >= 0)
        @variable(model, 0 <= y <= 3)
        @NLobjective(model, Min, (12x + 20y)^2)
        @constraint(model, c1, 6x + 8y >= 1)
        @constraint(model, c2, (7x + 12y)^2 <= 120)
        optimize!(model)
    end
end;
```
Awesome work!
This is a genuine issue. SnoopPrecompile allows you to specify, locally, that you want to skip the workload for certain packages: see the final few lines of https://timholy.github.io/SnoopCompile.jl/stable/snoop_pc/. The increased precompilation time on 1.9 is mostly due to the fact that we need to do LLVM codegen twice (for complicated reasons), but if you don't run the workload then there probably isn't very much compiled code in the package. So disabling the workload for specific packages should reduce compile times to at or perhaps below 1.8-levels prior to the addition of this feature to SnoopPrecompile.
Glad that you envision these kinds of benefits! If you use the SnoopPrecompile/Preferences trick above, just make sure to change it back and launch …
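For reference, the opt-out being described looks roughly like this (based on the linked SnoopCompile docs; the `"skip_precompile"` preference name is my recollection and worth verifying against that page):

```julia
using SnoopPrecompile, Preferences

# Skip the @precompile_all_calls workloads of the listed packages while
# developing them; delete the preference (or set it back) when done.
set_preferences!(SnoopPrecompile, "skip_precompile" => ["JuMP", "HiGHS"]; force = true)
```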
We could make the key per package instead of having it be in SnoopPrecompile? But that also makes it more annoying to set.
Yeah, I'm not exactly sure how that would work. You'd need a name per package, right? I guess we could automate the name selection based on …
Either …, which would then allow the user to set a preference flag per package.
Maybe I'm misunderstanding your proposal, but is that different from … and …? The issue is that PkgA depends on SnoopPrecompile, but SnoopPrecompile depends on the compile-time preference. Or are you saying that each package would have a …?
Yes. IIUC the issue is that changing the preference causes the entirety of the pkg stack to be recached. Whereas if you hoist the preference to each individual package, you could set it only for the currently dev-ed packages.
For two reasons: …

In practice, 2 might not be that serious, because I suspect it will mostly be developers who want to disable this. But problem 1 remains. Here's a demo:

```julia
module TestPrefs

using Preferences

const should_precompile = @load_preference("should_precompile", true)

end
```

```
tim@diva:~/.julia/dev/TestPrefs$ cat LocalPreferences.toml
[TestPrefs]
should_precompile = false
tim@diva:~/.julia/dev/TestPrefs$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-beta3 (2023-01-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using TestPrefs
[ Info: Precompiling TestPrefs [51b07c64-1e5a-4f05-bdfd-3a9e5596e73a]

julia> TestPrefs.should_precompile
false
```

```
tim@diva:~/.julia/dev/TestPrefs$ cd ~
tim@diva:~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-beta3 (2023-01-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.9) pkg> dev TestPrefs
   Resolving package versions...
    Updating `~/.julia/environments/v1.9/Project.toml`
  [51b07c64] + TestPrefs v0.1.0 `~/.julia/dev/TestPrefs`
    Updating `~/.julia/environments/v1.9/Manifest.toml`
  [51b07c64] + TestPrefs v0.1.0 `~/.julia/dev/TestPrefs`

julia> using TestPrefs

julia> TestPrefs.should_precompile
true
```

AFAICT the fact that loading of preferences is path-specific is a deliberate design decision, not a bug.
It depends on your project/load path, not the path you launch Julia from.

You cannot use the LocalPreferences.toml for a non-dev-ed package, since that directory is supposed to be immutable and might live in a read-only location. You can use a global entry in the load path to set preferences across sessions. The same issue applies to the current solution, though.
True. I guess we can solve this if we have SnoopPrecompile handle all the mangling:

```julia
using Foo, SnoopPrecompile
full_precompile(Foo, false)
```

could write a variable …
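A sketch of what such a helper might do (hypothetical: `full_precompile` does not currently exist in SnoopPrecompile), writing the flag under the target package's own name via Preferences:

```julia
using Preferences

# Hypothetical helper: store a per-package flag that the package's own
# @precompile_setup block could consult, so toggling it recaches only
# that package rather than everything that depends on SnoopPrecompile.
function full_precompile(pkg::Module, enabled::Bool)
    set_preferences!(pkg, "full_precompile" => enabled; force = true)
    return nothing
end
```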
@odow I see you have implemented this for Ipopt.jl and HiGHS.jl. Is Gurobi.jl still on the todo list?
I guess we'll roll this out for all the solvers. I've been holding off until 1.9 is actually released.
Fair enough. I just ask because it might be useful for the performance paper we are putting together.
If you're running benchmarks, I'd encourage you to use a sysimage like we did here: … It's also fair to report the with- and without-sysimage results. Even on 1.9, there's still a little bit of latency with the new compilation.
Right, I was hoping to make the without-sysimage results more compelling. But it can wait for 1.9 (we can add it with the next round of revisions).
Yeah, I think it's fair to use Julia v1.6 with and without the sysimage, and then to say that the upcoming release of Julia v1.9 will remove most of the difference in performance, but that you don't include results because it wasn't released at the time of writing the paper. It's probably also fair to benchmark Python with/without PyPy. Happy to read over a draft if you want feedback.
I'm moving this issue to MOI. I think we're pretty solid at the JuMP level, but MOI could provide some tooling to make it easier for solvers to precompile themselves.
Closing because it isn't hard for packages to add a precompile block. Here are two examples: …

It's a bit hard for MOI to offer one that is tailored to each solver. Recent Julia versions have made such great progress (thanks all) that this is much less of a problem now: …

We still can't get the last bit of the JuMP-MOI-HiGHS connection precompiled, but it's not a big deal.
Because of its popularity, I'd like to use JuMP as one of several "showcases" for the impact of pkgimages in Julia 1.9 (CC @vchuravy, @vtjnash, @KristofferC). It turns out that to really showcase this work, JuMP might need a few tweaks. Normally I just submit PRs, but due to the "orthogonality" of solvers, JuMP presents some interesting challenges, so I decided to open this issue instead.
JuMP and its ecosystem have had a lot of nice work done on invalidations and precompilation already, and these lay the foundation and make everything I'm about to show much easier. (Thanks!) The main remaining gap is due, I think, to the fact that the precompilation work occurred before the arrival of SnoopPrecompile and pkgimages in Julia 1.9.
Let me begin by showing that there's opportunity for substantial further improvement. All tests were conducted on a reasonably up-to-date Julia `master`:

…

Now, let me create a new package, `StartupJuMP`, purely for the purpose of extra precompilation. (That's a viable strategy on Julia 1.8 and higher, with the big impact arriving in Julia 1.9.) The source code is a single file, `src/StartupJuMP.jl`, with contents:
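(The snippet did not survive extraction; based on the workloads discussed elsewhere in this thread, it was presumably along these lines. This is a reconstruction, not the original file.)

```julia
module StartupJuMP

using SnoopPrecompile, JuMP, GLPK

@precompile_all_calls begin
    # A small but representative workload: building and solving a tiny LP
    # forces compilation of the macro, bridge, and solver-wrapper machinery.
    model = Model(GLPK.Optimizer)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= 3)
    @objective(model, Min, 12x + 20y)
    @constraint(model, c1, 6x + 8y >= 100)
    @constraint(model, c2, 7x + 12y >= 120)
    optimize!(model)
end

end # module
```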
Now:

…

You can see a decrease in load time, which TBH I find puzzling (I expected a modest increase). But more importantly, you see a massive decrease in time-to-first-execution (TTFX), to the point where TTFX just doesn't feel like a problem anymore. And keep in mind that this is on top of all the nice work the JuMP ecosystem has already done to minimize TTFX: this small SnoopPrecompile workload improves the quality of precompilation substantially.
Now, ordinarily I'd just suggest putting that `@precompile_all_calls` block in JuMP. However, a big issue is that this precompilation workload relies on GLPK, and I'm guessing you don't want to precompile GLPK "into" JuMP. I'm not sufficiently familiar with JuMP to pick a good alternative, but in very rough terms I imagine there are at least three potential paths, which might be used alone or in combination:

1. Ship a dummy solver in JuMP purely for precompilation (`optimize!` might return `nothing`, for example). This might (?) precompile a lot of the machinery.
2. Add explicit `precompile` directives, though these are hard to get right for things like `kwfunc`s. The advantage of SnoopPrecompile is that you write everything in terms of the public interface, and dispatch will generate the right precompiles on each different version of Julia.
3. Use weakdeps to create the equivalent of `StartupJuMP`
The text was updated successfully, but these errors were encountered: