
Precompilation in the SnoopPrecompile/Julia 1.9+ world #2226

Closed
timholy opened this issue Jan 19, 2023 · 36 comments

timholy (Contributor) commented Jan 19, 2023

Because of its popularity, I'd like to use JuMP as one of several "showcases" for the impact of pkgimages in Julia 1.9 (CC @vchuravy, @vtjnash, @KristofferC). It turns out that to really showcase this work, JuMP might need a few tweaks. Normally I just submit PRs, but due to the "orthogonality" of solvers, JuMP presents some interesting challenges, and so I decided to instead open this issue.

JuMP and its ecosystem have had a lot of nice work done on invalidations and precompilation already, and these lay the foundation and make everything I'm about to show much easier. (Thanks!) The main remaining gap is due, I think, to the fact that the precompilation work occurred before the arrival of SnoopPrecompile and pkgimages in Julia 1.9.

Let me begin by showing that there's opportunity for substantial further improvement. All tests were conducted on a reasonably up-to-date Julia master:

julia> @time using JuMP, GLPK
  7.286941 seconds (9.35 M allocations: 604.153 MiB, 4.49% gc time, 0.79% compilation time)

julia> @time @eval begin
               let
                   model = Model(GLPK.Optimizer)
                   @variable(model, x >= 0)
                   @variable(model, 0 <= y <= 3)
                   @objective(model, Min, 12x + 20y)
                   @constraint(model, c1, 6x + 8y >= 100)
                   @constraint(model, c2, 7x + 12y >= 120)
                   optimize!(model)
               end
           end;
  8.465601 seconds (10.10 M allocations: 1.385 GiB, 4.05% gc time, 99.78% compilation time)

Now, let me create a new package, StartupJuMP, purely for the purpose of extra precompilation. (That's a viable strategy on Julia 1.8 and higher, with the big impact arriving in Julia 1.9.) The source code is a single file, src/StartupJuMP.jl, with contents:

module StartupJuMP

using GLPK
using JuMP
using SnoopPrecompile

@precompile_all_calls begin
    # Because lots of the work is done by macros, and macros are expanded
    # at lowering time, not much of this would get precompiled without `@eval`
    @eval begin
        let
            model = Model(GLPK.Optimizer)
            @variable(model, x >= 0)
            @variable(model, 0 <= y <= 3)
            @objective(model, Min, 12x + 20y)
            @constraint(model, c1, 6x + 8y >= 100)
            @constraint(model, c2, 7x + 12y >= 120)
            optimize!(model)
        end
    end
end

end # module StartupJuMP

Now:

julia> @time using JuMP, GLPK, StartupJuMP
  6.297161 seconds (9.80 M allocations: 630.934 MiB, 4.43% gc time, 0.35% compilation time)

julia> @time @eval begin
               let
                   model = Model(GLPK.Optimizer)
                   @variable(model, x >= 0)
                   @variable(model, 0 <= y <= 3)
                   @objective(model, Min, 12x + 20y)
                   @constraint(model, c1, 6x + 8y >= 100)
                   @constraint(model, c2, 7x + 12y >= 120)
                   optimize!(model)
               end
           end;
  0.331432 seconds (154.94 k allocations: 10.237 MiB, 97.93% compilation time)

You can see a decrease in load time, which TBH I find puzzling (I expected a modest increase). But more importantly, you see a massive decrease in time-to-first-execution (TTFX), to the point where TTFX just doesn't feel like a problem anymore. And keep in mind that this is on top of all the nice work the JuMP ecosystem has already done to minimize TTFX: this small SnoopPrecompile workload improves the quality of precompilation substantially.

Now, ordinarily I'd just suggest putting that @precompile_all_calls block in JuMP. However, a big issue is that this precompilation workload relies on GLPK, and I'm guessing you don't want to precompile GLPK "into" JuMP. I'm not sufficiently familiar with JuMP to pick a good alternative, but in very rough terms I imagine there are at least three potential paths, which might be used alone or in combination:

  • run this precompile workload with some kind of default/dummy solver, which doesn't actually perform optimization but at least allows completion (optimize! might return nothing, for example). This might (?) precompile a lot of the machinery.
  • identify the missing precompiles, and add them to the current precompile code. One issue, though, is that this seems a bit more fragile to internal Julia changes than using SnoopPrecompile. For instance, this precompile directive will cause precompilation failure on Julia 1.9, because Julia 1.9 will eliminate kwfuncs. The advantage of SnoopPrecompile is that you write everything in terms of the public interface and dispatch will generate the right precompiles on each different version of Julia.
  • use the upcoming weakdeps infrastructure to create the analog of my StartupJuMP for each solver. The advantage of weakdeps compared to similar solutions like Requires.jl is that the "extension packages" get precompiled and cached.

I'm happy to help, but given the issues I think it would be ideal if a JuMP developer helped choose the approach and shepherd the changes through.

blegat (Member) commented Jan 19, 2023

Thanks! I would suggest a mix of the first and third options (but without the weakdeps).
The methods compiled are of two types:

JuMP methods

Methods with JuMP.Model and other JuMP types in the signature. These do not depend on the type of the solver, since JuMP.Model is not parametrized by the solver type, so these methods should get precompiled by a script with a dummy solver: Model(() -> MOI.Utilities.MockOptimizer(MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()))).
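For concreteness, a sketch of what this could look like, written in the same style as the StartupJuMP example above (whether optimize! works on an unconfigured MockOptimizer depends on the MOI version, so it is left out here):

using JuMP, SnoopPrecompile
import MathOptInterface as MOI

@precompile_all_calls begin
    @eval begin
        let
            # Dummy solver: a MockOptimizer wrapped the same way JuMP wraps
            # real solvers, so solver-independent methods get compiled and cached.
            model = Model(
                () -> MOI.Utilities.MockOptimizer(
                    MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()),
                ),
            )
            @variable(model, x >= 0)
            @objective(model, Min, 2x)
            @constraint(model, c, 3x >= 1)
            # optimize!(model) would additionally need mock results configured.
        end
    end
end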

MOI methods

Methods with MOI.CachingOptimizer{..., MOI.LazyBridgeOptimizer{SolverType}} in the signature. These are methods called on the MOI backend of the JuMP model, for which the signature depends on the SolverType. For these, one could build exactly the same MOI model that JuMP builds with:

optimizer = MOI.instantiate(GLPK.Optimizer; with_bridge_type = Float64)
cache = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
moi_backend = MOI.Utilities.CachingOptimizer(cache, optimizer)

and precompile the methods JuMP uses, e.g., by running some of the tests in MOI.Test. This would miss the JuMP functions called on the MOI backend (usually the _moi_... functions). We could get these with the weakdeps infrastructure; I would be interested to know how much of the TTFX is gained by this part, as it might only be a small fraction since these are usually quite simple functions.
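For illustration, a tiny direct workload on that moi_backend might look like this (MOI.Test gives much more thorough coverage; these calls are illustrative, mirroring what JuMP does internally):

# Continuing from the moi_backend built above: exercising the solver-typed
# backend directly compiles methods whose signatures contain the concrete
# CachingOptimizer{...} type.
x = MOI.add_variable(moi_backend)
MOI.add_constraint(moi_backend, x, MOI.GreaterThan(0.0))
f = MOI.ScalarAffineFunction([MOI.ScalarAffineTerm(1.0, x)], 0.0)
MOI.set(moi_backend, MOI.ObjectiveFunction{typeof(f)}(), f)
MOI.set(moi_backend, MOI.ObjectiveSense(), MOI.MIN_SENSE)
MOI.optimize!(moi_backend)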

timholy (Contributor, Author) commented Jan 19, 2023

Thanks! I have a starter PR in jump-dev/JuMP.jl#3193. Feel free to add to it or suggest improvements.

timholy (Contributor, Author) commented Jan 20, 2023

My current question is "what else needs to be added after jump-dev/JuMP.jl#3193?" Below I pose some questions that I can't really answer myself; I'm hoping that others can. But more generally, since I won't have time or expertise to improve everything myself, it's probably best to show you how I go about this kind of analysis. So I do the following:

using JuMP, GLPK
using SnoopCompileCore
tinf = @snoopi_deep @eval begin
    whatever_example_I_want_to_be_fast
end;
using SnoopCompile
using ProfileView
ProfileView.view(flamegraph(tinf))

For this case (and again, already exploiting jump-dev/JuMP.jl#3193) I get a graph like this:
[flamegraph screenshot]

This page describes the elements & interaction you can use, but briefly:

  • time runs horizontally, call-depth vertically
  • each "stack" represents a separate runtime-dispatched entrance into type inference
  • to see the method that was being inferred upon entrance to inference, click on the bottom of bars (left-click prints to REPL, right-click opens in your ENV["EDITOR"])
  • fat bars mean "expensive type inference"
  • empty spaces typically mean LLVM codegen (the wider the gap, the longer LLVM is taking). @snoopi_deep doesn't capture data on what LLVM is doing (though @snoopl can). Usually you can make some guesses about what's being codegen'ed from the inference bars that came right before, so I typically don't use @snoopl.
  • red bars indicate a combination of types not available in the package owning the method being optimized. E.g., anything GLPK-dependent will be red. If above the red you get non-red, that indicates an opportunity to precompile everything above it.

Some interesting findings:

  • Early in the trace, MutableArithmetics._rewrite(::Bool, ::Bool, ::Expr, ::Nothing, ::Vector{Any}, ::Vector{Any}, ::Symbol) appears more than once (noteworthy because typically we only need to infer once), and there are yellow bars above it. This indicates that it's being inferred multiple times due to constant propagation, i.e., Julia is specializing and compiling separate code for, e.g., minus=false versus minus=true. I am guessing you don't want that: while the inference time is small (the bars are skinny), the LLVM time after each of these is nontrivial. You might consider using Base.@constprop :none (or Compat.@constprop :none if you need to support older versions of Julia) on that function; see the sketch after this list. Is that something you want?
  • consider whether you might want to put @nospecialize around some ::Type and ::Function arguments (also sketched after this list). If the code needs to be specialized for performance reasons, that would be a bad idea, but I'm guessing that much of JuMP is about problem setup (unlikely to be performance sensitive) and not, e.g., about evaluating the objective function as quickly as possible (certainly performance sensitive). But I could easily be wrong (this is a place where expertise with the package is really essential). If they are safe to @nospecialize then, as a bonus, you can precompile them once (from JuMP) with a generic optimizer and not have to worry about specializing on each solver. Some examples of candidate functions showing up in this analysis are Model(optimizer_factory; add_bridges::Bool = true), instantiate(optimizer_constructor; with_bridge_type::Union{Nothing,Type} = nothing), etc.
  • the "fattest" (most time-consuming) stack of inference bars is MOI.optimize!(m::CachingOptimizer). Presumably that would have to be handled by the weakdeps mechanism? Note, though, that eventually a lot of bars reach non-red; e.g., the widest of the non-red bars is for MathOptInterface.get(::MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}, ::MathOptInterface.ListOfConstraintIndices{MathOptInterface.VectorOfVariables}). Does the MockOptimizer not reach these? If so, what can we add as a workload to MOI to incorporate them?
  • the largest gap (presumably most LLVM time) is right after JuMP._moi_add_variable(::MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{GLPK.Optimizer}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}}, ::Model, ::ScalarVariable{Int64, Float64, Float64, Float64}, ::String). Is that another weakdeps case?
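To make the first two bullets concrete, here is a self-contained toy sketch of both annotations (the function names are illustrative, not JuMP's actual internals):

# Disable constant propagation: `minus = true` and `minus = false` now share
# one compiled body instead of triggering separate inference + LLVM passes.
Base.@constprop :none function combine(minus::Bool, x, y)
    return minus ? x - y : x + y
end

# @nospecialize keeps one compiled method for all callable arguments, trading
# a little runtime dispatch for much less per-solver compilation.
function build_model(@nospecialize(optimizer_factory); add_bridges::Bool = true)
    return optimizer_factory()
end

combine(true, 12, 20)      # no extra specialization for the constant `true`
build_model(() -> :dummy)  # compiled once, reused for any factory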

odow (Member) commented Jan 22, 2023

but I'm guessing that much of JuMP is about problem setup (unlikely to be performance sensitive) and not, e.g., about evaluating the objective function as quickly as possible (certainly performance sensitive). But I could easily be wrong (this is a place where expertise with the package is really essential)

There's a trade-off here that I played around with a while ago. For small problems, it doesn't matter. But we also want to support building problems with 10^6+ variables and constraints, and then it really pays to add an explicit ::F where {F<:Function} annotation in parts of MOI.

Some examples of functions that might be candidates showing up in this analysis are

Yeah, these ones could be changed.

the "fattest" (most time-consuming) stack of inference bars is MOI.optimize!(m::CachingOptimizer). Presumably that would have to be handled by weakdeps mechanism?

The right place to add precompilation for these sorts of things is either in MOI, or in the solvers.

Is that another weakdeps case?

My preference is not to add any weakdeps to the JuMP ecosystem just yet, until a few Julia releases have happened and we can assess how it works. Maintaining them in JuMP for a bunch of solvers seems like a bit of work, especially if we can get the TTFX down via other approaches.

odow (Member) commented Jan 22, 2023

Okay, question. How should I set up precompile statements for solvers which set a global constant in __init__?

For example:

https://github.com/jump-dev/HiGHS.jl/blob/9ec05c31c3e73c02661f4de0d0def5b3b664a991/src/HiGHS.jl#L12-L15

Is it okay to do something like this?

import SnoopPrecompile

SnoopPrecompile.@precompile_setup begin
    SnoopPrecompile.@precompile_all_calls begin
        __init__()
        model = MOI.instantiate(HiGHS.Optimizer; with_bridge_type = Float64)
    end
end

odow (Member) commented Jan 22, 2023

I have it working for HiGHS: jump-dev/HiGHS.jl#147.

There's just one problem left:

julia> using JuMP, HiGHS

julia> using SnoopCompileCore

julia> tinf = @snoopi_deep @eval begin
         model = Model(HiGHS.Optimizer)
         @variable(model, x >= 0)
         @variable(model, 0 <= y <= 3)
         @objective(model, Min, 12x + 20y)
         @constraint(model, c1, 6x + 8y >= 100)
         @constraint(model, c2, 7x + 12y >= 120)
         optimize!(model)
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00

julia> using SnoopCompile

julia> using ProfileView

julia> ProfileView.view(flamegraph(tinf))

[flamegraph screenshot]

The red bars are because of

Any[
    _try_constrain_variables_on_creation(dest, src, index_map, S)
    for S in sorted_variable_sets_by_cost(dest, src)
]

which calls

function _try_constrain_variables_on_creation(
    dest::MOI.ModelLike,
    src::MOI.ModelLike,
    index_map::IndexMap,
    ::Type{S},
) where {S<:MOI.AbstractVectorSet}
    not_added = MOI.ConstraintIndex{MOI.VectorOfVariables,S}[]
    for ci_src in
        MOI.get(src, MOI.ListOfConstraintIndices{MOI.VectorOfVariables,S}())

but because sorted_variable_sets_by_cost(dest, src) returns a Vector{<:Type}, S fails to infer, and Julia tries to compile

/Users/oscar/.julia/dev/MathOptInterface/src/Utilities/universalfallback.jl:445, MethodInstance for MathOptInterface.get(::MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}, ::MathOptInterface.ListOfConstraintIndices{MathOptInterface.VectorOfVariables})

despite the fact that this method doesn't exist and won't get called at runtime. We also can't annotate the type, and adding mixtures of precompile directives to MOI and HiGHS didn't seem to fix the problem. Any ideas on how to resolve?

odow (Member) commented Jan 22, 2023

I'm about to add a PR to MOI that gets it down to:

[flamegraph screenshot]

Nice progress!

It brings HiGHS down to

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.689926 seconds (465.37 k allocations: 31.314 MiB, 98.62% compilation time)

(Considering we started at ~6 seconds before, and even more before you made the change to JuMP.)

odow (Member) commented Jan 22, 2023

With jump-dev/JuMP.jl#3195, the graph is now

[flamegraph screenshot]

odow (Member) commented Jan 23, 2023

The _rewrite bits would be greatly improved by jump-dev/JuMP.jl#3125:

[flamegraph screenshot]

and it gets us to less than 0.5s:

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.483362 seconds (309.34 k allocations: 20.749 MiB, 98.07% compilation time)

So I'm close to calling this a win.

The left red tower is the _moi_add_variable, which makes sense: that's the JuMP-to-HiGHS transition point that we can't get without a weakdep. The right tower is the _try_add_constrained_variables method, which is pretty horrible. There are probably some more small improvements that could be made.

odow (Member) commented Jan 23, 2023

Latest change to HiGHS drops to

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.397762 seconds (259.76 k allocations: 17.405 MiB, 97.57% compilation time)

[flamegraph screenshot]

odow (Member) commented Jan 23, 2023

This is a very nice win. Thanks for all of the work you've put in. This should make a massive improvement to the ecosystem as it gets rolled out.

On currently released versions:

julia> @time using JuMP, HiGHS
  7.565047 seconds (8.91 M allocations: 581.336 MiB, 3.27% gc time, 0.20% compilation time)

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  9.958522 seconds (11.84 M allocations: 1.506 GiB, 5.49% gc time, 99.82% compilation time)

With the master branches of JuMP, MOI, and HiGHS:

julia> @time using JuMP, HiGHS
  8.848867 seconds (10.13 M allocations: 655.850 MiB, 3.44% gc time, 0.20% compilation time)

julia> @time @eval begin
           let
               model = Model(HiGHS.Optimizer)
               @variable(model, x >= 0)
               @variable(model, 0 <= y <= 3)
               @objective(model, Min, 12x + 20y)
               @constraint(model, c1, 6x + 8y >= 100)
               @constraint(model, c2, 7x + 12y >= 120)
               optimize!(model)
           end
       end;
Running HiGHS 1.4.0 [date: 1970-01-01, git hash: bcf6c0b22]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Presolving model
2 rows, 2 cols, 4 nonzeros
2 rows, 2 cols, 4 nonzeros
Presolve : Reductions: rows 2(-0); columns 2(-0); elements 4(-0) - Not reduced
Problem not reduced by presolve: solving the LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(220) 0s
          2     2.0500000000e+02 Pr: 0(0) 0s
Model   status      : Optimal
Simplex   iterations: 2
Objective value     :  2.0500000000e+02
HiGHS run time      :          0.00
  0.412142 seconds (259.70 k allocations: 17.417 MiB, 97.80% compilation time)

mlubin (Member) commented Jan 23, 2023

Awesome!

chriscoey (Contributor) commented:

incredible!

timholy (Contributor, Author) commented Jan 23, 2023

Great work @odow! I am very grateful that you've taken the time to master these issues and combine that with your expertise in the JuMP ecosystem.

How should I set up precompile statements for solvers which set a global constant in __init__?

If there's something not reproducible (e.g., that global constant involves a pointer), then as I'm sure you know, you'd need to reset it when __init__ runs after module-loading. You also wouldn't want any "state" hanging around that depends on the non-reproducible element; that should be cleared before the final end of the main module in your package.

You probably know this already, but the right mental picture is that your package's source code (.jl files) acts as a collective build script. During precompilation, you execute the files to build the package (which defines the modules and any method extensions), and then julia creates a snapshot of the resulting "diff" to the running system. This "diff" gets stored to disk, and when you say using MyPackage it gets reloaded and patched into the running system. Other than __init__, nothing in the package source code runs when you load the package.
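A hedged sketch of that pattern for a solver wrapper (module and library names are hypothetical):

module MySolver  # hypothetical solver wrapper

import Libdl

# Non-reproducible state: a raw pointer must never be baked into the cachefile.
const LIB = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    # Runs on every `using MySolver`, including after loading a pkgimage,
    # so the handle is always re-created fresh in the current process.
    LIB[] = Libdl.dlopen("libmysolver")
end

# A precompile workload could go here. Any pointer-dependent state it creates
# should be cleared before the closing `end`, so the snapshot stays clean.

end # module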

odow (Member) commented Jan 23, 2023

I couldn't make the last red bar go away, even with explicit calls.

Is the problem that there are still some invalidations?

The gist of the problem is something like:

abstract type A end
struct B <: A end
foo(::Type{B}) = B()
struct C <: A end
foo(::Type{C}) = C()
bar() = Type[B, C]
baz() = Any[foo(T) for T in bar()]

In the loop, T is inferred as Type{<:A}, and so the method foo(::Type{T}) where {T<:A} shows up as one of the bars, even though that method doesn't exist.

Are there any techniques for dealing with these kinds of problems?

If you run this script on the master branches of JuMP, MathOptInterface, and HiGHS, you'll see the problem:

using JuMP, HiGHS
using SnoopCompileCore
tinf = @snoopi_deep @eval begin
  model = Model(HiGHS.Optimizer)
  @variable(model, x >= 0)
  @variable(model, 0 <= y <= 3)
  @objective(model, Min, 12x + 20y)
  @constraint(model, c1, 6x + 8y >= 100)
  @constraint(model, c2, 7x + 12y >= 120)
  optimize!(model)
end;
using SnoopCompile
using ProfileView
ProfileView.view(flamegraph(tinf))

timholy (Contributor, Author) commented Jan 23, 2023

Is this the MathOptInterface.Utilities._try_constrain_variables_on_creation method with S<:MathOptInterface.AbstractVectorSet? (Fifth bar up the last big stack.) It's not cached. Have you tried plain old precompile(f, (types...,))? Might work, not sure.
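In terms of the toy foo/A example from the previous comment, that suggestion would look something like this (whether the abstract-signature entry actually lands in the cachefile is exactly the open question):

# Ask Julia to compile (and hopefully cache) the abstract signature that the
# flamegraph shows, even though no method with that exact signature exists:
precompile(foo, (Type{<:A},))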

Recompilation of an abstract call does not seem terribly uncommon, and I don't yet know what to do about it. A couple of issues are:

If you want to be able to check whether a MethodInstance appears in the cachefile, you can use https://github.com/timholy/PkgCacheInspector.jl and do something like this:

using PkgCacheInspector, MethodAnalysis
cf = info_cachefile("MathOptInterface")
mis = methodinstances(cf);
using MathOptInterface
using JuMP, HiGHS
...                                   # this is your workload above

and then click on a ProfileView bar, followed by

mi = ProfileView.clicked[].linfo     # get the MethodInstance corresponding to the most recently-clicked bar
mi ∈ mis

odow (Member) commented Jan 23, 2023

So using SnoopCompile.parcel and SnoopCompile.write, I get this precompile block for MathOptInterface:

function _precompile_()
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    Base.precompile(Tuple{typeof(setindex!),IndexMap,MathOptInterface.ConstraintIndex{MathOptInterface.VectorOfVariables, S},MathOptInterface.ConstraintIndex{MathOptInterface.VectorOfVariables, S}})   # time: 0.014820462
end

That corresponds to the top-right green bar in the flame graph. However, it's invalid code because S isn't defined. Should SnoopCompile emit invalid precompile statements? Or is it a bug?

timholy (Contributor, Author) commented Jan 23, 2023

Bug

odow (Member) commented Jan 25, 2023

I thought it'd be useful to put some concrete numbers on where things stand (times in seconds). Small increase in using time, big decrease in runtime.

              HiGHS        Ipopt
using Before  7.2          7.6
using After   8.3 (+1.1)   9.0 (+1.4)
solve Before  10.2         13.6
solve After   0.7 (-9.5)   0.9 (-12.7)

Now under 1 second for run time, which feels a loooot snappier. We can continue to improve things. MutableArithmetics' _rewrite method is currently a bottleneck, but jump-dev/JuMP.jl#3125 gets rid of it. And once that PR is included, HiGHS runtime drops by another 0.3s or so, so HiGHS is ~0.4 seconds, and Ipopt is ~0.6.

The main downsides are that we need to maintain the precompile models in each solver (though they use only the public API, so it's not a heavy lift going forward), and the much longer precompile times. The latter is particularly annoying for me when developing, because I need to precompile all the time, but the win is worth it for the reduction in TTFX. No more... Let me just wait for that to compile... in talks/tutorials etc.

This is also orthogonal to the PackageCompiler stuff, so if you compile a system image you can get rid of the using time as well.

Before

(release) pkg> st
Status `/private/tmp/release/Project.toml`
⌃ [87dc4568] HiGHS v1.4.1
  [b6b21f68] Ipopt v1.1.0
  [4076af6c] JuMP v1.6.0
⌃ [b8f27783] MathOptInterface v1.11.4
Info Packages marked with ⌃ have new versions available and may be upgradable.

After

(master) pkg> st
Status `/private/tmp/master/Project.toml`
  [87dc4568] HiGHS v1.4.2 `https://github.com/jump-dev/HiGHS.jl.git#master`
  [b6b21f68] Ipopt v1.1.0 `https://github.com/jump-dev/Ipopt.jl.git#master`
  [4076af6c] JuMP v1.6.0 `https://github.com/jump-dev/JuMP.jl.git#master`
  [b8f27783] MathOptInterface v1.11.5 `https://github.com/jump-dev/MathOptInterface.jl.git#master`

Code

@time using JuMP, HiGHS
@time @eval begin
  let
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= 3)
    @objective(model, Min, 12x + 20y)
    @constraint(model, c1, 6x + 8y >= 1)
    @constraint(model, c2, 7x + 12y <= 120)
    optimize!(model)
  end
end;
@time using JuMP, Ipopt
@time @eval begin
  let
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= 3)
    @NLobjective(model, Min, (12x + 20y)^2)
    @constraint(model, c1, 6x + 8y >= 1)
    @constraint(model, c2, (7x + 12y)^2 <= 120)
    optimize!(model)
  end
end;

timholy (Contributor, Author) commented Jan 25, 2023

Awesome work!

And much longer precompile times. The latter is particularly annoying for me developing because I need to precompile all the time

This is a genuine issue. SnoopPrecompile allows you to specify, locally, that you want to skip the workload for certain packages: see the final few lines of https://timholy.github.io/SnoopCompile.jl/stable/snoop_pc/. The increased precompilation time on 1.9 is mostly due to the fact that we need to do LLVM codegen twice (for complicated reasons), but if you don't run the workload then there probably isn't very much compiled code in the package. So disabling the workload for specific packages should reduce compile times to at, or perhaps below, the 1.8 levels seen prior to the addition of this feature to SnoopPrecompile.
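For reference, a sketch of that local opt-out (the package list is illustrative; skip_precompile is the compile-time preference that comes up again below):

using Preferences, SnoopPrecompile

# Skip the precompile workloads of packages you're actively developing; revert
# this (and re-run precompilation) before you need fast TTFX again.
set_preferences!(SnoopPrecompile, "skip_precompile" => ["JuMP", "MathOptInterface", "HiGHS"])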

but the win is worth it for the reduction in TTFX. No more... Let me just wait for that to compile... in talks/tutorials etc.

Glad that you envision these kinds of benefits! If you use the SnoopPrecompile/Preferences trick above, just make sure to change it back and run Pkg.precompile() well before you give that talk! It will likely re-compile almost everything (sad) because so many packages now directly or indirectly depend on SnoopPrecompile, but we don't yet see a way around that while also keeping everything in a consistent state.

vchuravy commented:

It will likely re-compile almost everything (sad) because so many packages now directly or indirectly depend on SnoopPrecompile, but we don't yet see a way around that while also keeping everything in a consistent state.

We could make the key per package instead of having it be in SnoopPrecompile? But that also makes it more annoying to set.

timholy (Contributor, Author) commented Jan 26, 2023

Yeah, I'm not exactly sure how that would work. You'd need a name per package, right? I guess we could automate the name selection based on @__MODULE__? But then users would have to understand the name-mangling.

vchuravy commented:

Either @precompile_setup or @precompile_all_calls would insert a check like:

parse(Bool, SnoopPrecompile.Preferences.@load_preference("snoop_precompile", "true"))

Which would then allow the user to set a preference flag per Package.

timholy (Contributor, Author) commented Jan 26, 2023

Maybe I'm misunderstanding your proposal, but is that different from

https://github.com/timholy/SnoopCompile.jl/blob/01552bc8ac3ccffb3699b983fbba960fcaa22e91/SnoopPrecompile/src/SnoopPrecompile.jl#L9-L13

and

https://github.com/timholy/SnoopCompile.jl/blob/01552bc8ac3ccffb3699b983fbba960fcaa22e91/SnoopPrecompile/src/SnoopPrecompile.jl#L57

? The issue is that PkgA depends on SnoopPrecompile, but SnoopPrecompile depends on the compile-time preference skip_precompile. Making it a compile-time preference is, I think, the only decent way of ensuring that the settings are applied consistently. But sadly I think that forces you to recompile every package that depends on SnoopPrecompile, rather than just the ones that were affected by the settings change.

Or are you saying that each package would have a LocalPreferences.toml that sets snoop_precompile just for that package? Sadly, I don't think that works very well; whether preferences are applied is very what-directory-am-I-in-dependent, and AFAICT the only semi-reliable place to put preferences is in your default environment.

vchuravy commented:

Or are you saying that each package would have a LocalPreferences.toml that sets snoop_precompile just for that package?

Yes. IIUC the issue is that changing the preference causes the entirety of the pkg stack to be recached. Whereas if you hoist the preference to each individual package you could set it only for the current dev-ed packages.

timholy (Contributor, Author) commented Jan 27, 2023

For two reasons:

  1. Preference-loading depends on the path you launch Julia from. See below.
  2. For non-devved packages, if you stored the flag in the package's own LocalPreferences.toml you'd have to re-set the preference every time you update to a new version.

In practice 2 might not be that serious because I suspect it will mostly be developers who want to disable this. But problem 1 remains. Here's a demo:

module TestPrefs

using Preferences

const should_precompile = @load_preference("should_precompile", true)

end
tim@diva:~/.julia/dev/TestPrefs$ cat LocalPreferences.toml
[TestPrefs]
should_precompile = false
tim@diva:~/.julia/dev/TestPrefs$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-beta3 (2023-01-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using TestPrefs
[ Info: Precompiling TestPrefs [51b07c64-1e5a-4f05-bdfd-3a9e5596e73a]

julia> TestPrefs.should_precompile
false

julia>
tim@diva:~/.julia/dev/TestPrefs$ cd ~
tim@diva:~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-beta3 (2023-01-18)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.9) pkg> dev TestPrefs
   Resolving package versions...
    Updating `~/.julia/environments/v1.9/Project.toml`
  [51b07c64] + TestPrefs v0.1.0 `~/.julia/dev/TestPrefs`
    Updating `~/.julia/environments/v1.9/Manifest.toml`
  [51b07c64] + TestPrefs v0.1.0 `~/.julia/dev/TestPrefs`

julia> using TestPrefs

julia> TestPrefs.should_precompile
true

AFAICT the fact that loading of preferences is path-specific is a deliberate design decision, not a bug.

vchuravy commented:

Preference-loading depends on the path you launch Julia from. See below.

It depends on your project/load path, not the path you launch Julia from.

For non-devved packages, if you stored the flag in the package's own LocalPreferences.jl you'd have to re-set the preference every time you update to a new version.

You cannot use LocalPreferences.toml for non-devved packages, since that directory is supposed to be immutable and might live on a read-only filesystem.

You can use a global entry in the load path to set preferences across sessions. The same issue applies to the current solution though.

timholy (Contributor, Author) commented Jan 27, 2023

It depends on your project/load path, not the path you launch Julia from.

True.

I guess we can solve this if we have SnoopPrecompile handle all the mangling:

using Foo, SnoopPrecompile
full_precompile(Foo, false)

could write a variable __SNOOPPC__Foo = false to the current environment's LocalPreferences.toml, and then SnoopPrecompile would look for it during precompilation. I was thinking we'd manage it through Preferences (that would keep you from needing to load the package at all).

pulsipher commented:

@odow I see you have implemented this for Ipopt.jl and HiGHS.jl. Is Gurobi.jl still on the todo list?

odow (Member) commented Mar 24, 2023

I guess we'll roll this out for all the solvers. I've been holding off until 1.9 is actually released.

pulsipher commented:

Fair enough. I just ask because it might be useful for the performance paper we are putting together.

odow (Member) commented Mar 24, 2023

If you're running benchmarks, I'd encourage you to use a sys image like we did here:

It's also fair to report the with- and without-sysimage results.

Even on 1.9, there's still a little bit of latency with the new compilation.

pulsipher commented:

Right, I was hoping to make the without-sysimage results more compelling. But it can wait for 1.9 (we can add it with the next round of revisions).

odow (Member) commented Mar 24, 2023

Yeah, I think it's fair to use Julia v1.6 with and without the sysimage, and then to say that the upcoming release of Julia v1.9 will remove most of the difference in performance, but that you don't include results because it wasn't released at the time of writing the paper.

It's probably also fair to benchmark Python with/without PyPy.

Happy to read over a draft if you want feedback.

odow (Member) commented Jun 27, 2023

I'm moving this issue to MOI. I think we're pretty solid at the JuMP level, but MOI could provide some tooling to make it easier for solvers to precompile themselves.

odow transferred this issue from jump-dev/JuMP.jl on Jun 27, 2023
odow added this to the v1.x milestone on Oct 24, 2023
odow (Member) commented Apr 11, 2024

Closing because it isn't hard for packages to add a precompile block. Here are two examples:

It's a bit hard for MOI to offer one that is tailored to each solver.

Recent Julia versions have made such great progress (thanks all) that this is much less of a problem now:

(base) oscar@Oscars-MBP /tmp % julia --project=/tmp/highs
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.2 (2024-03-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> @time using JuMP, HiGHS
  2.016348 seconds (1.17 M allocations: 82.860 MiB, 4.05% gc time, 0.95% compilation time)

julia> @time @eval begin
               let
                   model = Model(HiGHS.Optimizer)
                   set_silent(model)
                   @variable(model, x >= 0)
                   @variable(model, 0 <= y <= 3)
                   @objective(model, Min, 12x + 20y)
                   @constraint(model, c1, 6x + 8y >= 100)
                   @constraint(model, c2, 7x + 12y >= 120)
                   optimize!(model)
               end
           end;
  0.533333 seconds (341.22 k allocations: 22.873 MiB, 97.94% compilation time: 15% of which was recompilation)

We still can't get the last bit of the JuMP-MOI-HiGHS connection precompiled, but it's not a big deal.

odow closed this as completed on Apr 11, 2024