Add `Expr(:ivdepscope)` to support not marking the entire loop body as ivdep #43261
After some further trials with:
I find that, to make self-inplace broadcast vectorizable, we only need to tell LLVM "
julia> using BenchmarkTools
julia> a = zeros(Float32, 128, 32); a_ = view(a, 1:128, 1:32);
julia> @btime $a .+= $a;
280.000 ns (0 allocations: 0 bytes) # on 1.7.0: 1.650 μs (0 allocations: 0 bytes)
julia> @btime $a .+= $a .+ $a .+ $a;
511.979 ns (0 allocations: 0 bytes) # on 1.7.0: 3.550 μs (0 allocations: 0 bytes)
julia> @btime $a_ .+= $a_;
289.286 ns (0 allocations: 0 bytes) # on 1.7.0: 1.730 μs (0 allocations: 0 bytes)
julia> @btime $a_ .+= $a_ .+ $a_ .+ $a_;
609.551 ns (0 allocations: 0 bytes) # on 1.7.0: 6.440 μs (0 allocations: 0 bytes)
Some simple safety checks:
julia> const p = Ref(0);
julia> a = zeros(Float32, 128, 32); b = similar(a);
julia> f(x) = x + (p[] += 0); # f has no side-effect
julia> @btime $b .= f.($a);
221.526 ns (0 allocations: 0 bytes) # on 1.7.0: 1.440 μs (0 allocations: 0 bytes)
julia> @btime $a .= f.($a);
164.899 ns (0 allocations: 0 bytes) # on 1.7.0: 2.033 μs (0 allocations: 0 bytes)
julia> g(x) = x + (p[] = ~p[]); # g has side-effect
julia> @btime $b .= g.($a);
2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.522 μs (0 allocations: 0 bytes)
julia> @btime $a .= g.($a);
2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.511 μs (0 allocations: 0 bytes)
The above example shows that this change is safer than replacing
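A minimal sketch of why `g` works as the "unsafe" case in the check above: its result depends on the order in which it is called, so a loop that calls it cannot be reordered or vectorized without changing the output. The helper name `toggle_demo` is hypothetical, and a plain loop is used instead of broadcast so that the call order is explicit.

```julia
# Hypothetical helper, not from the PR: shows that g's result is order-dependent.
function toggle_demo(n::Integer)
    p = Ref(0)                      # local stand-in for the global `p` above
    g(x) = x + (p[] = ~p[])         # flips p between 0 and -1 on every call
    out = zeros(Float32, n)
    for i in eachindex(out)         # strictly sequential calls to g
        out[i] = g(out[i])
    end
    return out
end

toggle_demo(4)                      # alternates between -1 and 0
```

Because every call both reads and writes `p`, the timings for `g` staying unchanged is consistent with LLVM refusing to vectorize that loop.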
I am not a fan of the current construct. Currently `Expr(:loopinfo)` has the semantics that it should always be the last instruction in a loop. For a begin/end construct I would rather mimic `Base.Experimental.Const` and `@aliasscope`.
base/simdloop.jl
@@ -125,12 +126,12 @@ either case, your inner loop should have the following properties to allow vecto
 * No iteration ever waits on a previous iteration to make forward progress.
 """
 macro simd(forloop)
-    esc(compile(forloop, nothing))
+    esc(compile(forloop, Symbol("julia.ivdep.end")))
Can we keep this `nothing`? It makes little sense to have a non-matching "begin"/"end" construct.
IIUC, we'd better have something like `Expr(:ivdepscope)` and `Expr(:popivdepscope)` instead (and add an `@ivdep` macro?).
Edit: Apparently this is beyond my competence. I can't understand why:
julia> @eval f(x) = $(Expr(:ivdepscope, :begin))
f (generic function with 1 method)
julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:ivdepscope, :(Main.begin)))
└── return nothing
)
julia> @eval f(x) = $(Expr(:loopinfo, :begin))
f (generic function with 1 method)
julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:loopinfo, :begin))
└── return nothing
)
Force-pushed from d184d0a to bbf2dbb.
1. Introduce `jl_ivdepscope_sym`.
2. Define `jl_ivdepscope_error` to throw the error message.
Make `Expr(:ivdepscope, :begin/end)` lower to `jl_ivdepscope_func`.
1. Let `loopinfo_mark` erase `julia.ivdepscope` if it has `julia.simd`.
2. Erase `julia.ivdepscope` in unreachable branches even when there is no `loopinfo_mark` (this makes the error message clearer).
I'm not sure whether this is the correct way to implement scoped
Some examples:
julia> f(x) = @inbounds for i in eachindex(x)
           Base.@ivdep x[i] += i
       end
f (generic function with 1 method)
julia> f([1,2,3,4])
ERROR: Found ivdepscope outside @simd.
Stacktrace:
[1] macro expansion
@ .\simdloop.jl:151 [inlined]
[2] f(x::Vector{Int64})
@ Main .\REPL[1]:2
[3] top-level scope
@ REPL[2]:1
julia> f((1,2,3,4))
ERROR: MethodError: no method matching setindex!(::NTuple{4, Int64}, ::Int64, ::Int64)
Stacktrace:
[1] macro expansion
@ .\simdloop.jl:152 [inlined]
[2] f(x::NTuple{4, Int64})
@ Main .\REPL[1]:2
[3] top-level scope
@ REPL[3]:1
I guess we won't need this after #43852.
Currently, `@simd ivdep` assumes the entire loop is free of loop-carried memory dependencies, which limits its usage in our broadcast system.
This PR tries to split `julia.ivdep` into 2 metas, `julia.ivdep.begin` and `julia.ivdep.end`, and makes the simd-loop pass mark only the accesses within a begin/end block as `MD_mem_parallel_loop_access`.
With this PR, if we find that all the args in a flat `bc::Broadcasted` are safe to load in parallel, and the `dest::AbstractArray` is safe to store to in parallel, then we can implement the `copyto!` kernel as:
If `bc.f` is free of memory access, then LLVM should find this loop vectorizable and add no runtime check (and we can make `a .+= 1` vectorize more easily). If not, LLVM checks whether `bc.f` might have side effects.
This PR only changes the pass implementation, and makes `@simd ivdep` generate a `julia.ivdep.begin/end` block instead of a single `julia.ivdep`. The usage and effect of `@simd` and `@simd ivdep` are not changed.
I'm not familiar with LLVM and I'm not sure this change is the correct way to make self-inplace broadcast vectorizable.
All suggestions and comments are welcome.
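For context, here is a rough sketch of the limitation the description refers to, written with the stock `@simd ivdep` only. It is not the kernel from this PR: the helper name `apply_inplace!` is hypothetical, and the scoped begin/end form proposed here is not used because it does not exist in released Julia.

```julia
# Hypothetical helper, not part of Base and not the PR's kernel.
# Applies `f` elementwise and writes the result back into `dest`.
function apply_inplace!(f, dest::AbstractVector, src::AbstractVector)
    # Stock `@simd ivdep` asserts that the WHOLE loop body has no
    # loop-carried memory dependencies. That covers the load from `src`,
    # the load/store on `dest`, and any memory access inside `f`.
    @inbounds @simd ivdep for i in eachindex(dest, src)
        dest[i] = f(dest[i], src[i])
    end
    return dest
end

a = ones(Float32, 16)
apply_inplace!(+, a, a)   # self-inplace: each iteration touches only index i
```

The assertion is valid here because `+` touches no memory and each iteration reads and writes only its own index, even when `dest === src`. For an arbitrary `f` the caller cannot promise that, which is the case the scoped marking in this PR is meant to handle by leaving `f` for LLVM to analyze.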