Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2x regression in indexing benchmarks due to 'Remove references to non opaque pointers in codegen and LLVM passes (#54853)' #55090

Open
Zentrik opened this issue Jul 9, 2024 · 3 comments · Fixed by #55107
Assignees
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@Zentrik
Copy link
Member

Zentrik commented Jul 9, 2024

Through bisection I identified 5e1bcdf as causing the regressions below. This change also led to some improvements but caused many more regressions. I believe these two commits were my more minimal version of the identified commit, so would be good to test if they also cause the regression, 4eef1be 756e72f.

A subset of the results is below, for full results see https://tealquaternion.camdvr.org/compare.html?start=a14cc38512b6daab6b8417ebb8a64fc794ff89cc&end=323e725c1e4848414b5642b8f54c24916b9ddd9e&stat=min-wall-time or https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_date/2024-07/05/report.md.

Summary

Range Mean Count
Regressions 0.52%, 207.23% 27.72% 129
Improvements -47.24%, -0.17% -14.31% 22
All -47.24%, 207.23% 21.60% 151

Benchmarks

Benchmark % Change Significance Factor
array.index.(sumrange_view, SubArray{Int32, 2, Base.ReshapedArray{Int32, 2, SubArray{Int32, 3, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}, Tuple{}}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}) 207.23% 54.86x
array.index.(sumcolon_view, SubArray{Int32, 2, Base.ReshapedArray{Int32, 2, SubArray{Int32, 3, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}, Tuple{}}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}) 205.96% 54.63x
array.index.(sumeach_view, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.39% 203.84x
array.index.(sumlinear, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.14% 244.42x
array.index.(sumeach, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.02% 451.23x
@Zentrik Zentrik added performance Must go faster regression Regression in behavior compared to a previous version labels Jul 9, 2024
@Zentrik
Copy link
Member Author

Zentrik commented Jul 10, 2024

The regression is not present on master with 5e1bcdf reverted and 4eef1be, 756e72f cherry picked (here is the branch https://github.com/Zentrik/julia/tree/test-54853). Those two commits should be sufficient for llvm 18.

Looking at BaseBenchmarks.SUITE[["array", "index", ("sumelt", "SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}")]], the benchmark is

function perf_sumelt(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

C = rand(Int32, 4, 500, 500)
A = view(C, 1, :, :)
@benchmark perf_sumelt($A)

The unoptimized llvm ir is identical apart from 4 instructions that are %90 = getelementptr inbounds i32, ptr %..., i64 %...on the fast version and %90 = getelementptr i8, ptr %..., i64 %... on the regression. Here's the optimized IR for the fast version https://gist.github.com/Zentrik/7a2834c7672963bb8bf2b00201f3c9fe and the slow version https://gist.github.com/Zentrik/1ff09cce2f2a098b6622286fdaa678f4. The slow version has a lot of extra instructions which I think is just a aggressive vectorization but not sure.

vtjnash pushed a commit that referenced this issue Aug 8, 2024
The Julia memory model is always inbounds for GEP.

This makes the code in #55090
look almost the same as it did before the change. Locally I wasn't able
to reproduce the regression, but given it's vectorized code I suspect it
is backend sensitive.

Fixes #55090

Co-authored-by: Zentrik <[email protected]>
@topolarity
Copy link
Member

Sounds like this is still an issue despite #55107:

The regression still exists on nanosoldier and locally and this doesn't seem to have made much difference to performance. The optimized ir is essentially unchanged compared to master apart from some inbounds sprinkled on some geps. The only difference in the unoptimized ir with the fast version is i8s are used instead of i32s.

@topolarity topolarity reopened this Aug 8, 2024
@Zentrik
Copy link
Member Author

Zentrik commented Aug 8, 2024

While the regression hasn't been fixed, there's probably not much to be done. #55412 reverted the relevant part of the commit causing the regression but with the LLVM 18 upgrade that's now a net negative.

lazarusA pushed a commit to lazarusA/julia that referenced this issue Aug 17, 2024
The Julia memory model is always inbounds for GEP.

This makes the code in JuliaLang#55090
look almost the same as it did before the change. Locally I wasn't able
to reproduce the regression, but given it's vectorized code I suspect it
is backend sensitive.

Fixes JuliaLang#55090

Co-authored-by: Zentrik <[email protected]>
KristofferC pushed a commit that referenced this issue Aug 19, 2024
The Julia memory model is always inbounds for GEP.

This makes the code in #55090
look almost the same as it did before the change. Locally I wasn't able
to reproduce the regression, but given it's vectorized code I suspect it
is backend sensitive.

Fixes #55090

Co-authored-by: Zentrik <[email protected]>
(cherry picked from commit 7e1f0be)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
3 participants