-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM/MIR optimizations to simplify the new RFC3058 ?
desugaring
#85133
Comments
I am interested in taking this one as a first contribution. I will probably have time to work on it this week. |
For interested people, conversation on Zulip moved to https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/Optimizing.20a.20match.20on.20a.20just-constructed.20enum.20.2385133 |
I see this issue marked as open. Is it being worked on or available for me to pickup? |
It is being worked on, see #85646. Any suggestion/help is welcome, though! |
…jgillot MIR opt: separate constant predecessors of a switch For each block S ending with a switch, this pass copies S for each of S's predecessors that seem to assign the value being switched over as a const. This is done using a somewhat simple heuristic to determine what seems to be a const transitively. More precisely, this is what the pass does: - find a block that ends in a switch - track if there is an unique place set before the current basic block that determines the result of the switch (this is the part that resolves switching over discriminants) - if there is, iterate over the parents that have a reasonable terminator and find if the found determining place is likely to be (transitively) set from a const within that parent block - if so, add the corresponding edge to a vector of edges to duplicate - once this is done, iterate over the found edges: copy the target block and replace the reference to the target block in the origin block with the new block This pass is not optimal and could probably duplicate in more cases, but the intention was mostly to address cases like in rust-lang#85133 or rust-lang#85365, to avoid creating new enums that get destroyed immediately afterwards (notably making the new try v2 `?` desugar zero-cost). A benefit of this pass working the way it does is that it is easy to ensure its correctness: the worst that can happen is for it to needlessly copy a basic block, which is likely to be destroyed by cleanup passes afterwards. The complex parts where aliasing matters are only heuristics and the hard work is left to further passes like ConstProp. # LLVM blocker Unfortunately, I believe it would be unwise to enable this optimization by default for now. Indeed, currently switch lowering passes like SimplifyCFG in LLVM lose the information on the set of possible variant values, which means it tends to actually generate worse code with this optimization enabled. A fix would have to be done in LLVM itself. This is something I also want to look into. I have opened [a bug report at the LLVM bug tracker](https://bugs.llvm.org/show_bug.cgi?id=50455). When this is done, I hope we can enable this pass by default. It should be fairly fast and I think it is beneficial in many cases. Notably, it should be a sound alternative to simplify-arm-identity. By the way, ConstProp only seems to pick up the optimization in functions that are not generic. This is however most likely an issue in ConstProp that I will look into afterwards. This is my first contribution to rustc, and I would like to thank everyone on the Zulip mir-opt chat for the help and support, and especially `@scottmcm` for the guidance.
Is this issue now resolved after @Moxinilian 's work? |
Right now it is not. Right now this is blocked by LLVM doing a sub-optimal transformation that loses the optimization at a very early stage, preventing the duplicated tails to be merged back into one. I've tried investigating if we could not merge them back directly in MIR. While possible in theory, my understanding was that MIR still abstracts away too much details (notably on the discriminant aspect) for the algorithm to be reasonably complex. So right now we either need a breakthrough on that front or work in the LLVM codebase. I want to start investigating more in the LLVM side of things, but I have not successfully allocated enough time for it as of now. |
More on simplifying in MIR: What LLVM is struggling with is keeping the fact that variant switches are exhaustive, so it can merge the blocks that do the same thing except for different constants (representing the variant IDs). What we could do is do that merging directly in MIR, in a pass close to the end of the pipeline. A naive approach would be to take a switch-terminated block
Then, replace any instance of the constant
Note that if MIR was in SSA form, this would probably be more effective in practice because |
More on fixing LLVM: Right now, rustc calls into LLVM with a pass pipeline that starts with the define i64 @demo(i64 %x) {
entry:
switch i64 %x, label %bb3 [
i64 0, label %bb1
i64 1, label %bb2
]
bb1:
ret i64 0
bb2:
%0 = icmp eq i64 %x, 100 ; this will necessarily be false because %x == 1
br i1 %0, label %bb4, label %bb5
bb3:
unreachable
bb4:
ret i64 200
bb5:
ret i64 10
} is turned into define i64 @demo(i64 %x) {
entry:
%switch = icmp ult i64 %x, 1
%0 = icmp eq i64 %x, 100
%. = select i1 %0, i64 200, i64 10
%merge = select i1 %switch, i64 0, i64 %.
ret i64 %merge
} which throws away the exhaustive nature of the switch, making in this example further passes not able to optimize the useless comparison away. The code of |
Oh, that's annoying. But very intriguing. Thanks for the explanation! |
…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in #53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes #53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854
FWIW this has been fixed in llvm trunk in llvm/llvm-project@371fcb7. |
…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854
So ... what is the actual Rust code we're supposed to be optimizing here? Is the goal to get rid of the icmp + zext in https://rust.godbolt.org/z/nd9fEx3ed or is it something else? (The referenced LLVM change does not get rid of those.) |
@nikic That's right. As I understand it, the (My uninformed thought: It's odd to me that there's no |
Actually, here's a better cut-down version of the example. This is (essentially) the first part of use std::ops::ControlFlow;
pub fn demo(x: Result<u32, i32>) -> ControlFlow<i32, u32> {
match x {
Ok(v) => ControlFlow::Continue(v),
Err(e) => ControlFlow::Break(e),
}
}
It looks like the core of what we emit there is %8 = load i32, i32* %7, align 4, !dbg !10, !range !11
%_2 = zext i32 %8 to i64, !dbg !10
switch i64 %_2, label %bb2 [
i64 0, label %bb3
i64 1, label %bb1
], !dbg !12 where that range metadata is !11 = !{i32 0, i32 2} So AFAIK that PR above was about trying to avoid turning the |
Okay, so I think the PhaseOrdering change is necessary to avoid folding the switch before SROA (of course, the alternative solution is to make SROA emit range assumptions). But it's not sufficient:
We'd need either InstCombine or SimplifyCFG to fold I believe SimplifyCFG would do that as part of ForwardSwitchConditionToPHI, but this is only done at the very end of the pipeline. |
Just remembered that we have https://github.com/llvm/llvm-project/blob/fdfe26ddbeb1a5eb6106a6a27d6f8240deca543f/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp#L1256 in InstCombine. I guess it would be natural to extend to that to switches and not just conditional branches. |
…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854 (cherry picked from commit 371fcb7)
…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854
Upstream fix is llvm/llvm-project@4010a7a. |
The LLVM side of this is fixed now, and #100693 added a codegen test. Is a MIR transform still needed? |
?
desugaring?
desugaring
I'll close this one because it does look like the match-of-match, as Unfortunately, however, when just matching the result directly and rebuilding, it seems to only work for some widths. The codegen test works with |
…nge comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in llvm/llvm-project#53853 and rust-lang/rust#85133 (comment) This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes llvm/llvm-project#53853 Refs. rust-lang/rust#85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854
With #84767 implementing the new
?
desugaring, there's no mir-opt that can simplify the double-match it emits, as the arm identity pass only does Result->Result, and thus cannot handle Result->ControlFlow->Result.That's not a blocker for checkin, as the arm identity pass is disabled by default currently because of things like #78628 and it depends on the MIR inliner being enabled anyway, but it sure would be nice to do better here.
Zulip conversation with some more context: https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/Suggestions.20for.20a.20beginner/near/237739360
The text was updated successfully, but these errors were encountered: