match an std::cmp::Ordering generates less optimized code in nightly #86511
Comments
Thanks @yume-chan for debugging. Paging @Aaron1011 and @michaelwoerister, who can perhaps share insights about #85702: is that PR relevant to this issue and the mentioned ones (#86391, #86354)? Do we have a common root cause? Perhaps this one is a duplicate? Thanks :-) |
(I find it hard to believe that #85702 is in any way truly related to the problem described here.) Update: What I can believe is that the bisection from the description is only at the granularity of nightlies, not individual PRs. @yume-chan, is that correct? (I.e., is it true that the bisection you performed only went to the level of nightlies, not the finer grain of individual PRs?) (cargo-bisect-rustc can help with traversing the finer-grain space of individual PRs, FYI.) |
@pnkfelix Yes, I didn't bisect down to the exact version. My versions are taken from Compiler Explorer and Rust Playground. Thanks for pointing to
searched nightlies: from nightly-2021-05-17 to nightly-2021-05-18
bisected with cargo-bisect-rustc v0.6.0
Host triple: x86_64-pc-windows-msvc
cargo bisect-rustc --test-dir=rust-lib --start=2021-05-17 --end=2021-05-18 --preserve |
Okay, so the subsequent bisection claims that PR #84993 is the injection point. |
cc @eddyb @nagisa -- was this an expected possible result from #84993? It sounds like it's plausible that LLVM depends on block order in order to generate optimal code, though a little unfortunate in this particular case. This is likely to hit stable in 1.54, given that it's next week... but I doubt we can do anything in time for that. |
It's possible to dig a bit more into this by diffing
for (bb, _) in traversal::reverse_postorder(&mir) {
    fx.llbb(bb);
}
before this loop: rust/compiler/rustc_codegen_ssa/src/mir/mod.rs, lines 255 to 259 in 18840b0
This should be correct (even if a bit slower without reusing the result of |
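For readers unfamiliar with the term, here is a minimal sketch of what a reverse postorder traversal of a control-flow graph produces, and how it can differ from plain index order. The `reverse_postorder` function and the adjacency-list graph below are hypothetical stand-ins written for illustration; they are not rustc's `traversal::reverse_postorder` or its MIR types.

```rust
// Toy control-flow graph: `successors[b]` lists the blocks that block `b` can jump to.
// Hypothetical illustration only, not rustc's MIR representation.
fn reverse_postorder(successors: &[Vec<usize>], entry: usize) -> Vec<usize> {
    fn visit(b: usize, successors: &[Vec<usize>], seen: &mut [bool], post: &mut Vec<usize>) {
        if seen[b] {
            return;
        }
        seen[b] = true;
        for &s in &successors[b] {
            visit(s, successors, seen, post);
        }
        // A block is recorded only after everything reachable from it.
        post.push(b);
    }

    let mut seen = vec![false; successors.len()];
    let mut post = Vec::with_capacity(successors.len());
    visit(entry, successors, &mut seen, &mut post);
    // Reversing the postorder places each block before the blocks it jumps
    // forward to (back edges of loops are the exception).
    post.reverse();
    post
}

fn main() {
    // A diamond: bb0 branches to bb3 or bb1, and both paths join at bb2.
    let cfg = vec![vec![3, 1], vec![2], vec![], vec![2]];
    let order = reverse_postorder(&cfg, 0);
    // Index order would be [0, 1, 2, 3]; here the join block bb2 comes last.
    assert_eq!(order, vec![0, 1, 3, 2]);
    println!("{:?}", order);
}
```

The point of the sketch is only that the order in which blocks are handed to the backend can change without changing the program's meaning, which is the kind of difference being discussed here.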
This will slip into 1.54 stable, as the release is going to be built today. |
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-high |
If anyone is interested in looking into this further, the steps I would recommend would be:
Feel free to ping me on zulip if you are interested in looking into this, especially if you need help with the steps outlined above. |
I spent some time investigating this today. |
Could it be a LLVM 13 change, to make both orders worse? |
I'm not sure - I don't really have the knowledge which is necessary to investigate this too much further (it's my first issue and I don't really know much about LLVM). Could I check if it's an LLVM issue by taking the MIR output from the fixed version of What I could also try if that doesn't work is to find the specific change on |
The fix stops working at commit |
cc @nikic |
Could someone please provide a godbolt demonstrating the regression? In https://rust.godbolt.org/z/KnK7xPMoo codegen is pretty much the same between 1.53 and nightly. (The "match" version is still badly optimized -- but that's not new?) |
This should do it. Interesting that there's no problem on your very similar version |
Yes, for that one, the optimizer successfully normalizes the code into the same form, or thereabouts, emitting the same assembly on rustc 1.53. And it looks like the difference is that by about |
Visited for P-high review It seems like there may be opportunities here for a broad set of optimizations, rather than focusing solely on the seeming regression. I recommend two actions:
Having said that, even though there are useful actions, and potential owners for those actions, there doesn't seem to be much reason to prioritize this particular performance footgun over the other codegen issues we have. So, downgrading this to P-medium as well. @rustbot label: -P-high +P-medium |
I'm interested in this comment:
rust/library/core/src/slice/mod.rs
Lines 2204 to 2217 in 3824017
So I did some testing:
The following code snippet is the "bad" one (match an std::cmp::Ordering) in the above comment, but with f manually inlined.
In version 1.53.0-beta.12 (2021-06-12 e7a67cc), release mode, it generates the following x86 assembly (related parts only):
(This is identical to manually expanding to if..else if..else; however, that is not the point of this issue.)
In version 1.55.0-nightly (2021-06-20 e82b650), release mode, the generated assembly code is:
It is the same as the "bad" result in the original comment, and obviously much less optimized.
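To make the two shapes being compared concrete, here is a minimal sketch of a binary search written once with match on the Ordering and once expanded by hand into if..else if..else. It is not the snippet from this report nor the standard library's implementation; the names `search_match` and `search_if_else` are made up for illustration. The sketch only checks that the two forms return the same results. Whether they compile to the same assembly depends on the compiler version, which is what this issue is about.

```rust
use std::cmp::Ordering;

// Binary search written with `match` on the comparison result.
fn search_match(s: &[u32], target: u32) -> Result<usize, usize> {
    let mut left = 0;
    let mut right = s.len();
    while left < right {
        let mid = left + (right - left) / 2;
        match s[mid].cmp(&target) {
            Ordering::Less => left = mid + 1,
            Ordering::Greater => right = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    // Not found: `left` is the index where `target` could be inserted.
    Err(left)
}

// The same search with the `match` expanded by hand into if..else if..else.
fn search_if_else(s: &[u32], target: u32) -> Result<usize, usize> {
    let mut left = 0;
    let mut right = s.len();
    while left < right {
        let mid = left + (right - left) / 2;
        let cmp = s[mid].cmp(&target);
        if cmp == Ordering::Less {
            left = mid + 1;
        } else if cmp == Ordering::Greater {
            right = mid;
        } else {
            return Ok(mid);
        }
    }
    Err(left)
}

fn main() {
    let data = [1, 3, 5, 7, 9];
    // Both forms must agree on hits and on insertion points for misses.
    for target in 0..=10 {
        assert_eq!(search_match(&data, target), search_if_else(&data, target));
    }
    println!("{:?}", search_match(&data, 7));
}
```

Compiling both functions in release mode on a given toolchain and diffing the output is one way to see whether that toolchain treats the two forms differently.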
I think matching on an Ordering should be quite a common use case, so it shouldn't be deoptimized.
Version it worked on
Rust Playground: 1.53.0
Rust Playground: 1.53.0-beta.12 (2021-06-12 e7a67cc)
Version with regression
a55748f