rustc_codegen_ssa: only create backend BasicBlocks as-needed.
#84993
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 1557955334c9ba1145b41cbfe0bec0f9dbfa9e97 with merge 2c3602faa8930c104e00dc0b9d0e2ce5246f1caf... |
☀️ Try build successful - checks-actions |
Queued 2c3602faa8930c104e00dc0b9d0e2ce5246f1caf with parent 109248a, future comparison URL. |
Finished benchmarking try commit (2c3602faa8930c104e00dc0b9d0e2ce5246f1caf): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
I'm not sure I understand why there's so much... noise(?). One thing that worries me is e.g. Is it possible changing the block order causes pseudorandom effects in LLVM quality, not just performance, i.e. causing parts of Maybe it would be useful to be able to do a "stage1 perf run" (to see if it doesn't vary as much as stage2 does) but it's harder because of proc macros (you'd need to use the right beta to compile them). |
This looks reasonable enough on the surface, and the timings appear to be neutral.
I wondered if coverage depends on all of the unreachable blocks ending up in the IR (as otherwise uncovered regions wouldn't show up), but if the coverage tests pass, this seems LGTM.
r=me in that case.
// CHECK: [[OTHERWISE]]:
// CHECK-NEXT: unreachable
// CHECK: [[A]]:
// CHECK-NEXT: store i8 0, i8* %1, align 1
// CHECK-NEXT: br label %[[EXIT:[a-zA-Z0-9_]+]]
// CHECK: [[B]]:
This suggests to me these checks want to be CHECK-DAG, but sounds like it'd be a major PiTA to adjust.
There's a CHECK-DAG feature? I might look into it, it's just these two tests.
// more backend-agnostic prefix such as `cg` (i.e. this would be `cgbb`).
pub fn llbb(&mut self, bb: mir::BasicBlock) -> Bx::BasicBlock {
    self.cached_llbbs[bb].unwrap_or_else(|| {
        // FIXME(eddyb) only name the block if `fewer_names` is `false`.
Would it not make sense to just check for this within the append_block function? I guess if we did that, we'd end up in a situation where we still format! potentially many strings only for them to be ignored?
Yeah - I think I want to solve it similar to how I did for SSA values and set_name on them, I just don't want to do it now.
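To make the trade-off in this thread concrete, here is a minimal sketch (not the PR's code, and not necessarily the approach used for set_name). It assumes a hypothetical toy `Backend` with a `fewer_names` flag and two hypothetical methods: `append_block`, which takes an already-formatted name, and `append_block_with`, which takes a closure and only calls it when the name will actually be kept. The counter only exists to show how many `format!` calls happen in each variant.

```rust
use std::cell::Cell;

// Hypothetical toy backend; none of these names come from rustc.
struct Backend {
    fewer_names: bool,
    names: Vec<Option<String>>,
}

impl Backend {
    // Eager variant: the check happens inside, but the caller has already
    // paid for formatting the name by the time we get here.
    fn append_block(&mut self, name: &str) -> usize {
        let stored = if self.fewer_names { None } else { Some(name.to_string()) };
        self.names.push(stored);
        self.names.len() - 1
    }

    // Lazy variant: the name is only computed if it will be used.
    fn append_block_with(&mut self, name: impl FnOnce() -> String) -> usize {
        let stored = if self.fewer_names { None } else { Some(name()) };
        self.names.push(stored);
        self.names.len() - 1
    }
}

// Creates `blocks` blocks and returns how many names were formatted.
fn count_formats(fewer_names: bool, lazy: bool, blocks: usize) -> usize {
    let formatted = Cell::new(0usize);
    let make_name = |i: usize| {
        formatted.set(formatted.get() + 1);
        format!("bb{}", i)
    };
    let mut backend = Backend { fewer_names, names: Vec::new() };
    for i in 0..blocks {
        if lazy {
            backend.append_block_with(|| make_name(i));
        } else {
            let name = make_name(i);
            backend.append_block(&name);
        }
    }
    formatted.get()
}

fn main() {
    println!("eager, fewer_names: {} names formatted", count_formats(true, false, 3));
    println!("lazy, fewer_names: {} names formatted", count_formats(true, true, 3));
}
```

With `fewer_names` set, the eager variant still formats every name only to discard it, while the closure-based variant formats none.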
FWIW I strongly doubt you can get deterministic timings when the code received by LLVM changes in more significant ways that otherwise don't have much impact on how heavy it is, as is the case with this PR.
We were deleting unreachable blocks, so if coverage depended on them, it wouldn't have worked. I believe this PR shouldn't change which blocks exist after codegen, only what order they're in.
Oh just saw the "r=me", thanks! @bors r=nagisa |
📌 Commit 1557955334c9ba1145b41cbfe0bec0f9dbfa9e97 has been approved by |
⌛ Testing commit 1557955334c9ba1145b41cbfe0bec0f9dbfa9e97 with merge a8c109b2e4bb2469d262699d2be9202c3e2ee245... |
💔 Test failed - checks-actions |
I was worried about this - I'll either have to use |
1557955 to d010d70
d010d70 to edf90cd
Not expecting anything interesting, maybe more random noise, but might as well: |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit edf90cdeffa49f5787df82790a63b8f687e991a2 with merge efa2875e004f7119f732474853a41e8d4eff22b6... |
☀️ Try build successful - checks-actions |
Queued efa2875e004f7119f732474853a41e8d4eff22b6 with parent c6dd87a, future comparison URL. |
Finished benchmarking try commit (efa2875e004f7119f732474853a41e8d4eff22b6): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Doesn't seem like a lot of change from #84993 (comment) - which makes sense, we're still only testing block order with GNU-style EH. |
…nagisa rustc_codegen_ssa: generate MSVC cleanup pads on demand, like GNU landing pads. This unblocks rust-lang#84993 in terms of codegen tests, as it brings the MSVC-style (`cleanup_pad`) EH (LLVM) block order in line with the GNU-style (`landing_pad`) EH (LLVM) block order, by having both of them be on-demand (instead of MSVC-style being eager and GNU-style lazy/on-demand). It also unifies the two implementations a bit, similar to rust-lang#84699, but in the opposite direction (as that attempt made both kinds of EH pads eagerly built). ~~Opening as draft because I haven't done enough Windows testing just yet, of both this PR, and of rust-lang#84993 rebased on it.~~ (**EDIT**: seems to be working as expected) r? `@nagisa`
edf90cd to 0fcaf11
📌 Commit 0fcaf11 has been approved by |
☀️ Test successful - checks-actions |
@klensy Yeah but the trick is that this PR took 2h41m to build, and started one hour (AFAIK) before the nightly was published, so it couldn't have gotten in. Either way, it worked:
That contains #85316 but not this PR. |
Instead of creating one backend (e.g. LLVM) block per MIR block ahead of time, and then deleting the ones that weren't visited, this PR moves to creating the blocks as they're needed (either reached via the RPO visit, or used as the target of a branch from a different block).
As deleting a block was the only unsafe builder method (generally we only create backend objects, not remove them), that's gone now and codegen is overall a bit safer.
The only change in output is the order of LLVM blocks (which AFAIK has no semantic meaning, other than the first block being the entry block). This happens because the blocks are now created due to control-flow edges, rather than MIR block order.
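The on-demand scheme described above can be illustrated with a minimal sketch. This is not the PR's code: `MirBlock`, `BackendBlock`, the toy `Backend`, and `layout_for` are hypothetical stand-ins, and only the names `llbb`, `cached_llbbs`, and `append_block` are borrowed from the PR. The idea is that each MIR block has an empty cache slot until something asks for its backend block, so the backend layout follows the order of requests rather than MIR block order, and blocks that are never requested are never created (so nothing needs to be deleted afterwards).

```rust
// Hypothetical stand-ins for MIR and backend block handles.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MirBlock(usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BackendBlock(usize);

// Toy "backend" that records blocks in the order they are appended.
struct Backend {
    names: Vec<String>,
}

impl Backend {
    fn append_block(&mut self, name: &str) -> BackendBlock {
        self.names.push(name.to_string());
        BackendBlock(self.names.len() - 1)
    }
}

struct FunctionCx {
    backend: Backend,
    // One slot per MIR block; `None` until the block is first requested.
    cached_llbbs: Vec<Option<BackendBlock>>,
}

impl FunctionCx {
    fn new(num_mir_blocks: usize) -> Self {
        FunctionCx {
            backend: Backend { names: Vec::new() },
            cached_llbbs: vec![None; num_mir_blocks],
        }
    }

    // Returns the backend block for `bb`, creating it on first use.
    fn llbb(&mut self, bb: MirBlock) -> BackendBlock {
        if let Some(llbb) = self.cached_llbbs[bb.0] {
            return llbb;
        }
        let llbb = self.backend.append_block(&format!("bb{}", bb.0));
        self.cached_llbbs[bb.0] = Some(llbb);
        llbb
    }
}

// Requests blocks in the given order and returns the backend block layout.
fn layout_for(num_mir_blocks: usize, requests: &[usize]) -> Vec<String> {
    let mut fx = FunctionCx::new(num_mir_blocks);
    for &i in requests {
        fx.llbb(MirBlock(i));
    }
    fx.backend.names
}

fn main() {
    // Four MIR blocks: bb0 is visited first and branches to bb2, which is
    // then visited and branches to bb1; bb3 is never reached or targeted.
    let layout = layout_for(4, &[0, 2, 2, 1]);
    println!("{:?}", layout);
}
```

In this toy run the layout is bb0, bb2, bb1: the entry block is still first, the repeated request for bb2 reuses the cached block, and bb3 never exists in the backend at all.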
I'm making this a standalone PR because I keep getting wild perf results when I change anything in codegen, but if you want to read more about my plans in this area, see #84771 (comment) (and #84771 (comment) - but that may be a bit outdated).
(You may notice some of the APIs in this PR, like append_block, don't help with the future plans - but I didn't want to include the necessary refactors that pass a build around everywhere, in this PR, so it's a small compromise)
r? @nagisa @bjorn3