rustc: Don't inline in CGUs at -O0 #45075

alexcrichton · 2017-10-06T23:05:07Z

This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat #[inline] functions by translating them into all codegen units that
they're needed within, marking the linkage as internal. This commit changes
the behavior so that in debug mode (compiling at -O0) rustc will instead only
translate #[inline] functions into one codegen unit, forcing all other
codegen units to reference this one copy.
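
To make the difference concrete, here is a minimal toy model of the two placement strategies. It is only a sketch: `Placement`, `placement_for_inline_fn`, and `translated_copies` are hypothetical names invented for illustration and are not the actual types or functions in `librustc_trans`.

```rust
use std::collections::BTreeMap;

/// How an `#[inline]` function is instantiated across codegen units (toy model).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Placement {
    /// One copy in a single CGU; every other CGU references that symbol.
    SharedOnce,
    /// A private (internal-linkage) copy in every CGU that uses the function.
    CopyPerUser,
}

/// Hypothetical policy mirroring the description above: at -O0 an `#[inline]`
/// function is translated once, at higher opt levels it is still copied.
fn placement_for_inline_fn(opt_level: u8) -> Placement {
    if opt_level == 0 {
        Placement::SharedOnce
    } else {
        Placement::CopyPerUser
    }
}

/// Counts how many function bodies get translated in total.
/// `users` maps an inline function's name to the CGU indices that call it.
fn translated_copies(users: &BTreeMap<&str, Vec<usize>>, opt_level: u8) -> usize {
    users
        .values()
        .map(|cgus| match placement_for_inline_fn(opt_level) {
            // At most one copy, and none if nothing uses the function.
            Placement::SharedOnce => cgus.len().min(1),
            // One copy per CGU that uses the function.
            Placement::CopyPerUser => cgus.len(),
        })
        .sum()
}
```

In this model, a function used from four codegen units and another used from two cost six translations under the old scheme, but only two when each is emitted once.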

The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overall work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example String would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of #[inline] functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.

Collected data shows that this does indeed improve the situation from before
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.

One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collide, so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.
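
The idea behind the symbol tweak can be sketched roughly as follows. This is an illustration only: `symbol_hash` is a hypothetical helper, and it uses the standard library's `DefaultHasher` as a stand-in rather than whatever hashing rustc actually performs for symbol names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy symbol-name hash (not rustc's real scheme).
///
/// `item_path` identifies the function. When the function is emitted once
/// with external linkage, the crate doing the translation passes its
/// name and disambiguator so that copies produced by different crates
/// end up with different symbols.
fn symbol_hash(item_path: &str, instantiating_crate: Option<(&str, u64)>) -> u64 {
    let mut hasher = DefaultHasher::new();
    item_path.hash(&mut hasher);
    if let Some((crate_name, disambiguator)) = instantiating_crate {
        // Mix in the identity of the crate that translates this copy.
        crate_name.hash(&mut hasher);
        disambiguator.hash(&mut hasher);
    }
    hasher.finish()
}
```

With this scheme the same inline function translated by two different crates hashes to two different values, while repeated translation within one crate stays stable.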

rust-highfive · 2017-10-06T23:05:12Z

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2017-10-06T23:05:16Z

r? @michaelwoerister

alexcrichton · 2017-10-06T23:06:59Z

src/librustc_trans/partitioning.rs

@@ -280,75 +280,74 @@ fn place_root_translation_items<'a, 'tcx, I>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
    let mut internalization_candidates = FxHashSet();

    for trans_item in trans_items {
-        let is_root = trans_item.instantiation_mode(tcx) == InstantiationMode::GloballyShared;


This file's diff is probably best viewed with no whitespace (very few changes here, just indentation)

alexcrichton · 2017-10-06T23:27:30Z

Oh I should also mention that I only changed the compiler's default behavior in O0 mode. My thinking was that if we don't have any way to inline into all codegen units then we basically kill performance for optimized codegen unit builds, so I didn't want to tamper with anyone relying on that. Once we have ThinLTO, however, that I believe is the avenue by which we'd achieve inlining, so I think we could turn this behavior on by default.

michaelwoerister · 2017-10-07T14:54:33Z

@bors r+

Thanks, @alexcrichton ! I'm excited about this :) Apart from reducing plain multi-cgu overhead it could have quite the positive effect on incremental compilation with debuginfo enabled.

@alexcrichton it would be great if you could do some runtime performance benchmarks to see the effect of this.

One thing that I'm not sure about is whether it is a good idea to also apply this to drop-glue and shims. With the changes in this PR they will all end up in one big codegen unit (the "fallback" codegen unit) and that might not be what we want. Also characteristic_def_id_of_trans_item() should probably be made smarter before extending this to non--O0 scenarios. The implementation still assumes that it never has to deal with anything other than regular functions.

bors · 2017-10-07T14:54:34Z

📌 Commit cb7f7f0 has been approved by michaelwoerister

alexcrichton · 2017-10-07T15:20:53Z

@michaelwoerister when you say performance benchmarks, you mean of activating this in release mode? I sort of assumed that the benchmarks could only be worse than this which is where we don't inline generics across codegen units but we inline #[inline] functions across all cgus. I could see just how bad it fares though :)

mersinvald · 2017-10-07T15:26:03Z

@alexcrichton I've tested this PR on the same project as before:

1:  66.25 65.56 64.43 (65.41)
2:  63.62 63.66 65.21 (64.16)
4:  60.76 60.81 60.74 (60.77)
8:  60.88 60.96 61.31 (61.05)
16: 61.31 62.62 62.27 (62.06)
32: 63.65 64.10 64.37 (64.04)

Interesting that 2 CGUs now perform only as well as 32 (with inlining it was the fastest option), and 4 CGUs seems to be the fastest and most stable option.

Also, builds are now faster with any number of CGUs than with inlining (approx. -10 secs at least).

alexcrichton · 2017-10-07T15:32:55Z

@mersinvald fascinating! I'm particularly curious about the huge dip using one codegen unit. Before you mentioned that one CGU was 80.41s to compile and now you're seeing 65.41s. That's quite a big improvement, and definitely shouldn't be affected by this PR! (this PR should have the same performance in one-CGU mode before and after).

Do you know what might be causing that discrepancy? I'm still surprised at the drastic difference between 4 and 32 cgus!

mersinvald · 2017-10-07T15:54:06Z

@alexcrichton I'll recheck now with the rustc from one commit before this PR. Our code base has changed a bit, so sorry in advance for the confusion :)

mersinvald · 2017-10-07T16:33:17Z

@alexcrichton rechecked.

Before PR:
4:  62.87 63.40 63.77 (63.34)
32: 69.74 69.47 69.55 (69.53)

After PR:
1:  66.25 65.56 64.43 (65.41)
2:  63.62 63.66 65.21 (64.16)
4:  60.76 60.81 60.74 (60.77)
8:  60.88 60.96 61.31 (61.05)
16: 61.31 62.62 62.27 (62.06)
32: 63.65 64.10 64.37 (64.04)

So, disabling inlines clearly has a good effect: the gap between 4 and 32 CGUs is smaller.

(-2 secs with 4 CGUs anyway)
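
As a rough check of these figures, the differences can be worked out from the means exactly as reported above. The constant and function names below are made up for this sketch, and the numbers are copied from the comment rather than recomputed from the individual runs.

```rust
/// Mean build times in seconds as reported above:
/// (CGU count, mean before the PR, mean after the PR).
const REPORTED_MEANS: [(u32, f64, f64); 2] = [(4, 63.34, 60.77), (32, 69.53, 64.04)];

/// Seconds saved by the PR at a given CGU count, if that count was measured
/// both before and after.
fn saving(cgus: u32) -> Option<f64> {
    REPORTED_MEANS
        .iter()
        .find(|(n, _, _)| *n == cgus)
        .map(|(_, before, after)| before - after)
}

/// Slowdown when going from 4 to 32 CGUs, as (before the PR, after the PR).
fn gap_4_to_32() -> (f64, f64) {
    let (_, before_4, after_4) = REPORTED_MEANS[0];
    let (_, before_32, after_32) = REPORTED_MEANS[1];
    (before_32 - before_4, after_32 - after_4)
}
```

On these numbers the PR saves about 2.57 s at 4 CGUs and about 5.49 s at 32 CGUs, and the 4-to-32 slowdown shrinks from about 6.19 s to about 3.27 s.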

bors · 2017-10-08T00:43:21Z

☔ The latest upstream changes (presumably #44841) made this pull request unmergeable. Please resolve the merge conflicts.

alexcrichton · 2017-10-08T02:11:26Z

@mersinvald hm ok very interesting! Thanks so much again for taking the time to collect this data :)

Something still feels not quite right though about the 4 -> 32 codegen unit transition. Do you know if there's one crate in particular that slows down by 4 seconds? Or does everything in general just slow down a little bit?

@bors: r=michaelwoerister

bors · 2017-10-08T02:11:26Z

📌 Commit 4b2bdf7 has been approved by michaelwoerister

mersinvald · 2017-10-08T11:03:38Z

@alexcrichton if there were some cargo option to display timings for every built crate, I could measure it. Is there something like that?

michaelwoerister · 2017-10-09T07:13:05Z

@mersinvald Do you maybe have a mechanical hard drive instead of an SSD in the computer you are testing this on? Having more CGUs will lead to more I/O and we are probably testing mostly on machines with at least some kind of SSD.

michaelwoerister · 2017-10-09T10:06:00Z

@bors p=1 (this is kind of important)

alexcrichton · 2017-10-09T10:42:43Z

@mersinvald ah unfortunately there's not any easy way to see how long crates take to compile, but you may know of some that take longer than others perhaps?

bors · 2017-10-09T11:13:52Z

⌛ Testing commit 4b2bdf7 with merge 199e837fe292873170dc1cf679ca6345a9df41cf...

bors · 2017-10-09T12:46:15Z

💔 Test failed - status-travis

michaelwoerister · 2017-10-09T12:50:23Z

> @michaelwoerister when you say performance benchmarks, you mean of activating this in release mode?

No, I was actually interested in the runtime performance difference in debug mode. In particular: is there any difference at all? Do we do any inlining with -O0? What about #[inline(always)]?

michaelwoerister · 2017-10-09T13:04:50Z

@bors retry

bors · 2017-10-09T14:00:20Z

⌛ Testing commit 4b2bdf7 with merge 72d6501...

aidanhs · 2017-10-09T16:00:59Z

[01:24:43] Dist std stage2 (x86_64-unknown-linux-gnu -> x86_64-unknown-netbsd)
[01:26:25] Dist analysis
[01:26:25] image_src: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-std/x86_64-unknown-netbsd/release/deps/save-analysis", dst: "/checkout/obj/build/tmp/dist/rust-analysis-nightly-x86_64-unknown-netbsd-image/lib/rustlib/x86_64-unknown-netbsd/analysis"
[01:26:26] Dist src
[01:26:35] Dist extended stage2 (x86_64-unknown-netbsd)
[01:26:35] Dist cargo stage2 (x86_64-unknown-netbsd)
[01:26:35]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[01:26:35]                                  Dload  Upload   Total   Spent    Left  Speed
[01:26:35] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   282    0   282    0     0    847      0 --:--:-- --:--:-- --:--:--   849
[01:26:35] thread 'main' panicked at 'downloaded openssl sha256 different
[01:26:35] expected: 6b3977c61f2aedf0f96367dcfb5c6e578cf37e7b8d913b4ecb6643c3cb88d8c0
[01:26:35] found:    8552e3169b5a6071edfab25bf7dacb3bbd40ae5325bd472690047b90c1cd0589
[01:26:35] ', /checkout/src/bootstrap/native.rs:380:16
[01:26:35] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[01:26:35] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host x86_64-unknown-netbsd --target x86_64-unknown-netbsd
[01:26:35] Build completed unsuccessfully in 1:24:19

Possibly #40474?

bors · 2017-10-09T16:31:11Z

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing 72d6501 to master...

@Mark-Simulacrum

…nload-error, r=Mark-Simulacrum

rustbuild: Make openssl download more reliable.

1. Add `-f` flag to curl, so when the server returns 403 or 500 it will fail immediately.
2. Moved the checksum part into the retry loop, assuming checksum failure is due to broken download that can be fixed by downloading again.

This PR is created responding to two recent spurious failures in rust-lang#45075 (comment) and rust-lang#45030 (comment).

r? @Mark-Simulacrum , cc @aidanhs

rust-highfive assigned eddyb Oct 6, 2017

rust-highfive assigned michaelwoerister and unassigned eddyb Oct 6, 2017

alexcrichton mentioned this pull request Oct 6, 2017

32 codegen units may not always be better at -O0 #44941

Closed

alexcrichton force-pushed the inline-less branch from cb7f7f0 to 4b2bdf7 Compare October 8, 2017 02:09

carols10cents added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 9, 2017

bors merged commit 4b2bdf7 into rust-lang:master Oct 9, 2017

bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 9, 2017

alexcrichton deleted the inline-less branch October 10, 2017 18:46

kennytm mentioned this pull request Oct 11, 2017

rustbuild: Make openssl download more reliable. #45209

Merged

saethlin mentioned this pull request Dec 5, 2024

Remove -Zinline-in-all-cgus rust-lang/compiler-team#814

Closed

3 tasks
