-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc: Don't inline in CGUs at -O0 #45075
Conversation
r? @eddyb (rust_highfive has picked a reviewer for you, use r? to override) |
@@ -280,75 +280,74 @@ fn place_root_translation_items<'a, 'tcx, I>(tcx: TyCtxt<'a, 'tcx, 'tcx>, | |||
let mut internalization_candidates = FxHashSet(); | |||
|
|||
for trans_item in trans_items { | |||
let is_root = trans_item.instantiation_mode(tcx) == InstantiationMode::GloballyShared; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file's diff is probably best viewed with no whitespace (very few changes here, just indentation)
Oh I should also mention that I only changed the compiler's default behavior in O0 mode. My thinking was that if we don't have any way to inline into all codegen units then we basically kill performance for optimized codegen unit builds, so I didn't want to tamper with anyone relying on that. Once we have ThinLTO, however, that I believe is the avenue by which we'd achieve inlining, so I think we could turn this behavior on by default. |
@bors r+ Thanks, @alexcrichton ! I'm excited about this @alexcrichton it would be great if you could do some runtime performance benchmarks to see the effect of this. One thing that I'm not sure about is whether it is a good idea to also apply this to drop-glue and shims. With the changes in this PR they will all end up in one big codegen unit (the "fallback codegen-unit) and that might not be what we want. Also |
📌 Commit cb7f7f0 has been approved by |
@michaelwoerister when you say performance benchmarks, you mean of activating this in release mode? I sort of assumed that the benchmarks could only be worse than this which is where we don't inline generics across codegen units but we inline |
@alexcrichton I've tested this PR on the same project as before:
Interesting that now 2 CGUs performing only as goog as 32 (with inlining it was the fastest option), and 4 CGUs seems the fastest and most stable option. Also now builds are faster with any CGU number then with inlining. (approx. -10 secs at least). |
@mersinvald fascinating! I'm particularly curious about the huge dip using one codegen unit. Before you mentioned that one CGU was 80.41s to compile and now you're seeing 65.41s. That's quite a big improvement, and definitely shouldn't be affected by this PR! (this PR should have the same performance in one-CGU mode before and after). Do you know what might be causing that discrepancy? I'm still surprised at the drastic difference between 4 and 32 cgus! |
@alexcrichton I'll recheck now with one-commit-before-this-pr rustc, our code base changed a bit, I am sorry for confusion in advance :) |
@alexcrichton rechecked.
So, disabling inlines clearly has a good effect, dip between 4 and 32 GCUs is lower. -2 secs with 4 CGUs anyway) |
☔ The latest upstream changes (presumably #44841) made this pull request unmergeable. Please resolve the merge conflicts. |
This commit tweaks the behavior of inlining functions into multiple codegen units when rustc is compiling in debug mode. Today rustc will unconditionally treat `#[inline]` functions by translating them into all codegen units that they're needed within, marking the linkage as `internal`. This commit changes the behavior so that in debug mode (compiling at `-O0`) rustc will instead only translate `#[inline]` functions into *one* codegen unit, forcing all other codegen units to reference this one copy. The goal here is to improve debug compile times by reducing the amount of translation that happens on behalf of multiple codegen units. It was discovered in rust-lang#44941 that increasing the number of codegen units had the adverse side effect of increasing the overal work done by the compiler, and the suspicion here was that the compiler was inlining, translating, and codegen'ing more functions with more codegen units (for example `String` would be basically inlined into all codegen units if used). The strategy in this commit should reduce the cost of `#[inline]` functions to being equivalent to one codegen unit, which is only translating and codegen'ing inline functions once. Collected [data] shows that this does indeed improve the situation from [before] as the overall cpu-clock time increases at a much slower rate and when pinned to one core rustc does not consume significantly more wall clock time than with one codegen unit. One caveat of this commit is that the symbol names for inlined functions that are only translated once needed some slight tweaking. These inline functions could be translated into multiple crates and we need to make sure the symbols don't collideA so the crate name/disambiguator is mixed in to the symbol name hash in these situations. [data]: rust-lang#44941 (comment) [before]: rust-lang#44941 (comment)
cb7f7f0
to
4b2bdf7
Compare
@mersinvald hm ok very interesting! Thanks so much again for taking the time to collect this data :) Something still feels not quite right though about the 4 -> 32 codegen unit transition. Do you know if there's one crate in particular that slows down by 4 seconds? Or does everything in general just slow down a little bit? @bors: r=michaelwoerister |
📌 Commit 4b2bdf7 has been approved by |
@alexcrichton if there were some cargo option to display timings for every built crate, I could measure it. Is there something like that? |
@mersinvald Do you maybe have a mechanical hard drive instead of an SSD in the computer you are testing this on? Having more CGUs will lead to more I/O and we are probably testing mostly on machine with at least some kind of SSD. |
@bors p=1 (this is kind of important) |
@mersinvald ah unfotunately there's not anything easy to see how long crates are compiling for, but you may know of some that take longer than others perhaps? |
⌛ Testing commit 4b2bdf7 with merge 199e837fe292873170dc1cf679ca6345a9df41cf... |
💔 Test failed - status-travis |
No, I was actually interested in the runtime performance difference in debug mode. In particular: is there any difference at all? Do we do any inlining with |
@bors retry |
rustc: Don't inline in CGUs at -O0 This commit tweaks the behavior of inlining functions into multiple codegen units when rustc is compiling in debug mode. Today rustc will unconditionally treat `#[inline]` functions by translating them into all codegen units that they're needed within, marking the linkage as `internal`. This commit changes the behavior so that in debug mode (compiling at `-O0`) rustc will instead only translate `#[inline]` functions into *one* codegen unit, forcing all other codegen units to reference this one copy. The goal here is to improve debug compile times by reducing the amount of translation that happens on behalf of multiple codegen units. It was discovered in #44941 that increasing the number of codegen units had the adverse side effect of increasing the overal work done by the compiler, and the suspicion here was that the compiler was inlining, translating, and codegen'ing more functions with more codegen units (for example `String` would be basically inlined into all codegen units if used). The strategy in this commit should reduce the cost of `#[inline]` functions to being equivalent to one codegen unit, which is only translating and codegen'ing inline functions once. Collected [data] shows that this does indeed improve the situation from [before] as the overall cpu-clock time increases at a much slower rate and when pinned to one core rustc does not consume significantly more wall clock time than with one codegen unit. One caveat of this commit is that the symbol names for inlined functions that are only translated once needed some slight tweaking. These inline functions could be translated into multiple crates and we need to make sure the symbols don't collideA so the crate name/disambiguator is mixed in to the symbol name hash in these situations. [data]: #44941 (comment) [before]: #44941 (comment)
Possibly #40474? |
☀️ Test successful - status-appveyor, status-travis |
…nload-error, r=Mark-Simulacrum rustbuild: Make openssl download more reliable. 1. Add `-f` flag to curl, so when the server returns 403 or 500 it will fail immediately. 2. Moved the checksum part into the retry loop, assuming checksum failure is due to broken download that can be fixed by downloading again. This PR is created responding to two recent spurious failures in rust-lang#45075 (comment) and rust-lang#45030 (comment). r? @Mark-Simulacrum , cc @aidanhs
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat
#[inline]
functions by translating them into all codegen units thatthey're needed within, marking the linkage as
internal
. This commit changesthe behavior so that in debug mode (compiling at
-O0
) rustc will instead onlytranslate
#[inline]
functions into one codegen unit, forcing all othercodegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overal work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example
String
would be basicallyinlined into all codegen units if used). The strategy in this commit should
reduce the cost of
#[inline]
functions to being equivalent to one codegenunit, which is only translating and codegen'ing inline functions once.
Collected data shows that this does indeed improve the situation from before
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collideA so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.