Optimize layout of function arguments in the Rust ABI - take 2 #97559

Urgau · 2022-05-30T19:30:52Z

This PR is another attempt to solving the problem of floats being aggregated as integer, it's purpose is to supersede #94570 which was a big bulldozer and has cause some problem like #97540.

This PR is meant to address issues like #85265 (comment) where floats where being aggregated as integer causing poor codegen. It does this by only passing homogeneous aggregate of floats as array if <= ptr-size and by pointer on array if > ptr-size as an optimization.

It's also a reopening of #93564 with hopefully no regression at all.

Rust code + asm before/after

Rust code:

pub fn foo() -> Result<u64, core::ptr::NonNull<()>> {
    Ok(0)
}

ASM Before:

foo:
        mov     rax, rdi
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        ret

ASM After:

foo:
	xor	eax, eax
	xor	edx, edx
	ret

cc @nbdd0121 @workingjubilee

Revert #94570
Fixes #97540

rust-highfive · 2022-05-30T19:30:56Z

r? @cjgillot

(rust-highfive has picked a reviewer for you, use r? to override)

compiler/rustc_middle/src/ty/layout.rs

nbdd0121 · 2022-05-30T20:07:10Z

compiler/rustc_middle/src/ty/layout.rs

+                        } else if unlikely!(self.has_all_float(&arg.layout)) {
+                            // We don't want to aggregate floats as an aggregates of Integer
+                            // because this will hurt the generated assembly (#93490)
+                            //
+                            // As an optimization we want to pass homogeneous aggregate of floats
+                            // greater than pointer size as indirect
+                            if size > Pointer.size(self) {
+                                arg.make_indirect();
+                            }
+                        } else {


Suggested change

} else if unlikely!(self.has_all_float(&arg.layout)) {

// We don't want to aggregate floats as an aggregates of Integer

// because this will hurt the generated assembly (#93490)

//

// As an optimization we want to pass homogeneous aggregate of floats

// greater than pointer size as indirect

if size > Pointer.size(self) {

arg.make_indirect();

}

} else {

} else if size > Pointer.size(self) && unlikely!(self.has_all_float(&arg.layout)) {

// We don't want to aggregate floats as an aggregates of Integer

// because this will hurt the generated assembly (#93490)

//

// As an optimization we want to pass homogeneous aggregate of floats

// greater than pointer size as indirect

arg.make_indirect();

} else {

This is not the same logic as before. Before your suggestion if it was self.has_all_float(&arg.layout) but NOT size > Pointer.size(self) nothing would happen, with this suggestion it will take the else branch doing exactly the opposite of what we want to make.

Is there any particular reason that the current behaviour is undesirable for small aggregates with float?

Also, even if it's desirable that small aggregates with float has a special treatment, it should probably be a separate PR that's backed by separate perf run.

Well on x86_64 this would mean that f64 -> i64, f32 -> i32 and [f32; 2] -> i64 which is not ideal and is the reason of this PR, ie let the possibility for the codegen to use floating point register for small aggregate of floats or to put them behind an indirection.

f64 and f32 are not Abi::Aggregate and will not hit this path. For [f32; 2], the current status quo is to convert it to integer.

Essentially this PR is doing two things:

Pass not-all-float aggregates smaller than 2 register directly

Stop casting all-float aggregates smaller than 1 register to integer

I am suggesting just do 1 in this PR.

I tried your suggestion locally and only one test failed. I will push it after the perf run so we can have a baseline to know if it's even necessary or not.

Well the perf impact of introducing this "fast path" is negligible: #97559 (comment)
https://perf.rust-lang.org/compare.html?start=f220c170f6f0f60621e43d2cb761f7711ae57c43&end=5f9cf23a74c3b0e90cc66c155ed37b74a7906093&stat=instructions%3Au

the8472 · 2022-05-30T20:49:38Z

@bors try @rust-timer queue

rust-timer · 2022-05-30T20:49:39Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-05-30T20:49:46Z

⌛ Trying commit 428cb36d1f205f6b99b2ae33849b1957b496d7eb with merge f220c170f6f0f60621e43d2cb761f7711ae57c43...

bors · 2022-05-30T22:24:48Z

☀️ Try build successful - checks-actions
Build commit: f220c170f6f0f60621e43d2cb761f7711ae57c43 (f220c170f6f0f60621e43d2cb761f7711ae57c43)

rust-timer · 2022-05-30T22:24:49Z

Queued f220c170f6f0f60621e43d2cb761f7711ae57c43 with parent 4a8d2e3, future comparison URL.

rust-timer · 2022-05-30T23:43:23Z

Finished benchmarking commit (f220c170f6f0f60621e43d2cb761f7711ae57c43): comparison url.

Instruction count

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	0.6%	3.3%	132
Regressions 😿 (secondary)	1.7%	7.3%	76
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.6%	3.3%	132

Max RSS (memory usage)

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	3.1%	3.9%	3
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-3.1%	-3.1%	1
All 😿🎉 (primary)	N/A	N/A	0

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	3.4%	4.4%	4
Regressions 😿 (secondary)	4.4%	7.9%	6
Improvements 🎉 (primary)	-2.5%	-2.8%	2
Improvements 🎉 (secondary)	-3.6%	-6.4%	15
All 😿🎉 (primary)	1.5%	4.4%	6

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

the8472 · 2022-05-31T14:02:18Z

@bors try @rust-timer queue

rust-timer · 2022-05-31T14:02:22Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-05-31T14:02:34Z

⌛ Trying commit 4a6387ad206cdbfda544eb8955d521547bbf8ea6 with merge 5f9cf23a74c3b0e90cc66c155ed37b74a7906093...

bors · 2022-05-31T15:57:14Z

☀️ Try build successful - checks-actions
Build commit: 5f9cf23a74c3b0e90cc66c155ed37b74a7906093 (5f9cf23a74c3b0e90cc66c155ed37b74a7906093)

rust-timer · 2022-05-31T15:57:16Z

Queued 5f9cf23a74c3b0e90cc66c155ed37b74a7906093 with parent dcbd5f5, future comparison URL.

rust-timer · 2022-05-31T17:32:08Z

Finished benchmarking commit (5f9cf23a74c3b0e90cc66c155ed37b74a7906093): comparison url.

Instruction count

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	0.6%	3.1%	126
Regressions 😿 (secondary)	1.6%	7.3%	77
Improvements 🎉 (primary)	-0.9%	-1.1%	4
Improvements 🎉 (secondary)	-1.1%	-1.1%	3
All 😿🎉 (primary)	0.6%	3.1%	130

Max RSS (memory usage)

Results

Primary benchmarks: 😿 relevant regression found
Secondary benchmarks: 🎉 relevant improvement found

	mean¹	max	count²
Regressions 😿 (primary)	1.1%	1.1%	1
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-2.2%	-2.2%	1
All 😿🎉 (primary)	1.1%	1.1%	1

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	2.8%	4.1%	4
Regressions 😿 (secondary)	2.7%	3.7%	3
Improvements 🎉 (primary)	-2.5%	-2.5%	1
Improvements 🎉 (secondary)	-4.6%	-8.1%	13
All 😿🎉 (primary)	1.7%	4.1%	5

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

apiraino · 2022-06-30T12:47:41Z

Maybe ready for another round of review after the latest perf. run? Please switch back to the author if more work is needed, thanks!

@rustbot ready

workingjubilee

r? @workingjubilee

This is not an improvement in compile times, and I do not think the improvement in codegen is so clearly significant, from the examples given, as to warrant merging in spite of that. It would be sufficient if we had a sample microbenchmark this wins on in terms of runtime before/after.

@rustbot author

This reverts commit 2e0cf271285089316db55b995312712638126245.

This reverts commit e136c3a9348200c261b9b3c1c50a2f6f6a68b4bd.

Urgau · 2022-07-05T10:52:33Z

I'm surprised that regex-processing would be affected by changes to float codegen at all.

Me too, and thanks to @bjorn3 #97559 (comment) we now know that it wasn't related to float at all but a changed that I done the the max_by_val_size, which explain the regressions in regex.
Could I get another perf run, it should now measure only the cost of not aggregating floats.

the8472 · 2022-07-05T11:10:07Z

@bors try @rust-timer queue

rust-timer · 2022-07-05T11:10:09Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-05T11:10:15Z

⌛ Trying commit 683e13f with merge f683963771547e2f8634ff14af3fe3926c12ebd5...

bors · 2022-07-05T12:39:06Z

☀️ Try build successful - checks-actions
Build commit: f683963771547e2f8634ff14af3fe3926c12ebd5 (f683963771547e2f8634ff14af3fe3926c12ebd5)

rust-timer · 2022-07-05T12:39:08Z

Queued f683963771547e2f8634ff14af3fe3926c12ebd5 with parent 4045ce6, future comparison URL.

rust-timer · 2022-07-05T16:28:05Z

Finished benchmarking commit (f683963771547e2f8634ff14af3fe3926c12ebd5): comparison url.

Instruction count

Primary benchmarks: 😿 relevant regressions found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	0.7%	1.0%	9
Regressions 😿 (secondary)	3.1%	9.3%	15
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-0.2%	-0.2%	1
All 😿🎉 (primary)	0.7%	1.0%	9

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	4.4%	5.1%	2
Improvements 🎉 (primary)	-0.0%	-0.0%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-0.0%	-0.0%	1

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	3.8%	4.9%	11
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

eddyb · 2022-07-07T09:51:30Z

My issue with the current casting behaviour is that it forces everything through integer registers, which is really suboptimal when float is involved. My hope was that by using LLVM aggregates, LLVM could generate more optimal code, but obviously that's not case. I suppose another way would be having some logic that does classificiation for the Rust ABI similar to how x86_64 ABI classifies arguments.

@nbdd0121 Right, what SysV x64 ABI does is more or less the only way to reliably do this.

The whole thing with LLVM aggregates and us calling this mode "casting" obscures the fact that if you want to choose which registers something gets passed in, there's no way around looking at its contents and making a decision, nothing else can make that decision for you (though we can provide helpers for common patterns across ABIs, like the "homogeneous aggregate" one).

It would be much clearer if we were decomposing a Rust argument into LLVM scalar/vector arguments, and optionally a stack passing argument (but since Clang doesn't already do this for its ABI handling, we risk ABI incompatibilities, so we'd need a lot more testing - see also #65111).

…ts, r=Amanieu Test that target feature mix up with homogeneous floats is sound This pull-request adds a test in `src/test/abi/` that test that target feature mix up with homogeneous floats is sound. This is basically is ripoff of [src/test/ui/simd/target-feature-mixup.rs](https://github.com/rust-lang/rust/blob/47d1cdb0bcac8e417071ce1929d261efe2399ae2/src/test/ui/simd/target-feature-mixup.rs) but for floats and without `#[repr(simd)]`. *Extracted from rust-lang#97559 since I don't yet know what to do with that PR.*

Urgau · 2022-09-25T20:20:46Z

I don't currently have the time to continue this particular PR, especially if I want to follow eddyb comment.

Closing it.

…ts, r=Amanieu Test that target feature mix up with homogeneous floats is sound This pull-request adds a test in `src/test/abi/` that test that target feature mix up with homogeneous floats is sound. This is basically is ripoff of [src/test/ui/simd/target-feature-mixup.rs](https://github.com/rust-lang/rust/blob/47d1cdb0bcac8e417071ce1929d261efe2399ae2/src/test/ui/simd/target-feature-mixup.rs) but for floats and without `#[repr(simd)]`. *Extracted from rust-lang#97559 since I don't yet know what to do with that PR.*

rust-highfive assigned cjgillot May 30, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 30, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 30, 2022

Urgau mentioned this pull request May 30, 2022

Result that uses niches resulting in a final size of 16 bytes emits poor LLVM IR due to ABI #97540

Closed

nbdd0121 reviewed May 30, 2022

View reviewed changes

compiler/rustc_middle/src/ty/layout.rs Outdated Show resolved Hide resolved

nbdd0121 reviewed May 30, 2022

View reviewed changes

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 30, 2022

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 30, 2022

Urgau force-pushed the fn-args-abi-optimization branch from 1d103bc to 4a6387a Compare May 31, 2022 08:47

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022

workingjubilee requested changes Jul 1, 2022

View reviewed changes

rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 1, 2022

Urgau added 9 commits July 5, 2022 11:44

Fix some codegen tests

ec16a32

Use simpler heuristic for determining if a layout only floats

9ed05ed

Use nbdd0121 suggestion for reducing the perf impact

f1c72be

Remove undefined unlikely! macro

1be1d4a

Revert "Use nbdd0121 suggestion for reducing the perf impact"

b2fba9a

This reverts commit 2e0cf271285089316db55b995312712638126245.

Let LLVM also handle small aggregate

a84f4c9

Revert "Revert "Use nbdd0121 suggestion for reducing the perf impact""

0c1451c

This reverts commit e136c3a9348200c261b9b3c1c50a2f6f6a68b4bd.

Retry with the homogeneous aggregate concept

c7e8880

Revert max_by_val_size * 2

683e13f

Urgau force-pushed the fn-args-abi-optimization branch from a0ede51 to 683e13f Compare July 5, 2022 10:01

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2022

Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 31, 2022

Urgau mentioned this pull request Sep 3, 2022

Test that target feature mix up with homogeneous floats is sound #101381

Merged

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 11, 2022

Urgau closed this Sep 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize layout of function arguments in the Rust ABI - take 2 #97559

Optimize layout of function arguments in the Rust ABI - take 2 #97559

Urgau commented May 30, 2022

rust-highfive commented May 30, 2022

nbdd0121 May 30, 2022

Urgau May 30, 2022

nbdd0121 May 30, 2022

nbdd0121 May 30, 2022

Urgau May 30, 2022

nbdd0121 May 30, 2022

Urgau May 30, 2022

Urgau May 31, 2022 •

edited

Loading

the8472 commented May 30, 2022

rust-timer commented May 30, 2022

bors commented May 30, 2022

bors commented May 30, 2022

rust-timer commented May 30, 2022

rust-timer commented May 30, 2022

the8472 commented May 31, 2022

rust-timer commented May 31, 2022

bors commented May 31, 2022

bors commented May 31, 2022

rust-timer commented May 31, 2022

rust-timer commented May 31, 2022

apiraino commented Jun 30, 2022 •

edited

Loading

workingjubilee left a comment

Urgau commented Jul 5, 2022

the8472 commented Jul 5, 2022

rust-timer commented Jul 5, 2022

bors commented Jul 5, 2022

bors commented Jul 5, 2022

rust-timer commented Jul 5, 2022

rust-timer commented Jul 5, 2022

eddyb commented Jul 7, 2022

Urgau commented Sep 25, 2022

Optimize layout of function arguments in the Rust ABI - take 2 #97559

Optimize layout of function arguments in the Rust ABI - take 2 #97559

Conversation

Urgau commented May 30, 2022

rust-highfive commented May 30, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Urgau May 31, 2022 • edited Loading

Choose a reason for hiding this comment

the8472 commented May 30, 2022

rust-timer commented May 30, 2022

bors commented May 30, 2022

bors commented May 30, 2022

rust-timer commented May 30, 2022

rust-timer commented May 30, 2022

Footnotes

the8472 commented May 31, 2022

rust-timer commented May 31, 2022

bors commented May 31, 2022

bors commented May 31, 2022

rust-timer commented May 31, 2022

rust-timer commented May 31, 2022

Footnotes

apiraino commented Jun 30, 2022 • edited Loading

workingjubilee left a comment

Choose a reason for hiding this comment

Urgau commented Jul 5, 2022

the8472 commented Jul 5, 2022

rust-timer commented Jul 5, 2022

bors commented Jul 5, 2022

bors commented Jul 5, 2022

rust-timer commented Jul 5, 2022

rust-timer commented Jul 5, 2022

Footnotes

eddyb commented Jul 7, 2022

Urgau commented Sep 25, 2022

Urgau May 31, 2022 •

edited

Loading

apiraino commented Jun 30, 2022 •

edited

Loading