optimize zipping over array iterators #115515
Conversation
@bors try @rust-timer queue
⌛ Trying commit 0580b27 with merge 3effcd54b6ab31955a45d85d1ca90746310d294b...

☀️ Try build successful - checks-actions
Finished benchmarking commit (3effcd54b6ab31955a45d85d1ca90746310d294b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 629.273s -> 628.851s (-0.07%)
#[inline]
unsafe fn __iterator_get_unchecked(&mut self, idx: usize) -> Self::Item {
    // SAFETY: The caller must provide an idx that is in bounds of the remainder.
    unsafe { self.data.as_ptr().add(self.alive.start()).add(idx).cast::<T>().read() }
}
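To make the pointer arithmetic in this accessor easier to follow, here is a small self-contained model of the same access pattern. `ArrayIntoIter`, its fields, and `get_unchecked` are illustrative stand-ins made up for this sketch, not the real `core::array::IntoIter` API (the real type keeps its alive range in an internal index-range type with a `start()` accessor, hence `self.alive.start()` in the diff above):

```rust
use std::mem::MaybeUninit;
use std::ops::Range;

// Illustrative stand-in for `core::array::IntoIter`; the field names mirror
// the snippet above, but this is not the real standard-library type.
struct ArrayIntoIter<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    alive: Range<usize>, // indices of the elements that have not been yielded yet
}

impl<T, const N: usize> ArrayIntoIter<T, N> {
    /// Reads the element `idx` positions past the front of the remaining range.
    ///
    /// SAFETY: `idx` must be in bounds of `alive`, and each slot may be read
    /// at most once, since the value is moved out without marking it as dead.
    unsafe fn get_unchecked(&mut self, idx: usize) -> T {
        // Same pointer arithmetic as `__iterator_get_unchecked` above:
        // buffer start + start of the alive range + the requested offset.
        unsafe { self.data.as_ptr().add(self.alive.start).add(idx).cast::<T>().read() }
    }
}

fn main() {
    let data = [10u32, 20, 30, 40].map(MaybeUninit::new);
    let mut iter = ArrayIntoIter { data, alive: 0..4 };
    // Random access into the not-yet-yielded elements, as Zip's fast path does.
    let (a, c) = unsafe { (iter.get_unchecked(0), iter.get_unchecked(2)) };
    assert_eq!((a, c), (10, 30));
    // The remaining `u32`s need no drop handling, so letting `iter` go out of
    // scope is fine for this demo.
}
```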
Oh, of course, random access to an array iterator makes a ton of sense 👍
Hmm, the LLVM IR for the array version in nightly has […]. Why that would make the codegen so much worse is unclear to me, though. Does the same thing happen for different lengths? Or is it just […]? Regardless, I think doing this to enable random access for […]
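For context on the "random access" being discussed: `Zip` has an internal specialization that, when both halves implement the unstable `TrustedRandomAccessNoCoerce` trait, drives the loop by a shared index instead of calling `next()` on each side, which is what `__iterator_get_unchecked` feeds into. The sketch below is only a rough model of that idea, not the real code in `core::iter::adapters::zip` (which uses the unstable traits and tracks more state):

```rust
// Sketch of an index-driven zip loop. `get_a` and `get_b` stand in for
// `__iterator_get_unchecked` on the two halves; `len` is the minimum of the
// two sides' lengths, computed once up front.
struct ZipModel<A, B> {
    a: A,
    b: B,
    index: usize,
    len: usize,
}

impl<A, B> ZipModel<A, B> {
    fn next_via_random_access<T, U>(
        &mut self,
        get_a: unsafe fn(&mut A, usize) -> T,
        get_b: unsafe fn(&mut B, usize) -> U,
    ) -> Option<(T, U)> {
        if self.index < self.len {
            let i = self.index;
            self.index += 1;
            // SAFETY (by this sketch's contract): i < len, and len is no
            // larger than either side's length, so both accesses are in bounds.
            Some(unsafe { (get_a(&mut self.a, i), get_b(&mut self.b, i)) })
        } else {
            None
        }
    }
}

fn main() {
    // Demo with two fixed arrays; the accessors just index into them.
    unsafe fn get(arr: &mut [i32; 3], i: usize) -> i32 {
        arr[i]
    }

    let mut z = ZipModel { a: [1, 2, 3], b: [40, 50, 60], index: 0, len: 3 };
    let mut sums = Vec::new();
    while let Some((x, y)) = z.next_via_random_access(get, get) {
        sums.push(x + y);
    }
    assert_eq!(sums, [41, 52, 63]);
}
```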
optimize zipping over array iterators

Fixes rust-lang#115339 (somewhat)

The new assembly:

```asm
zip_arrays:
	.cfi_startproc
	vmovups	(%rdx), %ymm0
	leaq	32(%rsi), %rcx
	vxorps	%xmm1, %xmm1, %xmm1
	vmovups	%xmm1, -24(%rsp)
	movq	$0, -8(%rsp)
	movq	%rsi, -88(%rsp)
	movq	%rdi, %rax
	movq	%rcx, -80(%rsp)
	vmovups	%ymm0, -72(%rsp)
	movq	$0, -40(%rsp)
	movq	$32, -32(%rsp)
	movq	-24(%rsp), %rcx
	vmovups	(%rsi,%rcx), %ymm0
	vorps	-72(%rsp,%rcx), %ymm0, %ymm0
	vmovups	%ymm0, (%rsi,%rcx)
	vmovups	(%rsi), %ymm0
	vmovups	%ymm0, (%rdi)
	vzeroupper
	retq
```

This is still longer than the slice version given in the issue, but at least it eliminates the terrible `vpextrb`/`orb` chain. I guess this is due to excessive memcpys again (I haven't looked at the LLVM IR)?

The `TrustedLen` specialization is a drive-by change, since I had to do something for the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl.
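For reference, assembly of this shape comes out of zipping a 32-byte array iterator into another 32-byte buffer and OR-ing the elements together. The exact benchmark lives in rust-lang#115339; the sketch below is a reconstruction of that pattern, so the signature, element type, and the `|=` operation are guesses inferred from the 32-byte loads and the `vorps` above, not a verbatim copy of the issue's code:

```rust
// Hypothetical reconstruction of the benchmarked pattern. Compile with
// optimizations (e.g. `rustc -O --crate-type lib`) to see vectorized output.
pub fn zip_arrays(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    // `b` is consumed by value, so `.zip(b)` goes through
    // `core::array::IntoIter`, the iterator this PR teaches to support
    // random access.
    a.iter_mut().zip(b).for_each(|(a, b)| *a |= b);
    a
}
```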
💔 Test failed - checks-actions

@bors r- Hmm, I wonder if the upstream changes mentioned in #115339 (comment) will obviate this at all...
Force-pushed from 0580b27 to b018ad3
I don't know, but I'd guess that random access could still be beneficial for more complex combinations of array iterators + zip.

@rustbot ready. I think the issue was that nopt builds disable optimizations for tests too if an optimization level hasn't been set explicitly.
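In compiletest terms, a codegen test that asserts on optimized output needs to ask for optimization itself, so the nopt builders don't build it at the default opt level. A minimal header sketch, assuming a codegen-style test; the file name and function are made up, and only the `compile-flags` line is the relevant part:

```rust
// tests/codegen/zip-array-example.rs (hypothetical file name)
// compile-flags: -O

#[no_mangle]
pub fn or_bytes(a: &mut [u8; 32], b: &[u8; 32]) {
    // A real codegen test would add FileCheck assertions on the emitted IR here,
    // e.g. a CHECK-LABEL for @or_bytes followed by checks on the vector ops.
    for (a, b) in a.iter_mut().zip(b) {
        *a |= *b;
    }
}
```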
@bors r+ rollup=never

☀️ Test successful - checks-actions
Finished benchmarking commit (0d410be): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 624.559s -> 624.729s (0.03%)
54: Pull upstream master 2023 10 17 r=pietroalbini a=Veykril

* rust-lang/rust#116196
* rust-lang/rust#116824
* rust-lang/rust#116822
* rust-lang/rust#116477
* rust-lang/rust#116826
* rust-lang/rust#116820
* rust-lang/rust#116811
* rust-lang/rust#116808
* rust-lang/rust#116805
* rust-lang/rust#116800
* rust-lang/rust#116798
* rust-lang/rust#116754
* rust-lang/rust#114370
* rust-lang/rust#116804
* rust-lang/rust#116802
* rust-lang/rust#116790
* rust-lang/rust#116786
* rust-lang/rust#116709
* rust-lang/rust#116430
* rust-lang/rust#116257
* rust-lang/rust#114157
* rust-lang/rust#116731
* rust-lang/rust#116550
* rust-lang/rust#114330
* rust-lang/rust#116724
* rust-lang/rust#116782
* rust-lang/rust#116776
* rust-lang/rust#115955
* rust-lang/rust#115196
* rust-lang/rust#116775
* rust-lang/rust#114589
* rust-lang/rust#113747
* rust-lang/rust#116772
* rust-lang/rust#116771
* rust-lang/rust#116760
* rust-lang/rust#116755
* rust-lang/rust#116732
* rust-lang/rust#116522
* rust-lang/rust#116341
* rust-lang/rust#116172
* rust-lang/rust#110604
* rust-lang/rust#110729
* rust-lang/rust#116527
* rust-lang/rust#116688
* rust-lang/rust#116757
* rust-lang/rust#116753
* rust-lang/rust#116748
* rust-lang/rust#116741
* rust-lang/rust#116594
* rust-lang/rust#116691
* rust-lang/rust#116643
* rust-lang/rust#116683
* rust-lang/rust#116635
* rust-lang/rust#115515
* rust-lang/rust#116742
* rust-lang/rust#116661
* rust-lang/rust#116576
* rust-lang/rust#116540
* rust-lang/rust#116352
* rust-lang/rust#116737
* rust-lang/rust#116730
* rust-lang/rust#116723
* rust-lang/rust#116715
* rust-lang/rust#116603
* rust-lang/rust#116591
* rust-lang/rust#115439
* rust-lang/rust#116264
* rust-lang/rust#116727
* rust-lang/rust#116704
* rust-lang/rust#116696
* rust-lang/rust#116695
* rust-lang/rust#116644
* rust-lang/rust#116630
* rust-lang/rust#116728
* rust-lang/rust#116689
* rust-lang/rust#116679
* rust-lang/rust#116618
* rust-lang/rust#116577
* rust-lang/rust#115653
* rust-lang/rust#116702
* rust-lang/rust#116015
* rust-lang/rust#115822
* rust-lang/rust#116407
* rust-lang/rust#115719
* rust-lang/rust#115524
* rust-lang/rust#116705
* rust-lang/rust#116645
* rust-lang/rust#116233
* rust-lang/rust#115108
* rust-lang/rust#116670
* rust-lang/rust#116676
* rust-lang/rust#116666

Co-authored-by: Benoît du Garreau <[email protected]>
Co-authored-by: Colin Finck <[email protected]>
Co-authored-by: Ian Jackson <[email protected]>
Co-authored-by: Joshua Liebow-Feeser <[email protected]>
Co-authored-by: León Orell Valerian Liehr <[email protected]>
Co-authored-by: Trevor Gross <[email protected]>
Co-authored-by: Evan Merlock <[email protected]>
Co-authored-by: joboet <[email protected]>
Co-authored-by: Ralf Jung <[email protected]>
Co-authored-by: DaniPopes <[email protected]>
Co-authored-by: Mark Rousskov <[email protected]>
Co-authored-by: onur-ozkan <[email protected]>
Co-authored-by: Nicholas Nethercote <[email protected]>
Co-authored-by: The 8472 <[email protected]>
Co-authored-by: Samuel Thibault <[email protected]>
Co-authored-by: reez12g <[email protected]>
Co-authored-by: Jakub Beránek <[email protected]>
Remove my `scalar_copy_backend_type` optimization attempt

I added this back in rust-lang#111999, but I no longer think it's a good idea:

- It had to get scaled back to only power-of-two things to not break a bunch of targets
- LLVM seems to be getting better at memcpy removal anyway
- Introducing vector instructions has seemed to sometimes (rust-lang#115515 (comment)) make autovectorization worse

So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy`, so things will still use load/store when a type isn't `OperandValue::Ref`.