Stop manually SIMDing in `swap_nonoverlapping` #94212

scottmcm · 2022-02-21T08:33:24Z

Like I previously did for reverse (#90821), this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have.

A variety of codegen tests are included to confirm that the various cases are still being vectorized.

It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like [u8; 4], it fails to do so over slices of [u8; 3].

As a bonus, this change also means one no longer gets the spurious memcpy(s?) at the end up swapping a slice of __m256s: https://rust.godbolt.org/z/joofr4v8Y

ASM for this example

Before (from godbolt)

note the push/pops and memcpy

swap_m256_slice:
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        sub     rsp, 32
        cmp     rsi, rcx
        jne     .LBB0_6
        mov     r14, rsi
        shl     r14, 5
        je      .LBB0_6
        mov     r15, rdx
        mov     rbx, rdi
        xor     eax, eax
.LBB0_3:
        mov     rcx, rax
        vmovaps ymm0, ymmword ptr [rbx + rax]
        vmovaps ymm1, ymmword ptr [r15 + rax]
        vmovaps ymmword ptr [rbx + rax], ymm1
        vmovaps ymmword ptr [r15 + rax], ymm0
        add     rax, 32
        add     rcx, 64
        cmp     rcx, r14
        jbe     .LBB0_3
        sub     r14, rax
        jbe     .LBB0_6
        add     rbx, rax
        add     r15, rax
        mov     r12, rsp
        mov     r13, qword ptr [rip + memcpy@GOTPCREL]
        mov     rdi, r12
        mov     rsi, rbx
        mov     rdx, r14
        vzeroupper
        call    r13
        mov     rdi, rbx
        mov     rsi, r15
        mov     rdx, r14
        call    r13
        mov     rdi, r15
        mov     rsi, r12
        mov     rdx, r14
        call    r13
.LBB0_6:
        add     rsp, 32
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        vzeroupper
        ret

After (from my machine)

Note no rsp manipulation, sorry for different ASM syntax

swap_m256_slice:
	cmpq	%r9, %rdx
	jne	.LBB1_6
	testq	%rdx, %rdx
	je	.LBB1_6
	cmpq	$1, %rdx
	jne	.LBB1_7
	xorl	%r10d, %r10d
	jmp	.LBB1_4
.LBB1_7:
	movq	%rdx, %r9
	andq	$-2, %r9
	movl	$32, %eax
	xorl	%r10d, %r10d
	.p2align	4, 0x90
.LBB1_8:
	vmovaps	-32(%rcx,%rax), %ymm0
	vmovaps	-32(%r8,%rax), %ymm1
	vmovaps	%ymm1, -32(%rcx,%rax)
	vmovaps	%ymm0, -32(%r8,%rax)
	vmovaps	(%rcx,%rax), %ymm0
	vmovaps	(%r8,%rax), %ymm1
	vmovaps	%ymm1, (%rcx,%rax)
	vmovaps	%ymm0, (%r8,%rax)
	addq	$2, %r10
	addq	$64, %rax
	cmpq	%r10, %r9
	jne	.LBB1_8
.LBB1_4:
	testb	$1, %dl
	je	.LBB1_6
	shlq	$5, %r10
	vmovaps	(%rcx,%r10), %ymm0
	vmovaps	(%r8,%r10), %ymm1
	vmovaps	%ymm1, (%rcx,%r10)
	vmovaps	%ymm0, (%r8,%r10)
.LBB1_6:
	vzeroupper
	retq

This does all its copying operations as either the original type or as MaybeUninits, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers.

Perf is essentially unchanged

Though perhaps with more target features this would help more, if it could pick bigger chunks

Before

running 10 tests
test slice::swap_with_slice_4x_usize_30                            ... bench:         894 ns/iter (+/- 11)
test slice::swap_with_slice_4x_usize_3000                          ... bench:      99,476 ns/iter (+/- 2,784)
test slice::swap_with_slice_5x_usize_30                            ... bench:       1,257 ns/iter (+/- 7)
test slice::swap_with_slice_5x_usize_3000                          ... bench:     139,922 ns/iter (+/- 959)
test slice::swap_with_slice_rgb_30                                 ... bench:         328 ns/iter (+/- 27)
test slice::swap_with_slice_rgb_3000                               ... bench:      16,215 ns/iter (+/- 176)
test slice::swap_with_slice_u8_30                                  ... bench:         312 ns/iter (+/- 9)
test slice::swap_with_slice_u8_3000                                ... bench:       5,401 ns/iter (+/- 123)
test slice::swap_with_slice_usize_30                               ... bench:         368 ns/iter (+/- 3)
test slice::swap_with_slice_usize_3000                             ... bench:      28,472 ns/iter (+/- 3,913)

After

running 10 tests
test slice::swap_with_slice_4x_usize_30                            ... bench:         868 ns/iter (+/- 36)
test slice::swap_with_slice_4x_usize_3000                          ... bench:      99,642 ns/iter (+/- 1,507)
test slice::swap_with_slice_5x_usize_30                            ... bench:       1,194 ns/iter (+/- 11)
test slice::swap_with_slice_5x_usize_3000                          ... bench:     139,761 ns/iter (+/- 5,018)
test slice::swap_with_slice_rgb_30                                 ... bench:         324 ns/iter (+/- 6)
test slice::swap_with_slice_rgb_3000                               ... bench:      15,962 ns/iter (+/- 287)
test slice::swap_with_slice_u8_30                                  ... bench:         281 ns/iter (+/- 5)
test slice::swap_with_slice_u8_3000                                ... bench:       5,324 ns/iter (+/- 40)
test slice::swap_with_slice_usize_30                               ... bench:         275 ns/iter (+/- 5)
test slice::swap_with_slice_usize_3000                             ... bench:      28,277 ns/iter (+/- 277)

scottmcm · 2022-02-21T08:36:45Z

It looks like highfive missed this, so I'll try to wake it up
r? rust-lang/libs

@bors rollup=iffy (this has codegen tests, which always make me nervous for rollups)

scottmcm · 2022-02-21T08:44:30Z

library/core/src/ptr/mod.rs

-    // FIXME repr(simd) broken on emscripten and redox
-    #[cfg_attr(not(any(target_os = "emscripten", target_os = "redox")), repr(simd))]
-    struct Block(u64, u64, u64, u64);
-    struct UnalignedBlock(u64, u64, u64, u64);


The diff for this file is pretty useless; you might want to read it in side-by-side instead: https://github.com/rust-lang/rust/pull/94212/files?diff=split&w=0

Like I previously did for `reverse`, this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have. It does still need logic to type-erase where appropriate, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`. As a bonus, this also means one no longer gets the spurious `memcpy`(s?) at the end up swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>

dtolnay

This implementation looks great to me.

dtolnay · 2022-02-22T22:17:24Z

@bors r+

bors · 2022-02-22T22:17:26Z

📌 Commit 8ca47d7 has been approved by dtolnay

bors · 2022-02-24T16:44:55Z

⌛ Testing commit 8ca47d7 with merge c7e3ec112c881757bcd4d57840e009620c505b15...

bors · 2022-02-24T16:48:15Z

💔 Test failed - checks-actions

scottmcm · 2022-02-24T18:27:12Z

@bors retry network issue

warning: spurious network error (2 tries remaining): failed to get 200 response from `[https://crates.io/api/v1/crates/serde/1.0.125/download`,](https://crates.io/api/v1/crates/serde/1.0.125/download%60,) got 502
warning: spurious network error (2 tries remaining): failed to get 200 response from `[https://crates.io/api/v1/crates/regex/1.5.4/download`,](https://crates.io/api/v1/crates/regex/1.5.4/download%60,) got 502
warning: spurious network error (2 tries remaining): failed to get 200 response from `[https://crates.io/api/v1/crates/serde_json/1.0.59/download`,](https://crates.io/api/v1/crates/serde_json/1.0.59/download%60,) got 502
error: failed to download from `[https://crates.io/api/v1/crates/cfg-if/1.0.0/download`](https://crates.io/api/v1/crates/cfg-if/1.0.0/download%60)

rust-log-analyzer · 2022-02-24T19:07:14Z

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

Stop manually SIMDing in `swap_nonoverlapping` Like I previously did for `reverse` (rust-lang#90821), this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have. A variety of codegen tests are included to confirm that the various cases are still being vectorized. It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`. As a bonus, this change also means one no longer gets the spurious `memcpy`(s?) at the end up swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y> <details> <summary>ASM for this example</summary> ## Before (from godbolt) note the `push`/`pop`s and `memcpy` ```x86 swap_m256_slice: push r15 push r14 push r13 push r12 push rbx sub rsp, 32 cmp rsi, rcx jne .LBB0_6 mov r14, rsi shl r14, 5 je .LBB0_6 mov r15, rdx mov rbx, rdi xor eax, eax .LBB0_3: mov rcx, rax vmovaps ymm0, ymmword ptr [rbx + rax] vmovaps ymm1, ymmword ptr [r15 + rax] vmovaps ymmword ptr [rbx + rax], ymm1 vmovaps ymmword ptr [r15 + rax], ymm0 add rax, 32 add rcx, 64 cmp rcx, r14 jbe .LBB0_3 sub r14, rax jbe .LBB0_6 add rbx, rax add r15, rax mov r12, rsp mov r13, qword ptr [rip + memcpy@GOTPCREL] mov rdi, r12 mov rsi, rbx mov rdx, r14 vzeroupper call r13 mov rdi, rbx mov rsi, r15 mov rdx, r14 call r13 mov rdi, r15 mov rsi, r12 mov rdx, r14 call r13 .LBB0_6: add rsp, 32 pop rbx pop r12 pop r13 pop r14 pop r15 vzeroupper ret ``` ## After (from my machine) Note no `rsp` manipulation, sorry for different ASM syntax ```x86 swap_m256_slice: cmpq %r9, %rdx jne .LBB1_6 testq %rdx, %rdx je .LBB1_6 cmpq $1, %rdx jne .LBB1_7 xorl %r10d, %r10d jmp .LBB1_4 .LBB1_7: movq %rdx, %r9 andq $-2, %r9 movl $32, %eax xorl %r10d, %r10d .p2align 4, 0x90 .LBB1_8: vmovaps -32(%rcx,%rax), %ymm0 vmovaps -32(%r8,%rax), %ymm1 vmovaps %ymm1, -32(%rcx,%rax) vmovaps %ymm0, -32(%r8,%rax) vmovaps (%rcx,%rax), %ymm0 vmovaps (%r8,%rax), %ymm1 vmovaps %ymm1, (%rcx,%rax) vmovaps %ymm0, (%r8,%rax) addq $2, %r10 addq $64, %rax cmpq %r10, %r9 jne .LBB1_8 .LBB1_4: testb $1, %dl je .LBB1_6 shlq $5, %r10 vmovaps (%rcx,%r10), %ymm0 vmovaps (%r8,%r10), %ymm1 vmovaps %ymm1, (%rcx,%r10) vmovaps %ymm0, (%r8,%r10) .LBB1_6: vzeroupper retq ``` </details> This does all its copying operations as either the original type or as `MaybeUninit`s, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers. <details> <summary>Perf is essentially unchanged</summary> Though perhaps with more target features this would help more, if it could pick bigger chunks ## Before ``` running 10 tests test slice::swap_with_slice_4x_usize_30 ... bench: 894 ns/iter (+/- 11) test slice::swap_with_slice_4x_usize_3000 ... bench: 99,476 ns/iter (+/- 2,784) test slice::swap_with_slice_5x_usize_30 ... bench: 1,257 ns/iter (+/- 7) test slice::swap_with_slice_5x_usize_3000 ... bench: 139,922 ns/iter (+/- 959) test slice::swap_with_slice_rgb_30 ... bench: 328 ns/iter (+/- 27) test slice::swap_with_slice_rgb_3000 ... bench: 16,215 ns/iter (+/- 176) test slice::swap_with_slice_u8_30 ... bench: 312 ns/iter (+/- 9) test slice::swap_with_slice_u8_3000 ... bench: 5,401 ns/iter (+/- 123) test slice::swap_with_slice_usize_30 ... bench: 368 ns/iter (+/- 3) test slice::swap_with_slice_usize_3000 ... bench: 28,472 ns/iter (+/- 3,913) ``` ## After ``` running 10 tests test slice::swap_with_slice_4x_usize_30 ... bench: 868 ns/iter (+/- 36) test slice::swap_with_slice_4x_usize_3000 ... bench: 99,642 ns/iter (+/- 1,507) test slice::swap_with_slice_5x_usize_30 ... bench: 1,194 ns/iter (+/- 11) test slice::swap_with_slice_5x_usize_3000 ... bench: 139,761 ns/iter (+/- 5,018) test slice::swap_with_slice_rgb_30 ... bench: 324 ns/iter (+/- 6) test slice::swap_with_slice_rgb_3000 ... bench: 15,962 ns/iter (+/- 287) test slice::swap_with_slice_u8_30 ... bench: 281 ns/iter (+/- 5) test slice::swap_with_slice_u8_3000 ... bench: 5,324 ns/iter (+/- 40) test slice::swap_with_slice_usize_30 ... bench: 275 ns/iter (+/- 5) test slice::swap_with_slice_usize_3000 ... bench: 28,277 ns/iter (+/- 277) ``` </detail>

Rollup of 9 pull requests Successful merges: - rust-lang#91795 (resolve/metadata: Stop encoding macros as reexports) - rust-lang#93714 (better ObligationCause for normalization errors in `can_type_implement_copy`) - rust-lang#94175 (Improve `--check-cfg` implementation) - rust-lang#94212 (Stop manually SIMDing in `swap_nonoverlapping`) - rust-lang#94242 (properly handle fat pointers to uninhabitable types) - rust-lang#94308 (Normalize main return type during mono item collection & codegen) - rust-lang#94315 (update auto trait lint for `PhantomData`) - rust-lang#94316 (Improve string literal unescaping) - rust-lang#94327 (Avoid emitting full macro body into JSON errors) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup

RalfJung · 2022-02-25T18:18:04Z

Something odd is happening with this PR: Miri started complaining about incorrect use of uninit data in the test harness, and reverting this PR fixes that.

error: Undefined Behavior: type validation failed at .value.1.desc.name.<enum-tag>: encountered uninitialized bytes, but expected a valid enum tag
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:654:9
    |
654 |         tmp.assume_init()
    |         ^^^^^^^^^^^^^^^^^ type validation failed at .value.1.desc.name.<enum-tag>: encountered uninitialized bytes, but expected a valid enum tag
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::ptr::read::<(test::test::TestId, test::test::TestDescAndFn)>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:654:9
    = note: inside `std::vec::Vec::<(test::test::TestId, test::test::TestDescAndFn)>::pop` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1761:22
    = note: inside `test::test::run_tests::<[closure@test::test::run_tests_console::{closure#2}]>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/lib.rs:301:30
    = note: inside `test::test::run_tests_console` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/console.rs:286:5
    = note: inside `test::test::test_main` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/lib.rs:116:15
    = note: inside `test::test::test_main_static` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/lib.rs:135:5
    = note: inside `main`

RalfJung · 2022-02-25T18:23:40Z

Ah, I think this is an instance of #69488 -- not a bug in this code, but a limitation in the Miri engine that is exposed by the new swap implementation.

We could add a cfg(miri) work-around, but the same issue also affects CTFE execution and that work-around won't help there.

scottmcm · 2022-02-25T18:45:52Z

Thanks for looking at this, Ralf. I'm glad to hear that copying as MaybeUninit<usize> is theoretically ok, at least.

I suppose another option would be to skip this entirely with const_eval_select? That'd probably be faster, since there's no autovectorization at MIR-level, nor is there a worry about copying big things to allocas in MIRI.

For a quick thing, I could make a PR to add miri to the cfg here, if you'd like?

rust/library/core/src/mem/mod.rs

Line 711 in 6cbc6c3

#[cfg(not(target_arch = "spirv"))]

RalfJung · 2022-02-25T19:21:25Z

I'm glad to hear that copying as MaybeUninit is theoretically ok, at least.

That is the intention.
I can't vouch for LLVM though, sadly poison in LLVM is per-value, not per-byte, so a MaybeUninit<usize> (which becomes i64) loading a partially-poison value will be fully poison. This is, I think, a fundamental limitation of LLVM and one reason why it needs the "byte" type. (But sadly the LLVM community is far from convinced of this.)

For a quick thing, I could make a PR to add miri to the cfg here, if you'd like?

How would that help? swap_nonoverlapping seems to still bottom out in swap_simple, which will still cause the same problem.

I think there is a fairly simple fix for #69488, but I don't know its larger consequences. That and the LLVM concerns mentioned above made me hesitate. But it might be the right time to see if that fix works -- that would help both for CTFE and Miri.

RalfJung · 2022-02-25T20:11:41Z

I created a standalone testcase and opened an issue: #94371

…oli-obk For MIRI, cfg out the swap vectorization logic from 94212 Because of rust-lang#69488 the swap logic from rust-lang#94212 doesn't currently work in MIRI. Copying in smaller pieces is probably much worse for its performance anyway, so it'd probably rather just use the simple path regardless. Part of rust-lang#94371, though another PR will be needed for the CTFE aspect. r? `@oli-obk` cc `@RalfJung`

…i-obk For MIRI, cfg out the swap vectorization logic from 94212 Because of rust-lang#69488 the swap logic from rust-lang#94212 doesn't currently work in MIRI. Copying in smaller pieces is probably much worse for its performance anyway, so it'd probably rather just use the simple path regardless. Part of rust-lang#94371, though another PR will be needed for the CTFE aspect. r? `@oli-obk` cc `@RalfJung`

ptr::copy and ptr::swap are doing untyped copies The consensus in rust-lang#63159 seemed to be that these operations should be "untyped", i.e., they should treat the data as raw bytes, should work when these bytes violate the validity invariant of `T`, and should exactly preserve the initialization state of the bytes that are being copied. This is already somewhat implied by the description of "copying/swapping size*N bytes" (rather than "N instances of `T`"). The implementations mostly already work that way (well, for LLVM's intrinsics the documentation is not precise enough to say what exactly happens to poison, but if this ever gets clarified to something that would *not* perfectly preserve poison, then I strongly assume there will be some way to make a copy that *does* perfectly preserve poison). However, I had to adjust `swap_nonoverlapping`; after `@scottmcm's` [recent changes](rust-lang#94212), that one (sometimes) made a typed copy. (Note that `mem::swap`, which works on mutable references, is unchanged. It is documented as "swapping the values at two mutable locations", which to me strongly indicates that it is indeed typed. It is also safe and can rely on `&mut T` pointing to a valid `T` as part of its safety invariant.) On top of adding a test (that will be run by Miri), this PR then also adjusts the documentation to indeed stably promise the untyped semantics. I assume this means the PR has to go through t-libs (and maybe t-lang?) FCP. Fixes rust-lang#63159

ptr::copy and ptr::swap are doing untyped copies The consensus in rust-lang#63159 seemed to be that these operations should be "untyped", i.e., they should treat the data as raw bytes, should work when these bytes violate the validity invariant of `T`, and should exactly preserve the initialization state of the bytes that are being copied. This is already somewhat implied by the description of "copying/swapping size*N bytes" (rather than "N instances of `T`"). The implementations mostly already work that way (well, for LLVM's intrinsics the documentation is not precise enough to say what exactly happens to poison, but if this ever gets clarified to something that would *not* perfectly preserve poison, then I strongly assume there will be some way to make a copy that *does* perfectly preserve poison). However, I had to adjust `swap_nonoverlapping`; after ``@scottmcm's`` [recent changes](rust-lang#94212), that one (sometimes) made a typed copy. (Note that `mem::swap`, which works on mutable references, is unchanged. It is documented as "swapping the values at two mutable locations", which to me strongly indicates that it is indeed typed. It is also safe and can rely on `&mut T` pointing to a valid `T` as part of its safety invariant.) On top of adding a test (that will be run by Miri), this PR then also adjusts the documentation to indeed stably promise the untyped semantics. I assume this means the PR has to go through t-libs (and maybe t-lang?) FCP. Fixes rust-lang#63159

ptr::copy and ptr::swap are doing untyped copies The consensus in rust-lang/rust#63159 seemed to be that these operations should be "untyped", i.e., they should treat the data as raw bytes, should work when these bytes violate the validity invariant of `T`, and should exactly preserve the initialization state of the bytes that are being copied. This is already somewhat implied by the description of "copying/swapping size*N bytes" (rather than "N instances of `T`"). The implementations mostly already work that way (well, for LLVM's intrinsics the documentation is not precise enough to say what exactly happens to poison, but if this ever gets clarified to something that would *not* perfectly preserve poison, then I strongly assume there will be some way to make a copy that *does* perfectly preserve poison). However, I had to adjust `swap_nonoverlapping`; after ``@scottmcm's`` [recent changes](rust-lang/rust#94212), that one (sometimes) made a typed copy. (Note that `mem::swap`, which works on mutable references, is unchanged. It is documented as "swapping the values at two mutable locations", which to me strongly indicates that it is indeed typed. It is also safe and can rely on `&mut T` pointing to a valid `T` as part of its safety invariant.) On top of adding a test (that will be run by Miri), this PR then also adjusts the documentation to indeed stably promise the untyped semantics. I assume this means the PR has to go through t-libs (and maybe t-lang?) FCP. Fixes rust-lang/rust#63159

scottmcm added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 21, 2022

rust-highfive assigned dtolnay Feb 21, 2022

scottmcm commented Feb 21, 2022

View reviewed changes

This comment has been minimized.

Sign in to view

scottmcm force-pushed the swapper branch from b7128e2 to 8ca47d7 Compare February 21, 2022 08:54

dtolnay approved these changes Feb 22, 2022

View reviewed changes

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Feb 22, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 24, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 24, 2022

Dylan-DPC mentioned this pull request Feb 24, 2022

Rollup of 9 pull requests #94331

Closed

This was referenced Feb 24, 2022

Rollup of 9 pull requests #94332

Closed

Rollup of 9 pull requests #94333

Merged

bors merged commit 7fb55b4 into rust-lang:master Feb 25, 2022

rustbot added this to the 1.61.0 milestone Feb 25, 2022

scottmcm deleted the swapper branch February 25, 2022 02:57

RalfJung mentioned this pull request Feb 25, 2022

mem::swap behaves incorrectly in CTFE (and Miri) #94371

Closed

scottmcm mentioned this pull request Feb 27, 2022

For MIRI, cfg out the swap vectorization logic from 94212 #94412

Merged

RalfJung mentioned this pull request Feb 27, 2022

Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics #94428

Open

This was referenced May 13, 2022

Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? #63159

Closed

const-eval: loading or overwriting parts of a pointer is not supported #87184

Closed

This was referenced Jun 3, 2022

Tracking Issue for const_swap #83163

Open

ptr::copy and ptr::swap are doing untyped copies #97712

Merged

DaniPopes mentioned this pull request Apr 21, 2024

-C linker-plugin-lto prevents vectorization of mem::swap at -C opt-level=3 #124234

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop manually SIMDing in `swap_nonoverlapping` #94212

Stop manually SIMDing in `swap_nonoverlapping` #94212

scottmcm commented Feb 21, 2022 •

edited

Loading

scottmcm commented Feb 21, 2022

scottmcm Feb 21, 2022

This comment has been minimized.

dtolnay left a comment

dtolnay commented Feb 22, 2022

bors commented Feb 22, 2022

bors commented Feb 24, 2022

bors commented Feb 24, 2022

scottmcm commented Feb 24, 2022

rust-log-analyzer commented Feb 24, 2022

RalfJung commented Feb 25, 2022

RalfJung commented Feb 25, 2022

scottmcm commented Feb 25, 2022 •

edited

Loading

RalfJung commented Feb 25, 2022

RalfJung commented Feb 25, 2022

Stop manually SIMDing in swap_nonoverlapping #94212

Stop manually SIMDing in swap_nonoverlapping #94212

Conversation

scottmcm commented Feb 21, 2022 • edited Loading

Before (from godbolt)

After (from my machine)

Before

After

scottmcm commented Feb 21, 2022

scottmcm Feb 21, 2022

Choose a reason for hiding this comment

This comment has been minimized.

dtolnay left a comment

Choose a reason for hiding this comment

dtolnay commented Feb 22, 2022

bors commented Feb 22, 2022

bors commented Feb 24, 2022

bors commented Feb 24, 2022

scottmcm commented Feb 24, 2022

rust-log-analyzer commented Feb 24, 2022

RalfJung commented Feb 25, 2022

RalfJung commented Feb 25, 2022

scottmcm commented Feb 25, 2022 • edited Loading

RalfJung commented Feb 25, 2022

RalfJung commented Feb 25, 2022

Stop manually SIMDing in `swap_nonoverlapping` #94212

Stop manually SIMDing in `swap_nonoverlapping` #94212

scottmcm commented Feb 21, 2022 •

edited

Loading

scottmcm commented Feb 25, 2022 •

edited

Loading