implement C++23 byteswap
#3093
There are some optimization opportunities. @davebayer, would you like to explore them? If so, I can add some suggestions.
I wanted to do that in a separate PR, but sure, why not!
Perfect. I added a set of optimizations for CUDA a while ago, and it also includes
Thank you for the hint. I've implemented the optimized versions of 32-bit and 64-bit byte swap. The PTX output is now more or less identical to the In case you are interested, here is the link to godbolt: https://godbolt.org/z/91eceT8so
/ok to test
/ok to test
🟨 CI finished in 2h 07m: Pass: 93%/168 | Total: 3d 04h | Avg: 27m 27s | Max: 1h 15m | Hits: 61%/12600
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 168)

| # | Runner |
|---|---|
| 124 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
/ok to test
🟨 CI finished in 1h 38m: Pass: 98%/168 | Total: 1d 21h | Avg: 16m 05s | Max: 1h 11m | Hits: 48%/22354
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 168)

| # | Runner |
|---|---|
| 124 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
This PR introduces C++23 `std::byteswap` to CCCL and makes it available back to C++11. The implementation uses the `__builtin_bswap` compiler intrinsics if available.
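
As a rough illustration of the approach described above, a byte swap that prefers compiler intrinsics and otherwise falls back to portable shifts might look like the sketch below. This is not the code from this PR: the `sketch` namespace, the `SKETCH_HAS_BUILTIN_BSWAP` macro, and the helpers `byteswap_generic` and `swap_sized` are hypothetical names chosen for illustration. The sketch assumes that `__builtin_bswap16`, `__builtin_bswap32`, and `__builtin_bswap64` are available whenever `__GNUC__` or `__clang__` is defined, and for simplicity it omits `constexpr` and excludes `bool`, both of which the standard `std::byteswap` supports.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumption: GCC and Clang provide the __builtin_bswap{16,32,64} intrinsics.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_HAS_BUILTIN_BSWAP 1
#else
#  define SKETCH_HAS_BUILTIN_BSWAP 0
#endif

// Hypothetical namespace; not the CCCL implementation.
namespace sketch {

// Portable fallback: move the lowest byte of `value` into `result`,
// one byte per iteration, which reverses the byte order.
template <class U>
U byteswap_generic(U value) {
  U result = 0;
  for (std::size_t i = 0; i < sizeof(U); ++i) {
    result = static_cast<U>((result << 8) | (value & 0xFF));
    value  = static_cast<U>(value >> 8);
  }
  return result;
}

// Any size without a dedicated overload uses the portable fallback.
template <class U, std::size_t N>
U swap_sized(U v, std::integral_constant<std::size_t, N>) {
  return byteswap_generic(v);
}

#if SKETCH_HAS_BUILTIN_BSWAP
// 2-, 4-, and 8-byte values map directly onto the intrinsics.
template <class U>
U swap_sized(U v, std::integral_constant<std::size_t, 2>) {
  return static_cast<U>(__builtin_bswap16(static_cast<std::uint16_t>(v)));
}
template <class U>
U swap_sized(U v, std::integral_constant<std::size_t, 4>) {
  return static_cast<U>(__builtin_bswap32(static_cast<std::uint32_t>(v)));
}
template <class U>
U swap_sized(U v, std::integral_constant<std::size_t, 8>) {
  return static_cast<U>(__builtin_bswap64(static_cast<std::uint64_t>(v)));
}
#endif

// Entry point: accepts any integral type except bool (a simplification),
// swaps the unsigned representation, and converts back.
template <class T>
typename std::enable_if<std::is_integral<T>::value &&
                        !std::is_same<T, bool>::value, T>::type
byteswap(T value) {
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(swap_sized(
      static_cast<U>(value), std::integral_constant<std::size_t, sizeof(T)>{}));
}

} // namespace sketch
```

With this sketch, `sketch::byteswap(std::uint32_t{0x12345678u})` yields `0x78563412u`, and a single-byte value is returned unchanged. Dispatching on `sizeof(T)` rather than on the exact type keeps types such as `unsigned long` and `unsigned long long` on the intrinsic path when they share a width.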