Introduces new DeviceReduce::Arg{Min,Max} interface with two output iterators #3148
base: main
🟨 CI finished in 2h 16m: Pass: 90%/94 | Total: 1d 10h | Avg: 21m 59s | Max: 57m 04s | Hits: 99%/12384
+/- | Project |
---|---|
 | CCCL Infrastructure |
 | libcu++ |
+/- | CUB |
 | Thrust |
 | CUDA Experimental |
 | python |
 | CCCL C Parallel Library |
 | Catch2Helper |
Modifications in project or dependencies?
+/- | Project |
---|---|
 | CCCL Infrastructure |
 | libcu++ |
+/- | CUB |
+/- | Thrust |
 | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 94)
# | Runner |
---|---|
70 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
void* d_temp_storage,
size_t& temp_storage_bytes,
InputIteratorT d_in,
ExtremumOutIteratorT d_min_out,
Personal opinion: I would prefer an overload that just returns an InputIteratorT. I am trying to think of cases where a user would want ExtremumOutIteratorT to be something different from InputIteratorT, and I am coming up short.
They might, for instance, want to transform the min/max result before writing it to memory, or even want only the index and discard the extremum value itself.
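To make those use cases concrete, here is a host-only C++ sketch (not CUB code) of an ArgMin that writes to two output iterators. The names host_arg_min, discard_out and transform_out are hypothetical; the last two merely imitate the idea behind Thrust's discard and transform output iterators. It shows an extremum iterator whose type differs from the input iterator: a double* for int input, an iterator that drops the extremum when only the index matters, and one that transforms the value before storing it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical host-side stand-in for an ArgMin that writes to two output iterators.
template <typename ForwardIt, typename ExtremumOutIt, typename IndexOutIt>
void host_arg_min(ForwardIt first, ForwardIt last, ExtremumOutIt extremum_out, IndexOutIt index_out)
{
  assert(first != last && "empty input is not modeled in this sketch");
  ForwardIt min_it = std::min_element(first, last); // first occurrence wins on ties
  *extremum_out    = *min_it;                       // extremum goes to one iterator
  *index_out       = std::distance(first, min_it);  // index goes to another
}

// Hypothetical output iterator that swallows every write (idea of thrust::discard_iterator).
struct discard_out
{
  struct sink
  {
    template <typename T>
    sink& operator=(const T&) { return *this; }
  };
  sink operator*() const { return {}; }
};

// Hypothetical output iterator that applies f before storing
// (idea of thrust::transform_output_iterator).
template <typename F, typename T>
struct transform_out
{
  F f;
  T* out;
  struct proxy
  {
    F f;
    T* out;
    template <typename U>
    proxy& operator=(const U& v) { *out = f(v); return *this; }
  };
  proxy operator*() const { return {f, out}; }
};
```

With an input of {5, -3, 7, -3, 2}, passing discard_out{} as the extremum iterator leaves only the index 1 written, and passing a transform_out that squares its argument stores 9 instead of -3.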
@@ -67,7 +67,7 @@ struct unzip_and_write_arg_extremum_op
IndexOutIteratorT index_out_it;

template <typename IndexT, typename KeyValuePairT>
- __device__ void operator()(IndexT, KeyValuePairT reduced_result)
+ _CCCL_DEVICE _CCCL_FORCEINLINE void operator()(IndexT, KeyValuePairT reduced_result)
Not a big fan of forceinline, do we need it?
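For context on the diff above, here is a host-only sketch of what a functor like unzip_and_write_arg_extremum_op does. It receives the reduced pair and writes each half to its own output iterator. key_value_pair is a stand-in for cub::KeyValuePair (key holds the index, value the extremum). The struct name unzip_and_write_op and its member extremum_out_it are assumptions and are not the PR's code. The _CCCL_DEVICE/_CCCL_FORCEINLINE annotations are left out so that it compiles as plain C++.

```cpp
#include <cstdint>

// Stand-in for cub::KeyValuePair: key = index of the extremum, value = the extremum.
template <typename Key, typename Value>
struct key_value_pair
{
  Key key;
  Value value;
};

// Hypothetical host analogue of unzip_and_write_arg_extremum_op.
// index_out_it appears in the diff; extremum_out_it is an assumed name.
template <typename ExtremumOutIteratorT, typename IndexOutIteratorT>
struct unzip_and_write_op
{
  ExtremumOutIteratorT extremum_out_it;
  IndexOutIteratorT index_out_it;

  // The unused first parameter mirrors the (IndexT, KeyValuePairT) signature in the diff.
  template <typename IndexT, typename KeyValuePairT>
  void operator()(IndexT, KeyValuePairT reduced_result)
  {
    *extremum_out_it = reduced_result.value; // unzip: extremum to its own iterator
    *index_out_it    = reduced_result.key;   // unzip: index to its own iterator
  }
};
```

Because the functor only dereferences and assigns, either iterator can be a raw pointer or any other output iterator. That is what allows a caller to discard or transform one of the two results.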
🟨 CI finished in 1h 39m: Pass: 95%/168 | Total: 2d 05h | Avg: 18m 56s | Max: 1h 16m | Hits: 9%/22466
+/- | Project |
---|---|
 | CCCL Infrastructure |
+/- | libcu++ |
+/- | CUB |
 | Thrust |
 | CUDA Experimental |
 | python |
 | CCCL C Parallel Library |
 | Catch2Helper |
Modifications in project or dependencies?
+/- | Project |
---|---|
 | CCCL Infrastructure |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 168)
# | Runner |
---|---|
124 | linux-amd64-cpu16 |
19 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
🟩 CI finished in 1h 48m: Pass: 100%/168 | Total: 1d 14h | Avg: 13m 43s | Max: 1h 10m | Hits: 37%/22466
+/- | Project |
---|---|
 | CCCL Infrastructure |
+/- | libcu++ |
+/- | CUB |
 | Thrust |
 | CUDA Experimental |
 | python |
 | CCCL C Parallel Library |
 | Catch2Helper |
Modifications in project or dependencies?
+/- | Project |
---|---|
 | CCCL Infrastructure |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 168)
# | Runner |
---|---|
124 | linux-amd64-cpu16 |
19 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
Description
This PR introduces a new overload of the DeviceReduce::Arg{Min,Max} interface and deprecates the existing interface. Specifically, the result is now written to two separate user-provided output iterators, one for the extremum and one for its index. Previously it was returned as an (extremum, index) tuple in a cub::KeyValuePair, which has shortcomings (see #3146).
Closes #3146
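Here is a host-only mock of the call shape described above. It takes the leading parameters from the diff (d_temp_storage, temp_storage_bytes, d_in, d_min_out), followed by an index output iterator and an item count. The name mock_arg_min, the names d_index_out and num_items, the int return type and the reported storage size are all assumptions. The only piece borrowed from CUB is the two-phase convention: call once with d_temp_storage == nullptr to query the size, then call again with allocated storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical host mock of a two-output ArgMin; NOT the CUB implementation.
template <typename InputIteratorT, typename ExtremumOutIteratorT, typename IndexOutIteratorT>
int mock_arg_min(void* d_temp_storage,
                 std::size_t& temp_storage_bytes,
                 InputIteratorT d_in,
                 ExtremumOutIteratorT d_min_out,
                 IndexOutIteratorT d_index_out,
                 std::int64_t num_items)
{
  if (d_temp_storage == nullptr)
  {
    temp_storage_bytes = 1; // size query only: report a mock requirement, do no work
    return 0;
  }
  assert(num_items > 0 && "empty input is not modeled in this sketch");
  InputIteratorT min_it = std::min_element(d_in, d_in + num_items);
  *d_min_out            = *min_it;                     // extremum to its own iterator
  *d_index_out          = std::distance(d_in, min_it); // index to another; no KeyValuePair
  return 0;
}
```

In real CUB code the iterators would refer to device memory and the call would return cudaError_t. The mock leaves out both details.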
Checklist