GH-40698: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import #40699

jorisvandenbossche · 2024-03-21T10:02:41Z

Rationale for this change

Right now, the user of ImportDeviceArray or ImportDeviceRecordBatch needs to provide a DeviceMemoryMapper mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that only knows about the default CPU memory manager (there is another implementation in arrow::cuda, but you need to pass it explicitly to the import function).

To make this easier, this PR adds a registry such that default device mappers can be added separately.
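To make the problem concrete, here is a minimal, self-contained sketch of the situation described above. It does not use Arrow: `DeviceType`, `MemoryManager`, `DeviceMapper`, `CpuOnlyMapper` and `ImportFrom` are simplified stand-ins invented for this illustration, and the error handling is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>

namespace sketch {

// Hypothetical stand-ins; these are NOT the real Arrow declarations.
enum class DeviceType { kCpu, kCuda };

struct MemoryManager {
  std::string name;
};

// Same shape as the mapper described above: (device type, device id) -> manager.
using DeviceMapper =
    std::function<std::shared_ptr<MemoryManager>(DeviceType, int64_t)>;

// Default mapper that only knows the CPU; returns null for anything else.
inline std::shared_ptr<MemoryManager> CpuOnlyMapper(DeviceType type, int64_t /*id*/) {
  if (type != DeviceType::kCpu) return nullptr;
  return std::make_shared<MemoryManager>(MemoryManager{"cpu"});
}

// Stand-in for an import function that accepts an optional mapper argument.
inline std::string ImportFrom(DeviceType type, int64_t device_id,
                              const DeviceMapper& mapper = CpuOnlyMapper) {
  std::shared_ptr<MemoryManager> manager = mapper(type, device_id);
  if (!manager) return "error: no memory manager for this device";
  return "imported via " + manager->name;
}

// Usage: CUDA data fails with the default mapper and works with a custom one.
inline void Example() {
  assert(ImportFrom(DeviceType::kCpu, 0) == "imported via cpu");
  assert(ImportFrom(DeviceType::kCuda, 0) ==
         "error: no memory manager for this device");

  DeviceMapper custom = [](DeviceType type, int64_t id) {
    if (type == DeviceType::kCuda) {
      return std::make_shared<MemoryManager>(
          MemoryManager{"cuda:" + std::to_string(id)});
    }
    return CpuOnlyMapper(type, id);
  };
  assert(ImportFrom(DeviceType::kCuda, 1, custom) == "imported via cuda:1");
}

}  // namespace sketch
```

The point of the sketch is only that every caller importing non-CPU data has to know about, and pass, the right mapper.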

What changes are included in this PR?

This PR adds two new public functions to register device types (RegisterDeviceMemoryManager) and retrieve the mapper from the registry (GetDeviceMemoryManager).

Further, it provides a RegisterCUDADevice function to optionally register the CUDA devices (by default, only the CPU device is registered).
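The register/get pair described above could look roughly like the following sketch. It is a standalone model, not the code in this PR: `Registry`, `Mapper`, `DeviceType` and `MemoryManager` are hypothetical names, and rejecting a duplicate registration is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {

enum class DeviceType { kCpu, kCuda };  // hypothetical stand-in

struct MemoryManager {  // hypothetical stand-in
  std::string name;
};

// Per-device-type callback: given a device id, produce a memory manager.
using Mapper = std::function<std::shared_ptr<MemoryManager>(int64_t device_id)>;

class Registry {
 public:
  // Returns false if a mapper is already registered for this device type.
  bool Register(DeviceType type, Mapper mapper) {
    std::lock_guard<std::mutex> lock(mutex_);
    return mappers_.emplace(type, std::move(mapper)).second;
  }

  // Returns an empty std::function if nothing is registered for this type.
  Mapper Get(DeviceType type) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = mappers_.find(type);
    return it == mappers_.end() ? Mapper{} : it->second;
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<DeviceType, Mapper> mappers_;
};

// Usage: register the CPU once, then look up registered and unregistered types.
inline void Example() {
  Registry registry;
  Mapper cpu = [](int64_t) {
    return std::make_shared<MemoryManager>(MemoryManager{"cpu"});
  };
  bool first = registry.Register(DeviceType::kCpu, cpu);
  bool second = registry.Register(DeviceType::kCpu, cpu);
  assert(first);
  assert(!second);  // duplicate registration is rejected in this sketch
  assert(registry.Get(DeviceType::kCpu)(0)->name == "cpu");
  assert(!registry.Get(DeviceType::kCuda));  // nothing registered for CUDA
}

}  // namespace sketch
```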

Are these changes tested?

Are there any user-facing changes?

GitHub Issue: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import #40698


github-actions · 2024-03-21T10:03:08Z

⚠️ GitHub issue #40698 has been automatically assigned in GitHub to PR creator.

jorisvandenbossche · 2024-03-21T10:04:07Z

@pitrou I would appreciate a preliminary review to check if this is going in the right direction

(Of course, I still need to add tests and docs, clean up the naming, etc., and I am testing it with CUDA now.)

For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

jorisvandenbossche · 2024-03-21T13:04:06Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-03-21T13:06:44Z

Revision: e16e24d

Submitted crossbow builds: ursacomputing/crossbow @ actions-3c9b11581b

Task	Status
test-cuda-python

pitrou · 2024-03-21T14:50:32Z

> @pitrou I would appreciate a preliminary review to check if this is going in the right direction

Yes, this is looking ok, though the implementation can be simplified a bit.

> For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

Agreed. No need to expose the registry class itself for now.

> And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

Since the API is minimal and doesn't require any additional includes, we can keep it in device.h IMHO.

jorisvandenbossche · 2024-03-21T16:46:28Z

> though the implementation can be simplified a bit

You mean to not even use the internal class to store the mapping, but just have the register/get functions and store the unordered_map in a global variable?

pitrou · 2024-03-21T16:51:11Z

No, the class is ok, but the call_once is not required if instead using a static local variable.
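For readers unfamiliar with the two patterns being compared, here is a small standalone sketch. `Registry`, `GetRegistryWithCallOnce` and `GetRegistry` are hypothetical names for illustration and are not taken from the PR. Since C++11, initialization of a function-local static is guaranteed to happen exactly once even under concurrent calls, which is what makes the explicit `std::call_once` unnecessary.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

namespace sketch {

struct Registry {
  std::unordered_map<std::string, int> entries;
};

// Pattern being replaced (shown for contrast): a global instance created
// lazily under an explicit std::call_once.
static std::once_flag g_registry_once;
static std::unique_ptr<Registry> g_registry;

inline Registry* GetRegistryWithCallOnce() {
  std::call_once(g_registry_once, [] { g_registry = std::make_unique<Registry>(); });
  return g_registry.get();
}

// Suggested pattern: a function-local static. The language guarantees that
// it is initialized once, on first use, in a thread-safe way.
inline Registry& GetRegistry() {
  static Registry instance;
  return instance;
}

// Usage: every call returns the same instance, so state persists.
inline void Example() {
  GetRegistry().entries["cpu"] = 1;
  assert(&GetRegistry() == &GetRegistry());
  assert(GetRegistry().entries.at("cpu") == 1);
}

}  // namespace sketch
```

Both variants give one lazily created instance; the second needs no extra flag or global.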

jorisvandenbossche · 2024-03-22T08:51:35Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-03-22T08:53:52Z

Revision: f33872d

Submitted crossbow builds: ursacomputing/crossbow @ actions-2f2ec1b2f6

Task	Status
test-cuda-python

zeroshade · 2024-03-25T15:06:38Z

cpp/src/arrow/gpu/cuda_memory.cc

+Result<std::shared_ptr<MemoryManager>> DefaultGPUMemoryMapper(int64_t device_id) {
+  ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
+  return device->default_memory_manager();
+}


We should probably include some sort of customizations on the cuda device to ensure it uses the appropriate allocation type (HOST/MANAGED/etc).

I somewhat blindly copied this from an existing function:

arrow/cpp/src/arrow/gpu/cuda_memory.cc

Lines 488 to 497 in 1781b32

Result<std::shared_ptr<MemoryManager>> DefaultMemoryMapper(ArrowDeviceType device_type,
                                                           int64_t device_id) {
  switch (device_type) {
    case ARROW_DEVICE_CPU:
      return default_cpu_memory_manager();
    case ARROW_DEVICE_CUDA:
    case ARROW_DEVICE_CUDA_HOST:
    case ARROW_DEVICE_CUDA_MANAGED: {
      ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
      return device->default_memory_manager();

But yes, that just ignores the device type at the moment. Looking at the code, it seems that CudaDevice currently only supports the CUDA allocation type?

Then for this PR I can just remove the other two?

That seems reasonable for now. We can handle the other types in a future update.
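The outcome of this thread (handle only the plain CUDA type and reject the others instead of silently ignoring the device type) can be sketched as follows. This is a standalone model under stated assumptions: `DeviceType`, `MemoryManager`, `MapResult` and `MapDevice` are hypothetical, and the error strings are invented.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

namespace sketch {

// Hypothetical stand-ins mirroring the device types named in the thread.
enum class DeviceType { kCpu, kCuda, kCudaHost, kCudaManaged };

struct MemoryManager {
  std::string name;
};

struct MapResult {
  std::shared_ptr<MemoryManager> manager;  // null on failure
  std::string error;                       // empty on success
};

inline MapResult MapDevice(DeviceType type, int64_t device_id) {
  switch (type) {
    case DeviceType::kCpu:
      return {std::make_shared<MemoryManager>(MemoryManager{"cpu"}), ""};
    case DeviceType::kCuda:
      return {std::make_shared<MemoryManager>(
                  MemoryManager{"cuda:" + std::to_string(device_id)}),
              ""};
    case DeviceType::kCudaHost:
    case DeviceType::kCudaManaged:
      // Fail loudly rather than hand back a plain-CUDA manager for memory
      // that was allocated with a different allocation type.
      return {nullptr, "allocation type not supported yet"};
  }
  return {nullptr, "unknown device type"};
}

// Usage: plain CUDA maps to a manager, the other CUDA types report an error.
inline void Example() {
  MapResult cuda = MapDevice(DeviceType::kCuda, 1);
  assert(cuda.manager && cuda.manager->name == "cuda:1");
  MapResult host = MapDevice(DeviceType::kCudaHost, 1);
  assert(!host.manager);
  assert(host.error == "allocation type not supported yet");
}

}  // namespace sketch
```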

pitrou · 2024-03-25T16:31:45Z

cpp/src/arrow/gpu/cuda_memory.cc

+
+std::once_flag cuda_registered;
+
+Status RegisterCUDADevice() {


Is there a reason for not doing this automatically at initialization rather than have the user call it explicitly?

To be honest, I am not fully sure what the possible use cases would be. So from my point of view of enabling importing CUDA data in pyarrow, registering the CUDA device automatically is perfectly fine.
I assume it's quite unlikely that someone might want to register a different CUDA device from C++?

Well, if they have a need for a dedicated CUDA mapper, they can just pass their own DeviceMemoryMapper when importing, AFAICT. What do you think, @zeroshade?

That's true. I was going to suggest that with this registration mechanism, we don't necessarily need to keep the device mapper keyword, but that's actually a reason to keep it.

I pushed a commit that registers the CUDA device by default and therefore removes the public RegisterCUDADevice function.
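One common way to get registration "by default" in C++ is a file-scope object whose constructor performs the registration during static initialization. The sketch below shows that general technique only; it is an assumption that something along these lines is meant, and `GlobalRegistry`, `CudaRegistrar` and the string-returning `Mapper` are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

namespace sketch {

enum class DeviceType { kCpu, kCuda };  // hypothetical stand-in

// Simplified mapper: returns a label instead of a real memory manager.
using Mapper = std::function<std::string(int64_t device_id)>;

// Function-local static, so the registry exists before any registrar
// touches it, regardless of static initialization order.
inline std::map<DeviceType, Mapper>& GlobalRegistry() {
  static std::map<DeviceType, Mapper> registry = [] {
    std::map<DeviceType, Mapper> initial;
    initial.emplace(DeviceType::kCpu, [](int64_t) { return std::string("cpu"); });
    return initial;
  }();
  return registry;
}

// The constructor runs during static initialization of the translation
// unit that defines the object below, i.e. before main() starts.
struct CudaRegistrar {
  CudaRegistrar() {
    GlobalRegistry().emplace(DeviceType::kCuda, [](int64_t id) {
      return "cuda:" + std::to_string(id);
    });
  }
};

static const CudaRegistrar kCudaRegistrar;

// Usage: no explicit registration call is needed by the user.
inline void Example() {
  assert(GlobalRegistry().count(DeviceType::kCuda) == 1);
  assert(GlobalRegistry().at(DeviceType::kCuda)(2) == "cuda:2");
  assert(GlobalRegistry().at(DeviceType::kCpu)(0) == "cpu");
}

}  // namespace sketch
```

A user who needs different behaviour can still bypass the registry by passing a mapper of their own at import time, as noted in the thread.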

cpp/src/arrow/gpu/cuda_memory.cc

cpp/src/arrow/device.cc

cpp/src/arrow/device.h

cpp/src/arrow/c/bridge_test.cc

Co-authored-by: Antoine Pitrou <[email protected]>

jorisvandenbossche · 2024-03-26T12:08:38Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-03-26T12:10:54Z

Revision: 92ece26

Submitted crossbow builds: ursacomputing/crossbow @ actions-a8e191b52d

Task	Status
test-cuda-python

jorisvandenbossche · 2024-03-26T16:00:39Z

This should be ready for another (final?) review

pitrou

LGTM, but one minor suggestion still. Feel free to merge when done!

pitrou · 2024-03-26T16:55:52Z

cpp/src/arrow/device.h

@@ -363,4 +363,33 @@ class ARROW_EXPORT CPUMemoryManager : public MemoryManager {
 ARROW_EXPORT
 std::shared_ptr<MemoryManager> default_cpu_memory_manager();

+using MemoryMapper =
+    std::function<Result<std::shared_ptr<MemoryManager>>(int64_t device_id)>;


Sorry, but a couple more suggestions to unify naming:

rename MemoryMapper to DeviceMemoryMapper?

rename RegisterDeviceMemoryManager to RegisterDeviceMemoryMapper

rename GetDeviceMemoryManager to GetDeviceMemoryMapper

Good points, that naming is definitely more consistent.

There is however one problem that we already define a DeviceMemoryMapper for the keyword type in the actual bridge.h Import methods:

arrow/cpp/src/arrow/c/bridge.h

Lines 218 to 219 in 434f872

using DeviceMemoryMapper =

std::function<Result<std::shared_ptr<MemoryManager>>(ArrowDeviceType, int64_t)>;

and we should probably find a distinct name, given that the two are slightly different (the one takes device_type+device_id and returns a MemoryManager, while the other is a function already tied to a specific device_type and thus only takes a device_id, returning again a MemoryManager)

It's of course a subtle difference that might be difficult to embody in a name. But at least using distinct names seems best.

Perhaps DeviceIdMapper then? Not terribly pretty I admit...

I think that sounds good for the function type alias, but then I would personally leave the register/get functions as is? I would find RegisterDeviceIdMapper a bit strange with the focus on the id, because you are also registering a device type; it's just that the value you store for the registered type is the DeviceIdMapper.

Anyway, in the end it doesn't matter that much, happy to go with whatever we come up with.

Or DeviceMapper / RegisterDeviceMapper / GetDeviceMapper ? (that's a bit more generic, but keeps the three consistent with each other)
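The difference between the two function shapes discussed in this thread, and how one can be built from the other, is illustrated in the sketch below. It is standalone and uses hypothetical names (`TypeAndIdMapper`, `IdOnlyMapper`, `MakeDispatchingMapper`); it is not meant to settle the naming question or to mirror the actual Arrow code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

enum class DeviceType { kCpu, kCuda };  // hypothetical stand-in

struct MemoryManager {  // hypothetical stand-in
  std::string name;
};
using ManagerPtr = std::shared_ptr<MemoryManager>;

// Shape 1: the import-time mapper takes both the device type and the id.
using TypeAndIdMapper = std::function<ManagerPtr(DeviceType, int64_t)>;

// Shape 2: the registry value is already tied to one device type, so it
// only takes the device id.
using IdOnlyMapper = std::function<ManagerPtr(int64_t)>;

// Builds a shape-1 mapper that dispatches on the device type to the
// matching shape-2 mapper, or returns null if none is known.
inline TypeAndIdMapper MakeDispatchingMapper(
    std::map<DeviceType, IdOnlyMapper> per_type) {
  return [per_type = std::move(per_type)](DeviceType type,
                                          int64_t id) -> ManagerPtr {
    auto it = per_type.find(type);
    if (it == per_type.end()) return nullptr;
    return it->second(id);
  };
}

// Usage: only the CPU is registered, so CUDA lookups yield null.
inline void Example() {
  std::map<DeviceType, IdOnlyMapper> per_type;
  per_type[DeviceType::kCpu] = [](int64_t) {
    return std::make_shared<MemoryManager>(MemoryManager{"cpu"});
  };
  TypeAndIdMapper mapper = MakeDispatchingMapper(std::move(per_type));
  ManagerPtr cpu = mapper(DeviceType::kCpu, 0);
  assert(cpu && cpu->name == "cpu");
  assert(mapper(DeviceType::kCuda, 0) == nullptr);
}

}  // namespace sketch
```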

pitrou · 2024-03-26T16:58:08Z

@github-actions crossbow submit test-cuda-cpp

github-actions · 2024-03-26T17:00:32Z

Revision: 7a9e30d

Submitted crossbow builds: ursacomputing/crossbow @ actions-b7970e2559

Task	Status
test-cuda-cpp

jorisvandenbossche · 2024-03-27T08:42:33Z

@github-actions crossbow submit test-cuda-cpp

github-actions · 2024-03-27T08:44:55Z

Revision: a5a6f6c

Submitted crossbow builds: ursacomputing/crossbow @ actions-f7506cdb39

Task	Status
test-cuda-cpp

jorisvandenbossche · 2024-03-27T11:58:01Z

Thanks for the reviews!

conbench-apache-arrow · 2024-03-27T19:17:20Z

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a407a6b.

There were 10 benchmark results indicating a performance regression:

Commit Run on ursa-i9-9960x at 2024-03-27 14:29:12Z
- dataset-selectivity (Python) with dataset=nyctaxi_multi_ipc_s3, selectivity=10%
- dataset-selectivity (Python) with dataset=nyctaxi_multi_ipc_s3, selectivity=100%
and 8 more (see the report linked below)

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

…o MemoryManager in C Device Data import (apache#40699)

### Rationale for this change

Follow-up on apache#39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`).

Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: apache#40698

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>

apacheGH-40698: [C++] Create registry for Devices to map DeviceType t…

4c1183c

…o MemoryManager in C Device Data import

github-actions bot added Component: C++ and awaiting committer review labels Mar 21, 2024

jorisvandenbossche added 2 commits March 21, 2024 11:48

test with CUDA

9972eed

some clean-up and docs

e16e24d

jorisvandenbossche added 2 commits March 21, 2024 21:58

use static local variable instead of call_once

337e65f

some more docs + basic test for failure cases

f33872d

jorisvandenbossche marked this pull request as ready for review March 22, 2024 08:51

jorisvandenbossche requested review from pitrou and zeroshade March 22, 2024 08:52

zeroshade reviewed Mar 25, 2024


github-actions bot added awaiting changes and removed awaiting committer review labels Mar 25, 2024