Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40698: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import #40699

Merged
merged 15 commits into from
Mar 27, 2024

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Mar 21, 2024

Rationale for this change

Follow-up on #39980 (comment)

Right now, the user of ImportDeviceArray or ImportDeviceRecordBatch needs to provide a DeviceMemoryMapper mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in arrow::cuda, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

What changes are included in this PR?

This PR adds two new public functions to register device types (RegisterDeviceMemoryManager) and retrieve the mapper from the registry (GetDeviceMemoryManager).

Further, it provides a RegisterCUDADevice to optionally register the CUDA devices (by default only CPU device is registered).

Are these changes tested?

Are there any user-facing changes?

Copy link

⚠️ GitHub issue #40698 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Mar 21, 2024

@pitrou I would appreciate a preliminary review to check if this is going in the right direction

(of course still need to add tests, docs, clean-up naming, etc, and testing it now with CUDA)

For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: e16e24d

Submitted crossbow builds: ursacomputing/crossbow @ actions-3c9b11581b

Task Status
test-cuda-python GitHub Actions

@pitrou
Copy link
Member

pitrou commented Mar 21, 2024

@pitrou I would appreciate a preliminary review to check if this is going in the right direction

Yes, this is looking ok, though the implementation can be simplified a bit.

For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

Agreed. No need to expose the registry class itself for now.

And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

Since the API is minimal and doesn't require any addition includes, we can keep it in device.h IMHO.

@jorisvandenbossche
Copy link
Member Author

though the implementation can be simplified a bit

You mean to not even use the internal class to store the mapping, but just have the register/get functions and store the unordered_map in a global variable?

@pitrou
Copy link
Member

pitrou commented Mar 21, 2024

No, the class is ok, but the call_once is not required if instead using a static local variable.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review March 22, 2024 08:51
@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: f33872d

Submitted crossbow builds: ursacomputing/crossbow @ actions-2f2ec1b2f6

Task Status
test-cuda-python GitHub Actions

Comment on lines 505 to 508
Result<std::shared_ptr<MemoryManager>> DefaultGPUMemoryMapper(int64_t device_id) {
ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
return device->default_memory_manager();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably include some sort of customizations on the cuda device to ensure it uses the appropriate allocation type (HOST/MANAGED/etc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat blindly copied this from an existing function:

Result<std::shared_ptr<MemoryManager>> DefaultMemoryMapper(ArrowDeviceType device_type,
int64_t device_id) {
switch (device_type) {
case ARROW_DEVICE_CPU:
return default_cpu_memory_manager();
case ARROW_DEVICE_CUDA:
case ARROW_DEVICE_CUDA_HOST:
case ARROW_DEVICE_CUDA_MANAGED: {
ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
return device->default_memory_manager();

But yes, that just ignores the device type at the moment. Looking at the code, it seems that CudaDevice currently only supports CUDA allocation type?

Then for this PR I can just remove the other two?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable for now. We can update and handle the other types in a future update

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 25, 2024

std::once_flag cuda_registered;

Status RegisterCUDADevice() {
Copy link
Member

@pitrou pitrou Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for not doing this automatically at initialization rather than have the user call it explicitly?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I am not fully sure on what possible use cases would be. So from my point of view of enabling importing CUDA data in pyarrow, registering the CUDA device automatically is perfectly fine.
I assume it's quite unlikely that someone might want to register a different CUDA device from C++?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if they have a need to a dedicated CUDA mapper, they can just pass their own DeviceMemoryMapper when importing, AFAICT. What do you think @zeroshade ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I was going to suggest that with this registration mechanism, we don't necessarily need to keep the device mapper keyword, but that's actually a reason to keep it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that registers the CUDA device by default and therefore removes the public RegisterCudaDevice function.

cpp/src/arrow/device.cc Outdated Show resolved Hide resolved
cpp/src/arrow/device.cc Outdated Show resolved Hide resolved
cpp/src/arrow/device.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 26, 2024
@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: 92ece26

Submitted crossbow builds: ursacomputing/crossbow @ actions-a8e191b52d

Task Status
test-cuda-python GitHub Actions

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 26, 2024
@jorisvandenbossche
Copy link
Member Author

This should be ready for another (final?) review

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but one minor suggestion still. Feel free to merge when done!

@@ -363,4 +363,33 @@ class ARROW_EXPORT CPUMemoryManager : public MemoryManager {
ARROW_EXPORT
std::shared_ptr<MemoryManager> default_cpu_memory_manager();

using MemoryMapper =
std::function<Result<std::shared_ptr<MemoryManager>>(int64_t device_id)>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but a couple more suggestions to unify naming:

  1. rename MemoryMapper to DeviceMemoryMapper?
  2. rename RegisterDeviceMemoryManager to RegisterDeviceMemoryMapper
  3. rename GetDeviceMemoryManager to GetDeviceMemoryMapper

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, that naming is definitely more consistent.

There is however one problem that we already define a DeviceMemoryMapper for the keyword type in the actual bridge.h Import methods:

using DeviceMemoryMapper =
std::function<Result<std::shared_ptr<MemoryManager>>(ArrowDeviceType, int64_t)>;

and we should probably find a distinct name, given that both are slight different (the one takes device_type+device_id and returns a MemoryManager, while the other is a function already tied to a specific device_type and thus only takes a device_id, returning again a MemoryManager)

It's of course a subtle difference that might be difficult to embody in a name. But at least using distinct names seems best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps DeviceIdMapper then? Not terribly pretty I admit...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sounds good for the function type alias, but then I would personally leave the register/get functions as is? I would find RegisterDeviceIdMapper a bit strange with the focus on the id, because you are also registering a device type, it's just that the value you store for the registered type is the DeviceIdMapper ..

Anyway, in the end it doesn't matter that much, happy to go with whatever we come up with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or DeviceMapper / RegisterDeviceMapper / GetDeviceMapper ? (that's a bit more generic, but keeps the three consistent with each other)

@pitrou
Copy link
Member

pitrou commented Mar 26, 2024

@github-actions crossbow submit test-cuda-cpp

Copy link

Revision: 7a9e30d

Submitted crossbow builds: ursacomputing/crossbow @ actions-b7970e2559

Task Status
test-cuda-cpp GitHub Actions

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 26, 2024
@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-cpp

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 27, 2024
Copy link

Revision: a5a6f6c

Submitted crossbow builds: ursacomputing/crossbow @ actions-f7506cdb39

Task Status
test-cuda-cpp GitHub Actions

@jorisvandenbossche jorisvandenbossche merged commit a407a6b into apache:main Mar 27, 2024
35 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting change review Awaiting change review label Mar 27, 2024
@jorisvandenbossche jorisvandenbossche deleted the device-registry branch March 27, 2024 11:44
@jorisvandenbossche
Copy link
Member Author

Thanks for the reviews!

Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a407a6b.

There were 10 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…o MemoryManager in C Device Data import (apache#40699)

### Rationale for this change

Follow-up on apache#39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`).

Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: apache#40698

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants