Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39979: [Python] Low-level bindings for exporting/importing the C Device Interface #39980

Merged
merged 11 commits into from
Feb 28, 2024

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Feb 7, 2024

Rationale for this change

We have low-level methods _import_from_c/_export_to_c for the C Data Interface, we can add similar methods for the C Device data interface.

Expanding the Arrow PyCapsule protocol (i.e. a better public API for other libraries) is covered by #38325. Because of that, we might not want to keep those low-level methods long term (or at least we need to have the equivalents using capsules), but for testing it's useful to already add those.

What changes are included in this PR?

Added methods to Array and RecordBatch classes. Currently import only works for CPU devices.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review February 8, 2024 15:19
Comment on lines 1983 to 1985
if (device_type != ARROW_DEVICE_CPU) {
return Status::NotImplemented("Only importing data on CPU is supported");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could later be expanded to also allow CUDA device for CUDA enabled builds

Copy link
Member

@pitrou pitrou Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there probably should be some kind of registry so that "default" device mappers can be added separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou you mention a "registry", but AFAIK that's what we ideally would have (so external device implementations could register themselves) and that doesn't exist yet, right?
In that case, is the function above an OK short-term default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes!

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Feb 8, 2024
Comment on lines 1989 to 1992
Result<std::shared_ptr<Array>> ImportDeviceArray(struct ArrowDeviceArray* array,
std::shared_ptr<DataType> type) {
return ImportDeviceArray(array, type, DefaultDeviceMapper);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to provide such an API that uses a default DeviceMapper?

With the current APIs here, I assume the idea is that it's the responsibility of the user (i.e. the library or application using Arrow C++ to consume data through the C Device interface) to provide the device mapping as they see fit.
In the case of exposing this in pyarrow, it's pyarrow that is the user of those APIs and I think pyarrow certainly wants to have a default mapping provided (not to be specified by the user of pyarrow). In theory I could write this DefaultDeviceMapper function in cython to keep this on the pyarrow side, but this might also be useful for other users of the C++ APIs?

(I suppose when we add a default in C++, I could also give the existing signatures a default parameter value for mapper, instead of adding those two additional signatures)

cc @zeroshade @pitrou

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to provide such an API that uses a default DeviceMapper?

Yes, that sounds reasonable to me. I think that in many (most?) cases, users will want to use whatever device mapper is registered for the given device type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also:

I could also give the existing signatures a default parameter value for mapper, instead of adding those two additional signatures

Yes, that would reduce the proliferation of different functions. You could simply have something like const DeviceMemoryMapper& mapper = {}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difficulty with providing a default device mapper here is that it created a circular dependency due to the ArrowDeviceType being defined in abi.h and required linking against libarrow_cuda.so

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a reason to keep this default on the pyarrow side? (we can implement the mapper function in C++, but only provide it as the default argument on the Python side)

In Python, we can more easily dynamically check if pyarrow.cuda module is available, and if so provide a different default mapper (that includes GPU devices).

Comment on lines 674 to 677
# verify exported struct
assert c_array.device_type == 1 # ARROW_DEVICE_CPU 1
assert c_array.device_id == -1
assert c_array.array.length == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a test that uses the arrow cuda lib and verify the device etc.?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to add actual cuda tests later in a separate PR (with proper roundtrip tests, not just export, but roundtrip doesn't work yet for non-cpu right now)

@@ -64,6 +64,16 @@
// Opaque producer-specific data
void* private_data;
};

typedef int32_t ArrowDeviceType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we expose the constants in pyarrow somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use them ourselves, I don't know if that is needed (although it might still be useful for other users of pyarrow.cffi?)

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 14, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 20, 2024
Comment on lines +1834 to +1837
c_array = GetResultValue(
ImportDeviceArray(<ArrowDeviceArray*> c_ptr,
<ArrowSchema*> c_type_ptr)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default mapper is only allowing CPU arrays, but pyarrow does have a cuda lib, shouldn't we allow and enable importing at least CUDA arrays too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, but as mentioned earlier (#39980 (comment)), I was planning to tackle CUDA in a follow-up, and this PR indeed only properly supports and tests CPU.

Comment on lines +1795 to +1797
Be careful: if you don't pass the ArrowDeviceArray struct to a consumer,
array memory will leak. This is a low-level function intended for
expert users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this explicitly mention the release callback on the struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from the existing docstrings. We could mention the release callback explicitly, but essentially then you are a "consumer". This functions returns an integer, you can't call the release callback on the return value as such. Only when you actually interpret it as an ArrowArray struct, you can do that (and at that point, you are a consumer who should be aware of those details?)
I could also point to the general page about the C Data Interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche that the release callback need not be mentioned here. This is all in the spec.

Comment on lines +3202 to +3203
c_batch = GetResultValue(ImportDeviceRecordBatch(
<ArrowDeviceArray*> c_ptr, <ArrowSchema*> c_schema_ptr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as before, don't we want to allow using the pyarrow.cuda lib to provide a device mapper and hallow handling cuda-based gpu memory arrays?

@jorisvandenbossche
Copy link
Member Author

@zeroshade @pitrou are you OK with merging this PR with only CPU support for now, and leave CUDA integration for a separate follow-up PR?

@pitrou
Copy link
Member

pitrou commented Feb 22, 2024

@zeroshade @pitrou are you OK with merging this PR with only CPU support for now, and leave CUDA integration for a separate follow-up PR?

That sounds ok to me.

Comment on lines +1795 to +1797
Be careful: if you don't pass the ArrowDeviceArray struct to a consumer,
array memory will leak. This is a low-level function intended for
expert users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jorisvandenbossche that the release callback need not be mentioned here. This is all in the spec.

@@ -1778,6 +1778,70 @@ cdef class Array(_PandasConvertible):

return pyarrow_wrap_array(array)

def _export_to_c_device(self, out_ptr, out_schema_ptr=0):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out_schema_ptr=None would feel slightly more Pythonic IMHO, though that's debatable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to leave this as is, to keep it consistent with the other _export_to_c definitions (and the _as_c_pointer helper also requires an integer at the moment)



@needs_cffi
def test_export_import_device_array():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're copy-pasting a lot of code in those tests, can we try to reduce duplication by factoring common functionality out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an attempt to refactor this. In any case it's adding less code now ;)

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Feb 28, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 28, 2024
@jorisvandenbossche
Copy link
Member Author

Appveyor seems to be an actual related failure now.

lib.cpp.obj : error LNK2019: unresolved external symbol "class arrow::Result<class std::shared_ptr<class arrow::MemoryManager> > __cdecl arrow::DefaultDeviceMapper(int,__int64)" (?DefaultDeviceMapper@arrow@@YA?AV?$Result@V?$shared_ptr@VMemoryManager@arrow@@@std@@@1@H_J@Z) referenced in function "struct _object * __cdecl __pyx_pf_7pyarrow_3lib_11RecordBatch_48_import_from_c_device(struct _object *,struct _object *)" (?__pyx_pf_7pyarrow_3lib_11Re

So it complains about not knowing DefaultDeviceMapper ( which I added as default argument value on the C++ side, but didn't add this to the signature on the cython side). @pitrou do you know of such issue with windows that you can't leave out arguments with defaults on the cython side?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 28, 2024
@pitrou pitrou merged commit 99c5412 into apache:main Feb 28, 2024
34 of 35 checks passed
@pitrou pitrou removed the awaiting changes Awaiting changes label Feb 28, 2024
@jorisvandenbossche jorisvandenbossche deleted the test-pyarrow-device-interface branch February 28, 2024 13:50
@jorisvandenbossche
Copy link
Member Author

Thanks for the reviews!

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…he C Device Interface (apache#39980)

### Rationale for this change

We have low-level methods `_import_from_c`/`_export_to_c` for the C Data Interface, we can add similar methods for the C Device data interface.

Expanding the Arrow PyCapsule protocol (i.e. a better public API for other libraries) is covered by apache#38325. Because of that, we might not want to keep those low-level methods long term (or at least we need to have the equivalents using capsules), but for testing it's useful to already add those.

### What changes are included in this PR?

Added methods to Array and RecordBatch classes. Currently import only works for CPU devices.

* GitHub Issue: apache#39979

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 99c5412.

There were 2 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.

thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
…he C Device Interface (apache#39980)

### Rationale for this change

We have low-level methods `_import_from_c`/`_export_to_c` for the C Data Interface, we can add similar methods for the C Device data interface.

Expanding the Arrow PyCapsule protocol (i.e. a better public API for other libraries) is covered by apache#38325. Because of that, we might not want to keep those low-level methods long term (or at least we need to have the equivalents using capsules), but for testing it's useful to already add those.

### What changes are included in this PR?

Added methods to Array and RecordBatch classes. Currently import only works for CPU devices.

* GitHub Issue: apache#39979

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
jorisvandenbossche added a commit that referenced this pull request Mar 27, 2024
…ryManager in C Device Data import (#40699)

### Rationale for this change

Follow-up on #39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`).

Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: #40698

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…o MemoryManager in C Device Data import (apache#40699)

### Rationale for this change

Follow-up on apache#39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`).

Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: apache#40698

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants