GH-39979: [Python] Low-level bindings for exporting/importing the C Device Interface #39980
Conversation
cpp/src/arrow/c/bridge.cc
Outdated
if (device_type != ARROW_DEVICE_CPU) {
  return Status::NotImplemented("Only importing data on CPU is supported");
}
This could later be expanded to also allow CUDA devices for CUDA-enabled builds.
Yes, there probably should be some kind of registry so that "default" device mappers can be added separately.
@pitrou you mention a "registry", but AFAIK that's what we ideally would have (so external device implementations could register themselves) and that doesn't exist yet, right?
In that case, is the function above an OK short-term default?
Yes and yes!
cpp/src/arrow/c/bridge.cc
Outdated
Result<std::shared_ptr<Array>> ImportDeviceArray(struct ArrowDeviceArray* array,
                                                 std::shared_ptr<DataType> type) {
  return ImportDeviceArray(array, type, DefaultDeviceMapper);
}
Do we want to provide such an API that uses a default DeviceMapper?

With the current APIs here, I assume the idea is that it's the responsibility of the user (i.e. the library or application using Arrow C++ to consume data through the C Device interface) to provide the device mapping as they see fit.

In the case of exposing this in pyarrow, it's pyarrow that is the user of those APIs and I think pyarrow certainly wants to have a default mapping provided (not to be specified by the user of pyarrow). In theory I could write this `DefaultDeviceMapper` function in Cython to keep this on the pyarrow side, but this might also be useful for other users of the C++ APIs?

(I suppose that when we add a default in C++, I could also give the existing signatures a default parameter value for `mapper`, instead of adding those two additional signatures.)
Do we want to provide such an API that uses a default DeviceMapper?
Yes, that sounds reasonable to me. I think that in many (most?) cases, users will want to use whatever device mapper is registered for the given device type.
Also:

I could also give the existing signatures a default parameter value for `mapper`, instead of adding those two additional signatures

Yes, that would reduce the proliferation of different functions. You could simply have something like `const DeviceMemoryMapper& mapper = {}`.
The difficulty with providing a default device mapper here is that it created a circular dependency, due to `ArrowDeviceType` being defined in `abi.h`, and required linking against `libarrow_cuda.so`.
That might be a reason to keep this default on the pyarrow side? (we can implement the mapper function in C++, but only provide it as the default argument on the Python side)
In Python, we can more easily dynamically check whether the pyarrow.cuda module is available, and if so provide a different default mapper (one that includes GPU devices).
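As a rough illustration of that idea (a sketch only, not this PR's code; the helper name `_supported_import_devices` is made up):

```python
# Hedged sketch: choose the default device support based on whether the optional
# pyarrow.cuda module can be imported (it is only available in CUDA-enabled builds).
try:
    import pyarrow.cuda  # noqa: F401
    _HAVE_CUDA = True
except ImportError:
    _HAVE_CUDA = False


def _supported_import_devices():
    # Hypothetical helper: which device types a Python-side default mapper
    # could accept when importing through the C Device Interface.
    return ("CPU", "CUDA") if _HAVE_CUDA else ("CPU",)
```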
python/pyarrow/tests/test_cffi.py
Outdated
# verify exported struct
assert c_array.device_type == 1   # ARROW_DEVICE_CPU
assert c_array.device_id == -1
assert c_array.array.length == 2
could we add a test that uses the arrow cuda lib and verify the device etc.?
I was planning to add actual CUDA tests later in a separate PR (with proper roundtrip tests, not just export; roundtrip doesn't work for non-CPU right now).
@@ -64,6 +64,16 @@
  // Opaque producer-specific data
  void* private_data;
};

typedef int32_t ArrowDeviceType;
should we expose the constants in pyarrow somehow?
If we don't use them ourselves, I don't know if that is needed (although it might still be useful for other users of `pyarrow.cffi`?)
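For what it's worth, exposing them could be as simple as a couple of module-level constants. A hedged sketch (only the CPU value is asserted in this PR's tests; the CUDA value is taken from the C Device Interface spec, and the helper is hypothetical):

```python
# Illustrative constants mirroring ArrowDeviceType values from the C Device Interface.
ARROW_DEVICE_CPU = 1    # value checked in test_cffi.py above
ARROW_DEVICE_CUDA = 2   # per the C Device Interface specification (assumption)


def device_type_name(device_type):
    # Hypothetical convenience helper for users of pyarrow.cffi.
    return {ARROW_DEVICE_CPU: "CPU", ARROW_DEVICE_CUDA: "CUDA"}.get(
        device_type, "unknown ({})".format(device_type))
```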
c_array = GetResultValue(
    ImportDeviceArray(<ArrowDeviceArray*> c_ptr,
                      <ArrowSchema*> c_type_ptr)
)
The default mapper only allows CPU arrays, but pyarrow does have a CUDA lib; shouldn't we allow importing at least CUDA arrays too?
Certainly, but as mentioned earlier (#39980 (comment)), I was planning to tackle CUDA in a follow-up, and this PR indeed only properly supports and tests CPU.
Be careful: if you don't pass the ArrowDeviceArray struct to a consumer,
array memory will leak. This is a low-level function intended for
expert users.
Should this explicitly mention the release callback on the struct?
I copied this from the existing docstrings. We could mention the release callback explicitly, but essentially then you are a "consumer". This function returns an integer; you can't call the release callback on the return value as such. Only when you actually interpret it as an ArrowArray struct can you do that (and at that point, you are a consumer who should be aware of those details?).
I could also point to the general page about the C Data Interface.
I agree with @jorisvandenbossche that the release callback need not be mentioned here. This is all in the spec.
c_batch = GetResultValue(ImportDeviceRecordBatch(
    <ArrowDeviceArray*> c_ptr, <ArrowSchema*> c_schema_ptr))
Same comment as before: don't we want to allow using the pyarrow.cuda lib to provide a device mapper and allow handling CUDA-based GPU memory arrays?
@zeroshade @pitrou are you OK with merging this PR with only CPU support for now, and leaving CUDA integration for a separate follow-up PR?
That sounds ok to me.
Be careful: if you don't pass the ArrowDeviceArray struct to a consumer,
array memory will leak. This is a low-level function intended for
expert users.
I agree with @jorisvandenbossche that the release callback need not be mentioned here. This is all in the spec.
@@ -1778,6 +1778,70 @@ cdef class Array(_PandasConvertible):

        return pyarrow_wrap_array(array)

    def _export_to_c_device(self, out_ptr, out_schema_ptr=0):
`out_schema_ptr=None` would feel slightly more Pythonic IMHO, though that's debatable.
I would propose to leave this as is, to keep it consistent with the other `_export_to_c` definitions (and the `_as_c_pointer` helper also requires an integer at the moment).
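For context, this is roughly how such an integer pointer is obtained with cffi in the tests (a minimal sketch; the `ArrowDeviceArray` declaration is assumed to be part of `pyarrow.cffi` as added in this PR):

```python
from pyarrow.cffi import ffi

# Allocate an empty ArrowDeviceArray struct and pass its raw address as a
# Python int, which is what _export_to_c_device / _as_c_pointer expect.
c_device_array = ffi.new("struct ArrowDeviceArray*")
ptr = int(ffi.cast("uintptr_t", c_device_array))
```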
python/pyarrow/tests/test_cffi.py
Outdated
@needs_cffi
def test_export_import_device_array():
We're copy-pasting a lot of code in those tests, can we try to reduce duplication by factoring common functionality out?
I made an attempt to refactor this. In any case it's adding less code now ;)
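One possible shape for such a refactor (illustrative only, not necessarily what was pushed; the parametrization and the `_import_from_c_device` name/signature are assumptions based on the existing `_export_to_c`/`_import_from_c` pattern):

```python
import pyarrow as pa
import pytest
from pyarrow.cffi import ffi


@pytest.mark.parametrize("struct_name, export_name, import_name", [
    ("struct ArrowArray*", "_export_to_c", "_import_from_c"),
    ("struct ArrowDeviceArray*", "_export_to_c_device", "_import_from_c_device"),
])
def test_export_import_array(struct_name, export_name, import_name):
    # Shared body instead of one near-identical test per interface.
    c_schema = ffi.new("struct ArrowSchema*")
    schema_ptr = int(ffi.cast("uintptr_t", c_schema))
    c_array = ffi.new(struct_name)
    array_ptr = int(ffi.cast("uintptr_t", c_array))

    arr = pa.array([1, 2, 3], type=pa.int64())
    getattr(arr, export_name)(array_ptr, schema_ptr)
    result = getattr(pa.Array, import_name)(array_ptr, schema_ptr)
    assert result.equals(arr)
```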
The Appveyor failure seems to be actually related now.
So it complains about not knowing …
Thanks for the reviews!
…he C Device Interface (apache#39980)

### Rationale for this change

We have low-level methods `_import_from_c`/`_export_to_c` for the C Data Interface, we can add similar methods for the C Device data interface. Expanding the Arrow PyCapsule protocol (i.e. a better public API for other libraries) is covered by apache#38325. Because of that, we might not want to keep those low-level methods long term (or at least we need to have the equivalents using capsules), but for testing it's useful to already add those.

### What changes are included in this PR?

Added methods to Array and RecordBatch classes. Currently import only works for CPU devices.

* GitHub Issue: apache#39979

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 99c5412. There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.
…ryManager in C Device Data import (#40699)

### Rationale for this change

Follow-up on #39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function). To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`). Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only the CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: #40698

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Rationale for this change

We have low-level methods `_import_from_c`/`_export_to_c` for the C Data Interface, we can add similar methods for the C Device data interface. Expanding the Arrow PyCapsule protocol (i.e. a better public API for other libraries) is covered by #38325. Because of that, we might not want to keep those low-level methods long term (or at least we need to have the equivalents using capsules), but for testing it's useful to already add those.

What changes are included in this PR?

Added methods to Array and RecordBatch classes. Currently import only works for CPU devices.
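As a usage illustration of those methods (a sketch under the assumption that the import counterpart is named `_import_from_c_device` and mirrors the existing `_import_from_c` signature; CPU data only, as noted above):

```python
import pyarrow as pa
from pyarrow.cffi import ffi

# Allocate the C Device Interface structs and get their addresses as Python ints.
c_schema = ffi.new("struct ArrowSchema*")
schema_ptr = int(ffi.cast("uintptr_t", c_schema))
c_batch = ffi.new("struct ArrowDeviceArray*")
batch_ptr = int(ffi.cast("uintptr_t", c_batch))

batch = pa.record_batch([pa.array([1, 2, 3])], names=["a"])
batch._export_to_c_device(batch_ptr, schema_ptr)                       # producer side
result = pa.RecordBatch._import_from_c_device(batch_ptr, schema_ptr)   # consumer side
assert result.equals(batch)
```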