How to expose API to downstream libraries? #16
Comments
Thanks for summarizing the options @saulshanabrook. I can't think of other reasonable options right now. I suspect that the local dispatch based on an Also, this may be worth putting a "tentative" label on for the time being (whatever the outcome of this issue), because numpy/scipy/scikit-learn are still playing with the approaches here. |
One other thing I forgot to mention is the ability to have a standardized way of getting an "array API conformant object" from some existing object. i.e. an If we include this, then it seems like we have three possible things we could possibly document in the spec:
Does anyone have opinions on whether we should include these in the spec? |
Shouldn't we add the option of creating a package that a downstream author uses and must configure at installation time to use a particular backend? This is similar to Global Dispatch but implemented using import hooks rather than contexts. For example, what if
used the flexible import mechanism of Python to intercept the call and register which library is doing the importing and then return the API-compliant module previously registered for that package in the system. Another approach would be to encourage libraries that implement the API to create a namespace that uses it and tell downstream authors to prefer that namespace if they want to ensure API compliance. |
I think the two most sensible options are:
Something like (2) seems needed anyway, because one must indeed be able to get at the module (or module-like object) both as a user choice via regular imports/function calls, as well as from a particular array instance in library code that supports multiple array types.
I'd really like to avoid that, the ergonomics are pretty bad with configuration at install time. Sounds like a recipe for lots of bug reports.
This seems orthogonal to the topic of this issue.
I suspect that that's what most libraries will want to do. See, e.g., |
Ah ok, well we could call it something else? I agree it's a slightly different issue, but it's related to the question of what "meta" APIs we provide to allow downstream libraries to get at standard array objects and modules. |
Well, I think But I would not mix that in here, array interchange and exposing the whole API as a module are both large discussions, let's do one at a time. |
Also for context: |
One other alternative to But as you said, this conversation is happening outside of this forum. |
This is indeed appealing in its simplicity, and would suffice for many use cases, i.e., code that only uses one array type. It doesn't solve the bigger "multiple library dispatch" problem, but for many projects that isn't so important. Multi-library dispatch could perhaps be added separately with another protocol that determines which array takes priority, and which perhaps could get reused for Python binary arithmetic. |
We talked about this in the meeting today. There was support for at least including local dispatch. And some support for global dispatch, if we included local dispatch as well. Adam brought up a question, that sometimes you might have to implement things differently for performance reasons on different libraries. If we get to this point, Ralf said, then we are in a good/better place than where we are today. For now, we could just try different libraries globally and compare the performance results. So I think one next step would be to possibly try to write up what a combination of global and local dispatch could look like in the spec. |
I think we formulated a solution for this, in https://data-apis.github.io/array-api/latest/purpose_and_scope.html#how-to-adopt-this-api. |
The answer we arrived at here is that even if there are multiple array types involved, those should come from the same library - in which case there is no dispatch problem. I think we can close this? |
As this has been addressed via |
I wanted to open a discussion on how the Array API (and potentially the dataframe API) will be exposed to downstream libraries.
For example, let's say I am the author of scikit-learn. How do I get access to an "Array compatible API"? Or let's say I am a downstream user, using scikit-learn in a notebook. How can I tell it to use TensorFlow over NumPy?
Options
I present three options here, but I would appreciate any suggestions on further ideas:
Manual
The default option is the current status quo where there is no standard way to get access to some array conformant API backend.
Different downstream libraries, like scikit-learn, could introduce their own mechanisms, like a backend kwarg to functions, if they wanted to support different backends.
Local Dispatch
Another approach would be to provide access to the related module from particular instances of the objects, which is the one taken by NEP 37.
In this case, scikit-learn would either call some x.__array_module__() method on its inputs or we would provide an array-api Python package that would have a helper function like get_array_module(x), similar to the NEP.
There is an open PR in scikit-learn (scikit-learn/scikit-learn#16574) to add support for NEP 37.
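To make the local dispatch idea concrete, here is a minimal sketch. Everything in it is hypothetical: ToyArray, toy_namespace, and this particular get_array_module are illustration only, and the zero-argument __array_module__() follows the wording of this issue rather than the exact protocol in NEP 37, which differs in detail.

```python
from types import SimpleNamespace

# Toy "array library" namespace; it stands in for a real module such as numpy.
toy_namespace = SimpleNamespace(
    name="toy",
    add=lambda a, b: ToyArray([x + y for x, y in zip(a.data, b.data)]),
)


class ToyArray:
    """Hypothetical array type that advertises the module it belongs to."""

    def __init__(self, data):
        self.data = list(data)

    def __array_module__(self):
        # Zero-argument form, as written in this issue (an assumption).
        return toy_namespace


def get_array_module(*arrays):
    """Hypothetical helper: return the one namespace shared by all inputs."""
    modules = []
    for arr in arrays:
        hook = getattr(arr, "__array_module__", None)
        if hook is None:
            raise TypeError(f"{type(arr).__name__} does not expose __array_module__")
        module = hook()
        if not any(module is seen for seen in modules):
            modules.append(module)
    if len(modules) != 1:
        raise TypeError("inputs must come from exactly one array library")
    return modules[0]


def library_function(x, y):
    # Downstream code looks up the namespace from its inputs, then uses it.
    xp = get_array_module(x, y)
    return xp.add(x, y)
```

With this shape, the downstream library never imports a specific array library; the choice travels with the objects that the user passes in.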
Global Dispatch
Instead of requiring an object to inspect, we could rely on a global context to store the "active array api" and provide ways of getting and setting this. Some form of this is implemented by scipy, with their scipy.fft.set_backend, which uses uarray.
This would probably be heavier weight than we need, but it does illustrate the general concept. I think if we implemented this, we could use Context Variables like Python's built-in decimal module does.
The advantage of using global dispatch is that you don't need to rely on passing in some custom instance class to set the backend.
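As a rough sketch of the context-variable approach, the following uses the standard library contextvars module. The names get_array_api, set_array_api, backend_a, and backend_b are made up for illustration and are not part of any proposed spec; the two backends are simple stand-ins for real array libraries.

```python
import contextlib
import contextvars
import math
from types import SimpleNamespace

# Hypothetical stand-ins for two array libraries.
backend_a = SimpleNamespace(name="a", sqrt=math.sqrt)
backend_b = SimpleNamespace(name="b", sqrt=lambda v: v ** 0.5)

# The "active array api" lives in a context variable with a default backend.
_active_backend = contextvars.ContextVar("active_array_api", default=backend_a)


def get_array_api():
    """Return the backend that is active in the current context."""
    return _active_backend.get()


@contextlib.contextmanager
def set_array_api(backend):
    """Temporarily make `backend` the active one, restoring the old value on exit."""
    token = _active_backend.set(backend)
    try:
        yield backend
    finally:
        _active_backend.reset(token)


def library_function(value):
    # Downstream code asks the global context instead of inspecting its inputs.
    xp = get_array_api()
    return xp.name, xp.sqrt(value)
```

A notebook user could then wrap calls in `with set_array_api(backend_b): ...` to switch backends without touching the downstream library's function signatures.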
Static Typing
This is slightly tangential, but one question that comes up for me is how we could properly statically type options 2 or 3. It seems like what we need is a typing.Protocol but for modules. I raised this as a discussion point on the typing-sig mailing list.
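For what it's worth, PEP 544 does describe modules as possible implementations of a protocol, so one way this could look is sketched below. SupportsSqrt and hypotenuse are hypothetical names, the math module merely stands in for an array namespace, and how well a given type checker handles modules as protocol implementations may vary.

```python
import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSqrt(Protocol):
    """Hypothetical protocol: any namespace exposing sqrt(x) -> float."""

    def sqrt(self, x: float, /) -> float: ...


def hypotenuse(xp: SupportsSqrt, a: float, b: float) -> float:
    # `xp` may be a module; only the members named in the protocol are used.
    return xp.sqrt(a * a + b * b)


# Passing a module where the protocol is expected.
result = hypotenuse(math, 3.0, 4.0)
```

The runtime_checkable decorator only enables a structural isinstance check on member names; it does not validate signatures, which remains the job of a static checker.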