Integer generics #3345
Thank you for the interesting proposal and for the research! I really like the idea of supporting fixed-size arrays/tuples/matrices. I have seen many people already playing with combining mypy and numpy, but in my experience many errors in numpy are array shape mismatches.

The point is that in most cases the array shapes/sizes are known statically, so that all these errors can be caught by simple integer dependent types.
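As a hypothetical illustration (shapes, names, and error message invented here, with plain Python standing in for numpy), the class of error in question is a runtime shape mismatch that integer dependent types could reject statically:

```python
from typing import List

def add_vectors(a: List[float], b: List[float]) -> List[float]:
    # numpy raises a similar broadcast error at runtime; with integer
    # generics the mismatch could be a static type error instead.
    if len(a) != len(b):
        raise ValueError(
            f"operands could not be broadcast together: {len(a)} vs {len(b)}"
        )
    return [x + y for x, y in zip(a, b)]
```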
Tuples could easily be special-cased in mypy, so they aren't enough justification by themselves for this feature. I agree with @ilevkivskyi that numpy could be an interesting use case. The mypy core team currently is not planning to work on numpy support, however -- though I understand that many potential users could find it useful.
A possible strategy would be to start from a relatively small project (if/when someone has time for this). For example, have some special support for …
As for …
One way to think about multidimensional arrays is as a function from indices to a value. The "shape" really just constrains the possible inputs to the function. I am playing with implementing the Mathematics of Arrays in Python, and I can represent an array object as a tuple of an indexing function and a shape tuple. But this doesn't capture the relationship between these two parts at the type level. If we also want to track the inner types in the arrays, this requires a generic array object. So at the type-system level, I want to be able to represent an array object of type …
It looks like this could be accomplished with generic protocols, like this:

```python
from typing import List, Protocol, TypeVar

T = TypeVar('T')

class Array(Protocol[T]):
    def index(self, *ix: int) -> T: ...
    shape: List[int]
```

An alternative way of doing this is to just use my own type annotations that can't be checked statically and just use them at runtime. This is what the Relay project does, which is part of the TVM ML stack from U of Washington:

```python
@relay_model
def lenet(x: Tensor[Float, (1, 28, 28)]) -> Tensor[Float, 10]:
    conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1], no_bias=False)
    tanh1 = relay.tanh(conv1)
    pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1], no_bias=False)
    tanh2 = relay.tanh(conv2)
    pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    flatten = relay.flatten_layer(pool2)
    fc1 = relay.linear(flatten, num_hidden=500)
    tanh3 = relay.tanh(fc1)
    return relay.linear(tanh3, num_hidden=10)

@relay
def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) -> Float:
    return relay.softmax_cross_entropy(lenet(x), y)

@relay
def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) -> Model:
    model = relay.create_model(lenet)
    for x, y in training_data:
        model_grad = relay.grad(model, loss, (x, y))
        relay.update_model_params(model, model_grad)
    return relay.export_model(model)

training_data, test_data = relay.datasets.mnist()
model = train_lenet(training_data)
print(relay.argmax(model(test_data[0])))
```

They just disable static type checking for those types.

This is a bit of a tangent, but I attended some of PLDI this year and there was a ton of talk there about the Python type system and language. Lots of people were building these computational frameworks in Python for probabilistic programming or deep learning and were coming from other languages, like Haskell, Rust, or Java. I am not that familiar with the community there, but it seems like it would be good if some of that energy could be focused on research of Python's core so that these kinds of extensions to the language could be better supported (have reasonable error messages, static type checking, etc).
OT: please don't use images of text. There's no reason you couldn't have copy-pasted the original code.
Duly noted. I updated the post with code instead of the image.
Could you explain a bit more about how this would work? I'm afraid I don't see how generic protocols could handle shape/size constraints at the type level... out-of-bounds checking would still need to be handled at runtime in the …
For Rust, the accepted RFC is: …

Currently, the typing problem above could be handled like this: …
Is a case like this considered as well?

```python
def get_ngrams(tokens: Iterable[str], n: int = 2):
    """
    >>> get_ngrams(["my", "name", "is", "archibald"], n=2)
    [('my', 'name'), ('name', 'is'), ('is', 'archibald')]
    """
    grams = zip(*[tokens[i:] for i in range(n)])
    return grams
```

PyCharm infers … As it is now, I don't think I can do better than …
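Absent integer generics, one workaround (my own sketch, not from the thread) is to enumerate the common arities with `@overload` and `Literal`, falling back to `Tuple[str, ...]` for everything else:

```python
from typing import List, Literal, Sequence, Tuple, overload

@overload
def get_ngrams(tokens: Sequence[str], n: Literal[2]) -> List[Tuple[str, str]]: ...
@overload
def get_ngrams(tokens: Sequence[str], n: Literal[3]) -> List[Tuple[str, str, str]]: ...
@overload
def get_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]: ...
def get_ngrams(tokens, n=2):
    # zip over n staggered slices; each resulting tuple has length n
    return list(zip(*[tokens[i:] for i in range(n)]))
```

This scales poorly (one overload per arity), which is exactly the boilerplate an integer type parameter would eliminate.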
Hi! I have been working on a proposal for adding support for type arithmetic and some other useful features for tensor typing. I will present it next Monday at the next chapter of the Tensor Typing meetings that we have been organising, so if anyone is interested, feel free to attend: link
Looking forward to it!
@fylux Is there a write-up of your proposal available anywhere?
@luphord I hope eventually some of this stuff will also make it into statically typed languages such as Rust :-) Rust "const generics" aren't enough.
@fylux Oh wow, that is a lot more advanced than what I expected, thanks! I was merely thinking of simple use cases such as (in "numpy terminology"):

```python
import numpy as np

# n, k generic integers
def f(A: np.ndarray[n, k], x: np.ndarray[k]) -> np.ndarray[n]:
    return A @ x
```

and then mypy would recognize a type error in:

```python
A = np.random.random((3, 4))
x = np.random.random((5,))
f(A, x)  # <- would cause a type error in mypy
```

However, I can clearly see the benefits of the type arithmetic that you presented. Is there a road map for your proposal? I understand that at least parts of it are already implemented in Pyre?
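Until something like that exists, the constraint can only be checked at runtime; a minimal pure-Python sketch (my own, with nested lists standing in for ndarrays):

```python
from typing import List

def matvec(A: List[List[float]], x: List[float]) -> List[float]:
    # Runtime stand-in for the static (n, k) @ (k,) -> (n,) constraint.
    k = len(x)
    if any(len(row) != k for row in A):
        raise TypeError("shape mismatch: every row of A must have len(x) entries")
    return [sum(a * b for a, b in zip(row, x)) for row in A]
```

With integer generics, the `TypeError` branch would become unreachable for well-typed callers.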
Regarding support for "simple cases", that is already possible in Pyre thanks to variadics (e.g. the XLM example), and there is currently an open proposal for standardising it in Python (https://mail.python.org/archives/list/[email protected]/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/). As for type arithmetic, +, -, //, *, Length and Product are all already supported in Pyre (e.g. the MaxPool2D example).
Could an integer generic be considered a generic over literals? E.g.:

```python
# made up syntax
Int = TypeVar('Int', bound=Literal[int])

class Vector[Int]:
    ...

Vector32 = Vector[32]
```

It would seem that allowing a generic to be bound to a literal of a type almost avoids introducing new concepts. Everywhere it could be treated as a literal whose exact value wasn't available. Features like "unrolling" tuples, arithmetic, etc., could be implemented in smaller chunks on top of this.
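A rough approximation of this idea with today's `typing` module (my own sketch; note that the `Literal` type argument is not checked against the runtime `size`):

```python
from typing import Generic, Literal, TypeVar

# Ideally N would be constrained to literal ints; typing cannot express
# that bound today, so it is left unbounded.
N = TypeVar("N")

class Vector(Generic[N]):
    def __init__(self, size: int) -> None:
        self.size = size  # runtime size, independent of the type parameter

# The Literal type argument stands in for the proposed Vector[32]:
Vector32 = Vector[Literal[32]]
v = Vector32(32)
```

The gap the proposal would close is exactly the unenforced link between `Literal[32]` and the `size` argument.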
Regarding the concept of using generics over literal integers as integer generics: it is something that we explored in Pyre, but we faced several problems when dealing with values that are unknown. You could always restrict yourself to values that are known at compile time, but the goal is that known and unknown values can interact in a sound way. If, for example, you use integer generics for dealing with vectors, you will often not know the size of the vector at compile time. Let's say that you have a function to read a CSV where you know the number of columns but not the number of rows; how would you express that unknown number of rows?

Or similarly, let's say that you want to write the type of a variable calling this function.

In this example, the "?" must be a literal int according to the previous definition of generics over literals. The number of rows is an exact number, but we don't know which. Thus, we could say that "?" is the union of all possible literal integers, something like "AnyInt" or "Literal[int]", but then we would be introducing a new concept into the type system. If Python could track unbound type variables and reason about them, then you could write something like: …

Based on my experience, I am currently leaning towards relying on the existing definition of …

Actually, Pyre uses this concept of integer generics as a reference for type arithmetic, so that "int" represents unknown information and arithmetic on "int" propagates "int".

Finally, @shaunc, I personally agree with the following: "It would seem that allowing a generic to be bound to a literal of a type almost avoids introducing new concepts". Currently, I don't see a good mechanism for setting bounds or subtypes on specific kinds of literals (not just ints), so I think that is something that must be explored.
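To make the known-columns/unknown-rows situation concrete, here is a small invented sketch (plain Python; the function name and shapes are hypothetical): the column count (3) is visible in the static type, while the row count can only surface as a plain `int`:

```python
from typing import List, Tuple

# With integer generics one might write something like Matrix[Rows, Literal[3]];
# today the unknown row count can only be returned as a plain int.
def read_csv3(text: str) -> Tuple[int, List[Tuple[str, str, str]]]:
    rows = [tuple(line.split(",")) for line in text.splitlines() if line]
    for row in rows:
        if len(row) != 3:
            raise ValueError("expected exactly 3 columns")
    return len(rows), rows  # row count is known only at runtime
```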
To me this looks like the best solution: …
Generics bounded by literals of types also offer a way to introduce default values and the like:

```python
class Base: ...
class Foo(Base): ...

Default = TypeVar('Default', bound=Literal[Base])

class BaseUser[Default]:
    def use(self, base: Base = Default): ...

defaultBase = Base(size=100)
baseUserWithDefault = BaseUser[defaultBase]()
foo = Foo()
baseUserWithDefault.use(foo)  # uses foo
baseUserWithDefault.use()     # uses defaultBase
```

WRT arithmetic: this issue (and others similar) seems to have languished, perhaps because they are heavy lifts. I was hoping "generics bounded by literals" would suggest a path forward that would allow functionality to be added incrementally, while still providing a basis for natural extension to more powerful constructs.
Dependent types have been brought up before in a few places: #366, this python-ideas thread, and most recently, #3062. The latter limits itself to one specific feature of a dependent type system, namely, support for simple literal types.
Another limited feature which could prove helpful is support for integer generics —that is, the ability to parametrize types by integers as well as other types. For example, the length of a homogeneous tuple could be parametrized:
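To make the gap concrete (my own illustration, not from the issue): today a homogeneous tuple's length is either spelled out element by element or erased entirely; a hypothetical `Tuple[float, N]` would sit between the two:

```python
from typing import Tuple

# Today: the length is either fully spelled out or lost.
v3: Tuple[float, float, float] = (1.0, 2.0, 3.0)  # length tracked, but verbose
vn: Tuple[float, ...] = (1.0, 2.0, 3.0, 4.0)      # any length accepted

# Hypothetical integer-generic spelling (not valid Python today):
#   def norm(v: Tuple[float, N]) -> float: ...
def norm(v: Tuple[float, ...]) -> float:
    # accepts any length, because the type cannot constrain it
    return sum(x * x for x in v) ** 0.5
```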
One real-world use-case for this came up in python/typing#221:
Integer genericity could also be used for things like statically checking vector or matrix dimensions:
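For instance (my own sketch), a fixed dimension can only be encoded today by spelling it out, as in this statically 3-dimensional cross product; integer generics would let the dimension be a parameter instead:

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]  # stand-in for a hypothetical Vector[3]

def cross(a: Vec3, b: Vec3) -> Vec3:
    # mypy rejects calls with tuples of any other length
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
```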
Note that the Rust community has been discussing this for some time (as well as the more general const generics) and has a lot of material collected on the motivation, challenges, and details of such a feature:
Whether or not it's a good fit for Python is an open question, of course, but I figured I should at least bring it up for discussion! (Especially since it could interact with #3062 in interesting ways.)
(Even nicer would be to support simple type-level expressions so we could do things like this...
...but that may be taking it too far. To avoid running arbitrary user code the expression language would have to be explicitly defined and simple enough to implement in languages other than Python; it may be feasible to support something like +, -, *, //, %, **, max(), min(), and abs(), though.)
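As a hedged illustration of what such type-level expressions might buy (names invented): concatenation is the classic case, where the result length is the sum of the input lengths; today that relation is simply erased:

```python
from typing import Tuple

# Hypothetical type-level arithmetic (not valid Python today):
#   def concat(a: Vector[N], b: Vector[M]) -> Vector[N + M]: ...
# Runnable stand-in using variadic tuples, which lose the length relation:
def concat(a: Tuple[float, ...], b: Tuple[float, ...]) -> Tuple[float, ...]:
    return a + b
```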