differences / compatibility with attrs project #60

Closed
chadrik opened this issue Nov 1, 2017 · 20 comments

Comments

@chadrik

chadrik commented Nov 1, 2017

It would be helpful to have a list of functional differences between dataclasses and attrs, broken down by @dataclass vs @attr.s and field vs attr.ib.

This would be useful and illuminating for a few reasons:


It would make it easier to vet the logic behind, and need for, each of the proposed differences.

@hynek and @Tinche have invested years of thought into the current design: deviating from it without fully understanding the history and reasoning behind each decision might lead to this project needlessly repeating mistakes. I'm glad to see that the attrs devs have already been brought into several issues. My hope is we can get a bird's eye view so that nothing slips through the cracks.


If the differences aren't too great (and ideally they will not be, see above) I'd like to see a dataclass compatibility mode for attrs (e.g. from attrs import dataclass, field).

I'm glad that this badly-needed feature is being worked on, but sadly I'm stuck in python 2 for at least another 2 years, so it's important to me, and surely many attrs-users, to have an easy path to adoption once this becomes part of stdlib.

@chadrik
Author

chadrik commented Nov 2, 2017

First off, I found and read #19, which is a good read for anyone wondering whether attrs should be added to the stdlib (spoiler: it should not).

Here is my first attempt at an overview of the differences, starting with function arguments:

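| attr.attr | dataclasses.field |
|-----------|-------------------|
| default   | default or default_factory |
| validator | not present |
| repr      | repr |
| cmp       | cmp |
| hash      | hash |
| init      | init |
| convert   | not present |
| metadata  | not present |
| type      | not applicable (uses annotations) |

| attr.attributes | dataclasses.dataclass |
|-----------------|-----------------------|
| these           | not present |
| repr_ns         | not applicable in python 3.x |
| repr            | repr |
| cmp             | compare, and/or eq |
| hash            | hash |
| init            | init |
| slots           | not present |
| frozen          | frozen |
| str             | not present |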

Notes / Observations:

  • the absence of metadata and validator from dataclasses.field is concerning to me. These are pretty crucial to my use of attrs. I could see an argument for convert and validator being merged into a single entity, but I definitely would not want to see them both missing.
  • slots were covered in #28 ("Support __slots__?"), and the consensus was "punt this down the road. If people want slots they can manually add __slots__ = ('x', 'y', 'z') to their class".
  • cmp vs compare/eq was covered in #48 ("Implements #46: Specify eq separately from compare, for unorderable types"): compare=False, eq=True generates just __eq__ and __ne__ and is used for "unorderable types". I'm still a little hazy on why this is necessary.
  • default_factory vs default was covered in #24 ("How to specify factory functions"). dataclasses splits default_factory from default so that an arbitrary callable can be provided as a data factory, whereas attrs requires factories to be an attr.Factory instance.
  • gathering fields from annotations will soon be supported in attrs via auto_attribs=True (python-attrs/attrs#262, "Add option to collect annotated fields"), which removes one of the remaining differences.
  • at a surface level, attrs has almost a superset of the functionality, which gives me hope that a compatibility layer could be provided.
    • the only dataclasses feature missing from attrs is eq (covered above).

If anyone is aware of deeper functional differences, I'd love to hear them. Thanks!

edit1: added notes on eq
edit2: clarified default_factory difference

@ericvsmith
Owner

I think this is a useful exercise, thanks. I agree that it would be a shame to inadvertently miss something that's in attrs, especially if that locks us in to an API that we regret. I'll spend some time reviewing your table one-by-one, and comment as I go.

@ericvsmith
Owner

As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.

@ericvsmith
Owner

default / default_factory is mostly covered in issue #24. default is used to specify a default value, and default_factory is used to specify a callable that generates a default value. They need to be separate, because otherwise you'd have to do something like initial_value = default() if callable(default) else default, which precludes you from having a default value which is itself a callable. It's an error to specify both default and default_factory.
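
A minimal sketch of that distinction, using the dataclasses API as it was later released in the Python 3.7 standard library. The `Job` class, its field names, and the `noop` function are illustrative only:

```python
from dataclasses import dataclass, field


def noop() -> str:
    return "noop ran"


@dataclass
class Job:
    # default: the value itself is stored, even though it is callable.
    callback: object = noop
    # default_factory: the callable is invoked once per new instance.
    tags: list = field(default_factory=list)


first, second = Job(), Job()
assert first.callback is noop  # the function, not its return value
assert first.tags == [] and first.tags is not second.tags

# Supplying both arguments is rejected when the field is defined.
try:
    field(default=0, default_factory=int)
except ValueError as exc:
    both_error = str(exc)
```

Keeping the two arguments separate is what lets `callback` hold a function as a plain value while `tags` still gets a fresh list per instance.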

@chadrik
Author

chadrik commented Nov 5, 2017

> default / default_factory is mostly covered in issue #24.

Thanks, that conversation cleared it up for me. I updated my post above with the new info.

> As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.

I don't think that static type checking has much impact on the need for converters. Take something like this for instance:

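import attr

@attr.s
class C:
    # converter=int coerces whatever is passed to __init__ into an int
    x: int = attr.ib(default=0, converter=int)
    y: int = attr.ib(default=0, converter=int)

c = C('1', 1.1)  # c.x == 1 and c.y == 1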

This pattern is very common. A hypothetical mypy plugin for attrs or dataclasses could make C('1', 1.1) valid by using the converter's argument type for __init__ if present.

Without converters, the best we can do is this:

from dataclasses import dataclass

@dataclass
class C:
    x: int = 0
    y: int = 0

# every call site has to convert explicitly
c = C(int('1'), int(1.1))

Static type checking doesn't really have much to offer here in terms of ease of use: the best it can do is nag us to cast everything to int. That does not alleviate the inconvenience of having to do that throughout your codebase, whereas a converter defined on the field does. Moreover, conversions cannot be accomplished post-init, because the converter's type needs to be understood by the static type-check plugin. Bottom line: converters are a convenience without a valid workaround, and their absence will be frustrating to users.


As for validators, static type checking gets us part of the way there, but certainly not most of the way there. Here are some example validations:

  • x in y
  • x in range(y, z)
  • re.match(y, x)
  • len(x) < y
  • isinstance(x, Y)

All of these require runtime validation except the last. That said, validation can be performed in post_init, so unlike converters, at least there is a workaround.
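
A rough sketch of that workaround, using the hook under the name `__post_init__` that appears later in this thread. The `Reading` class, its fields, and the specific checks are made up for illustration:

```python
import re
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str
    value: int

    def __post_init__(self) -> None:
        # Runtime checks that a static type checker cannot express.
        if not re.match(r"[a-z]+-\d+$", self.sensor):
            raise ValueError(f"bad sensor id: {self.sensor!r}")
        if self.value not in range(0, 101):
            raise ValueError(f"value out of range: {self.value}")


ok = Reading("temp-1", 42)

try:
    Reading("temp-1", 500)
except ValueError as exc:
    range_error = str(exc)
```

Because the hook sees the whole instance, the same method could also compare one field against another.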


Is there an argument against adding metadata? It's hard to overstate how important this one is. It's a catchall for anything and everything that dataclasses cannot or should not have first class support for. In other words, it is the foundation for third-party utilities built up around dataclasses, for things such as UI presentation, database ORMs, serialization, and yes, even validation.

@ilevkivskyi
Contributor

I think the fact that static type checkers prohibit something like:

class C:
    x: int = ...
    y: int = ...

c = C('1', 1.1)

is rather good, not bad. What are the use cases for converters (apart from being temporary workarounds themselves)? As for validators, they can be added to __dataclass_post_init__ (I hope we will find a better name). Moreover, the latter can perform cross-field validation, so I agree with @ericvsmith here, we probably don't need validators and converters.

As for metadata, I don't have a strong opinion, but could imagine that it is indeed useful.

@ericvsmith
Owner

metadata has been added.
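
For readers unfamiliar with the feature, here is a small sketch of how it looks in the dataclasses API as released. The `Point` class and the "unit" and "label" keys are invented for this example:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Point:
    # The "unit" and "label" keys are made up for this example;
    # dataclasses itself never interprets metadata.
    x: float = field(default=0.0, metadata={"unit": "m", "label": "X position"})
    y: float = field(default=0.0, metadata={"unit": "m", "label": "Y position"})


# A third-party tool can read the mapping back through fields().
units = {f.name: f.metadata["unit"] for f in fields(Point)}
labels = [f.metadata["label"] for f in fields(Point)]
```

The mapping is exposed read-only on each `Field` object, so it works as an extension point without changing how the class behaves.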

@chadrik: where do you propose this documentation should go? Or is this just an exercise for the design phase, which I think has ended? It's not appropriate for this to go in the stdlib documentation.

@chadrik
Author

chadrik commented Dec 2, 2017 via email

@ericvsmith
Owner

Either the Wiki (which I have no access to) or maybe under attrs' documentation (ditto).

And note that you can use dataclasses today, from PyPI, on 3.6. So let the lobbying begin, once the PEP is accepted.

@ericvsmith
Owner

Also, note that attrs' these parameter is roughly equivalent to the dataclasses.make_dataclass() function. So I think the only real differences in your table are __slots__, validator, and convert. I deliberately don't want to support validation and conversion, instead leaving that to static type checkers (see #60 (comment) above).
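
As a quick illustration of make_dataclass() as it exists in the released standard library (the `Point` class and its fields are example names only):

```python
from dataclasses import field, fields, make_dataclass

# Fields are given as names, (name, type) pairs, or (name, type, field) triples.
Point = make_dataclass(
    "Point",
    [("x", int), ("y", int, field(default=0))],
)

p = Point(1)
names = [f.name for f in fields(Point)]
```

The result is an ordinary dataclass built without a class body, which is the role `these` plays in attrs.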

As for __slots__, that's a deliberate decision. Although I have another decorator which I'm not including in the PEP that adds __slots__ and returns a new class. See add_slots() in dataclass_tools.py in this repo. Because it's the only parameter that causes dataclass() to return a new class, I thought it was best to leave it out, at least for now. I'd like to make sure dataclass() is seen as something that just adds methods to a class, not returns a new class. Maybe that will change over time.
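
Why adding `__slots__` forces a new class can be sketched as follows. This is an illustrative reconstruction under my own assumptions, not the code from dataclass_tools.py, and the helper name `with_slots` is hypothetical:

```python
from dataclasses import dataclass, fields


def with_slots(cls):
    """Return a copy of dataclass `cls` that declares __slots__ (hypothetical helper)."""
    namespace = dict(cls.__dict__)
    names = tuple(f.name for f in fields(cls))
    namespace["__slots__"] = names
    for name in names:
        # Class-level defaults would clash with the slot descriptors.
        namespace.pop(name, None)
    # These descriptors belong to the old, dict-based instance layout.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    # __slots__ is only honoured at class creation, so build a new class.
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@dataclass
class Point:
    x: int
    y: int = 0


SlottedPoint = with_slots(Point)
p = SlottedPoint(1)
```

Here `SlottedPoint` is a different object from `Point`, which is the behaviour the comment above wants to keep out of dataclass() itself.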

@Tinche

Tinche commented Dec 2, 2017

I think that the "return a new class" approach is fundamentally incompatible with metaclasses and especially PEP 487. Since there is no way to add slots to an existing class, I'm considering a different API for slot classes in attrs too. Or, you know, Python could grow a better __slots__ interface itself, but I'm not holding my breath.

@gvanrossum

gvanrossum commented Dec 2, 2017 via email

@Tinche

Tinche commented Dec 2, 2017

> Actually we should design a new slots interface. The original was designed before we had class decorators.

Yes please!

@gvanrossum

That won't be easy though -- it means that the instance layout has to be made changeable after the class object has been created (which happens when the metaclass creates it -- before the class decorator runs). Maybe there are some folks on python-ideas interested in brainstorming on how to do this.

@chadrik
Author

chadrik commented Dec 5, 2017

One last effort on this topic:

> I think the fact that static type checkers prohibit something like:
>
>     class C:
>         x: int = ...
>         y: int = ...
>
>     c = C('1', 1.1)
>
> is rather good, not bad.

What if, say, over half of the uses of C required converting a variable to int, and what if that conversion was not as simple as calling a builtin but also required an import from some other module? This doesn't seem like a question of correctness to me, but rather one of convenience. Very many classes in the real world perform some conversion of arguments within their __init__ methods, and, unlike with validators, I don't see a good alternative for those who don't want to perform conversions all over their code instead of in one place. There's the possibility of casting and re-binding the attributes in __post_init__, but that would break static type-checking: for that to work the mypy plugin needs to integrate converter annotations into the __init__ annotations, which means dataclasses needs first class support for converters.

@ilevkivskyi
Contributor

ilevkivskyi commented Dec 5, 2017

@chadrik

> What if say, over half of the uses of C required converting a variable to int

I think such situations are relatively rare (like legacy API or similar). And IIUC this use case is covered by a combination of InitVar and __post_init__:

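from dataclasses import InitVar, dataclass, field

def convert_from_legacy_api(raw: bytes) -> str:
    # hypothetical stand-in for the real conversion
    return raw.decode()

@dataclass
class C:
    a: str
    b: str = field(init=False)  # set in __post_init__, not by the caller
    _b: InitVar[bytes]          # init-only argument, not stored as a field

    def __post_init__(self, _b) -> None:
        self.b = convert_from_legacy_api(_b)

aa: str = 'a test'
bb: bytes = b'b test'

c = C(aa, bb)  # OK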

And this will work well with static type checkers.

@ilevkivskyi
Contributor

> (I think you started with a/b/_b and then continued with x/y/_y?)

Indeed :-) Fixed!

@ericvsmith
Owner

I think there's nothing else to add here. Closing this issue.

@EhsanKia

EhsanKia commented Jun 18, 2021

I honestly don't see how the dummy InitVar + extra var + post_init is a Pythonic replacement for the simple and clean converter. And it's also, as far as I can tell, not a solution for frozen dataclasses.

Take this very simple and common dataclass

@dataclasses.dataclass(frozen=True)
class Group:
    names: Sequence[str]

How do you ensure names is not mutable itself? Normally, a simple converter=tuple would do the job, but now you have to resort to all sorts of hacks, such as object.__setattr__. None of it is pythonic, clean or user-friendly.
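
For reference, the object.__setattr__ workaround being criticized here can be sketched like this. It is an illustration only, and the sample names are invented:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Group:
    names: Sequence[str]

    def __post_init__(self) -> None:
        # Plain assignment raises FrozenInstanceError on a frozen class,
        # so the conversion has to bypass the generated __setattr__.
        object.__setattr__(self, "names", tuple(self.names))


members = ["ann", "bob"]
g = Group(members)
members.append("cy")  # the caller's list changes, the stored tuple does not
```

With attrs, the same effect would be a single `converter=tuple` on the attribute, which is the contrast this comment is drawing.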

@gvanrossum

It’s unpythonic to expect “deep” frozen-ness. A frozen object disallows attribute assignment but doesn’t care about modifying attribute values.
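
A short sketch of what that shallow behaviour looks like in practice (the `Group` class here is a simplified example that stores a list):

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List


@dataclass(frozen=True)
class Group:
    names: List[str]


g = Group(["ann"])

# Rebinding the attribute is blocked...
try:
    g.names = []
    rebound = True
except FrozenInstanceError:
    rebound = False

# ...but the list it refers to can still be mutated in place.
g.names.append("bob")
```

Freezing governs attribute assignment on the instance only; the objects those attributes point to keep their own mutability.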
