Add a more idiomatic way to define hashing #202

hynek · 2017-06-02T09:57:49Z

The current way of using True/False/None is confusing even to me.

I propose we use an Enum (probably just class on legacy Python) a la:

class Hashing(Enum):
    SMART = "smart"  # None → new default
    BY_ID = "by_id" # False
    WRITE_FOR_ALL = "write_for_all" # True
    WRITE_FOR_NONE = "write_for_none" # True, but the default on attr.ibs is False

The last two names are terrible of course.

I probably could be talked into depending on https://pypi.org/project/enum34/ on legacy Python.

Opinions?

P.S. Now while typing this out I realized that this could solve a long-standing complaint both I and many users have: assuming you want an attrs-generated method based on a small subset of attr.ibs, you have to write a lot of hash|init|repr=False. Ditching bools for enums would allow for that. 🤔 But let’s talk about that in a different ticket.

The text was updated successfully, but these errors were encountered:

Tinche · 2017-06-05T14:54:44Z

We should definitely do something to make the hashing behavior clearer.

I'm currently working on a PR to "fix" slots hashing behavior, and attr.s(hash=False) is really throwing me off.

The docs say nothing about what should happen when hash=False.

My personal understanding would be for attrs to do exactly nothing, we're saying do not touch __hash__. However this is complicated. I will demonstrate.

Assume a very simple class, with hashing disabled:

@attr.s(hash=False)
class A:
    a = attr.ib()

The equivalent of this class is: https://gist.github.com/Tinche/d8c1dadfaa3aa4618b736a61e752b3e1

Try pasting them both. The attrs one will be hashable, due to falling back to __object__.__hash__. The normal one will not be.

This is because in the normal class, the interpreter will see the __cmp__ and insert __hash__ = None. In the attrs class, the __cmp__ method isn't there when the class is defined; we put it into the class after the fact. So the interpreter won't touch __hash__ and it will fall through to object.

This is the root cause of why our slots behavior is different.

hynek · 2017-06-06T11:18:49Z

Yeah so that’s settled.

Ideas for better names for WRITE_FOR_ALL/NONE? @glyph? @njsmith?

P.S. the replacement/slots thingie explains a lot of my own confusion.

Tinche · 2017-06-06T20:18:39Z

What functionalities are we aiming for exactly?

I'm guessing:

make hashable by generating a hash function based on attributes
make hashable, but attributes are opt-in instead of opt-out
make hashable by using identity-based hashing
make unhashable
delegate to an existing self.__hash__, whatever that may be

(edit) Also:

smart

njsmith · 2017-06-06T21:54:23Z

I don't really understand the original post, but regarding the cases that need to be supported: I think it helps narrow things down to remember that for any given __eq__ there are exactly two valid ways of defining __hash__. Either it must include exactly the same things that go in the __eq__ calculation, or else you have to set __hash__ = None. This is just a fact about the language; if you violate it then dicts stop working.

So if attrs is generating its own __eq__, then the only meaningful decision is "do you want this object to be hashable?", and right now the three answers are "yes" (True), "no" (False), and "guess" (None, meaning yes for immutable objects and no otherwise). (Except apparently the "no" case is buggy? I'm pretty sure that the intention was for cmp=True, hash=False to make an unhashable class.) And then on a per attribute level you can also say yes/no/guess, but I can't think of any time that "guess" is wrong; the other options just give you interesting ways to break your class.

And if attrs isn't generating its own __eq__, then not sure why it would ever want to generate a __hash__. If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular python behavior imo.

njsmith · 2017-06-06T22:00:42Z

That said, I can see the case for having nicer shorthands for controlling which attribs go into the __eq__ calculation, like toggling it between opt-in and opt-out. But there's no reason to give __hash__ a separate set of toggles here; it should follow what __eq__ does.

hynek · 2017-06-07T05:11:10Z

Actually, you specifically asked for being able to fall back to hashing by ID – that’s why it’s in there. :) And I think that’s a great and useful feature, so I’m not trying to push the blame in your direction – just reminding. :)

The point of this ticket isn’t really to invent new behaviors, it’s just making the existing ones more understandable. Especially because one of the happened by accident (see also #205).

But maybe…how about this:

class Hashing(Enum):
    SMART = "smart"  # None → new default
    UNHASHABLE = "unhashable"  # same as None for unfrozen
    BY_ID = "by_id" # False
    BY_ATTRS = "opt_out" # True
    BY_ATTRS_OPT_IN = "opt_in" # True, but the default on attr.ibs is False

njsmith · 2017-06-07T08:21:06Z

Falling back to hashing by id is definitely useful! If, and only if we are also falling back to doing __eq__ by id. That falls under my line about "If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular python behavior"

hynek · 2017-06-07T09:19:47Z

Yeah so the fallback kind of happened by accident (cf. Tin’s explanations) but this time we have the chance to fix it properly including the deprecation year if we switch to enums.

njsmith · 2017-06-07T09:27:02Z

The thing where cmp=True, hash=False ends up with a generated __eq__ combined with __hash__-by-id just sounds like a straightforward bug to me. It just makes it so you can have two objects that compare equal but if you put them both into the same set then it's literally random whether you end up with 1 object added or 2 objects added.

Beyond that I'm mostly just kinda lost about what all these enums are supposed to be or what the overall goal is here. I think I showed up in the middle of the conversation :-)

hynek · 2017-06-12T10:32:01Z

So I guess we have to break bw-compat again and set __hash__ to None by hand if __eq__ is set? :|

glyph · 2017-06-12T17:56:16Z

So I guess we have to break bw-compat again and set hash to None by hand if eq is set? :|

Woah there, Tex. Ease your finger up off that trigger :-).

In this case I'm not convinced a breaking change is necessary, because there's no alternate behavior to suggest: a warning would be sufficient. The user has explicitly asked for nonsense, and if the nonsense appears to work today then it's fine to deprecate it for a while before raising an error.

hynek · 2017-06-12T19:35:40Z

Well the problem (which Tin explained elsewhere, I'm gonna add here for posterity) is: normal Python class behavior is setting dunder hash to None, if there's an Dundee eq. If not, it inherits object's by-id semantics. Now our eq is added after python makes this decision (except when using slots) so object's hash remains.

If slots is true it behaves differently because we build the class differently and it becomes unhashable.

IOW

we have conflicting behavior between slotted and not,
and hash=False doesn't actually mean what it looks like (Don't do anything about hash.) because of a weird side-effect.

Dunno. :|

glyph · 2017-06-12T22:07:23Z

Now our eq is added after python makes this decision (except when using slots) so object's hash remains

Oh, that's deeply unfortunate. I was wondering if the modifying-the-class-after-the-fact thing would come to bite us at some point; looks like it has. Perhaps we should be throwing away the first type and replacing it with a fresh one? (Although that's probably some even worse / more horrible compatibility break than this...)

glyph · 2017-06-12T22:08:19Z

and hash=False doesn't actually mean what it looks like (Don't do anything about hash.) because of a weird side-effect.

Focusing first on getting the exact right behavior with the new, good enum would be the right way to prioritize this. If we need another compat break (and gosh I hope not) then that's a separate thing.

hynek · 2017-10-06T14:40:44Z

JFTR, I started to work on this and ran into another “interesting” fact: the behavior Nathaniel described is specific to Python 3. Python 2 will happily hash by ID if there's a __eq__.

This allows us to harmonize class creation for dict and slots classes. Ref #223 #202

hynek modified the milestone: 17.3 Jun 6, 2017

hynek mentioned this issue Jul 27, 2017

TypeError when using super() and slots=True #102

Closed

hynek added a commit that referenced this issue Oct 6, 2017

Add replace_class argument to attr.s

ca83971

This allows us to harmonize class creation for dict and slots classes. Ref #223 #202

hynek mentioned this issue Oct 6, 2017

Stop modifying classes and always create new ones #260

Closed

hynek removed this from the 17.3 milestone Oct 16, 2017

wsanchez added the Cleanup label Sep 11, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a more idiomatic way to define hashing #202

Add a more idiomatic way to define hashing #202

hynek commented Jun 2, 2017

Tinche commented Jun 5, 2017

hynek commented Jun 6, 2017

Tinche commented Jun 6, 2017 •

edited

Loading

njsmith commented Jun 6, 2017

njsmith commented Jun 6, 2017

hynek commented Jun 7, 2017

njsmith commented Jun 7, 2017

hynek commented Jun 7, 2017

njsmith commented Jun 7, 2017

hynek commented Jun 12, 2017 •

edited

Loading

glyph commented Jun 12, 2017

hynek commented Jun 12, 2017

glyph commented Jun 12, 2017

glyph commented Jun 12, 2017

hynek commented Oct 6, 2017

Add a more idiomatic way to define hashing #202

Add a more idiomatic way to define hashing #202

Comments

hynek commented Jun 2, 2017

Tinche commented Jun 5, 2017

hynek commented Jun 6, 2017

Tinche commented Jun 6, 2017 • edited Loading

njsmith commented Jun 6, 2017

njsmith commented Jun 6, 2017

hynek commented Jun 7, 2017

njsmith commented Jun 7, 2017

hynek commented Jun 7, 2017

njsmith commented Jun 7, 2017

hynek commented Jun 12, 2017 • edited Loading

glyph commented Jun 12, 2017

hynek commented Jun 12, 2017

glyph commented Jun 12, 2017

glyph commented Jun 12, 2017

hynek commented Oct 6, 2017

Tinche commented Jun 6, 2017 •

edited

Loading

hynek commented Jun 12, 2017 •

edited

Loading