Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for fields with local dataclass types #180

Merged
merged 3 commits into from
Jan 14, 2024

Conversation

mishamsk
Copy link
Contributor

@mishamsk mishamsk commented Nov 24, 2023

Hey,

This PR makes the following work:

def test_local_type():
    @dataclass
    class LocalType:
        pass

    @dataclass
    class DataClassWithLocalType(DataClassDictMixin):
        x: LocalType

    obj = DataClassWithLocalType(LocalType())
    assert obj.to_dict() == {"x": {}}
    assert DataClassWithLocalType.from_dict({"x": {}}) == obj

First why: as you know, I use a custom deserialization base class that maintains a global registry of DataClassDictMixin subclasses allowing discrimination by subclass. I haven't migrated to 3.10, so it is still there.

This in turn means that at any given moment all subclass names must be unique. This is inconvenient for tests, but since so far mashumaro enforces classes in global scope, there is no easy way to create isolated tests, that define classes and then remove them.

Not sure if there are a lot of real life use cases outside of tests, but there may be...

The way I've implemented it is minimally disruptive. It should change nothing unless the type annotation is a locally defined class. Generally speaking, the current code generation can stop relying on fully qualified names of types entirely and use my change that just adds them to locals before executing. But that may have bigger consequences, so I opted no to do that.

Fixes #182

@Fatal1ty
Copy link
Owner

Fatal1ty commented Dec 2, 2023

Hi @mishamsk

Thank you for bringing attention to this problem. I've been placing dataclasses globally in the tests, trying to postpone the inevitable :) In fact, I have already made preparation to solve this problem with local types. There is clean_id function that strips off angle brackets:

def clean_id(value: str) -> str:
for c in ".<>":
value = value.replace(c, "_")
return value

It would be safer to use it because someone might have more than one local type. I've made a quick proof of concept in this commit:

if "<locals>" in field_type:
field_type = clean_id(field_type)
self.ensure_object_imported(ftype, field_type)

But the most important thing is that <locals> can appear not only in _unpack_method_set_value. For example, when someone is using codecs, the generated by a local encoder code may need a reference to a local type. The safest way to find all potential places is to go from where field_type is being called. I understand this may require a deep dive into non-trivial code and I am willing to do it on my own, but if you want to challenge yourself, I don't mind :)

@pep8speaks
Copy link

pep8speaks commented Dec 3, 2023

Hello @mishamsk! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2023-12-23 01:55:42 UTC

@mishamsk mishamsk force-pushed the support-local-dataclasses branch from 11408e3 to ba0483e Compare December 3, 2023 20:52
@mishamsk
Copy link
Contributor Author

mishamsk commented Dec 3, 2023

@Fatal1ty well, I wasn't sure how far you'd want to go with refactoring the code.

First using the existing helpers obviously makes sense - I added a commit to PR, which uses what you've suggested except for two changes:

  • I took a liberty to implement a more "robust" clean_id function
  • implemented local type check as a helper based on type directly, instead of the string representing coming from type_name

I get the fact that this won't solve the issue completely, but it doesn't mean it can't be merged (I guess).

I reviewed type_name usage and I'd say, if refactoring to fully support locals is considered, it may worth moving away from module imports + generating code that uses qualnames entirely. Instead, just create unique local names for each type parameter used in packing/unpacking and bind them in globals/local namespace used at compilation. This should theoretically be more resilient to whatever happens to python.

It feels a bit daunting to add a fork in every single place where type_name is used, to either use the old method (import module + refer to types/object via qualname) OR the new one (bind them to a local name).

It is a pretty big change either way. Maybe worth doing it gradually (e.g. for Mixin subclasses only first, then codecs).

What do you think?

@Fatal1ty
Copy link
Owner

Fatal1ty commented Dec 3, 2023

  • I took a liberty to implement a more "robust" clean_id function
  • implemented local type check as a helper based on type directly, instead of the string representing coming from type_name

Yeah, it looks better this way!

I get the fact that this won't solve the issue completely, but it doesn't mean it can't be merged (I guess).

Absolutely. One step at a time is better than nothing.

I reviewed type_name usage and I'd say, if refactoring to fully support locals is considered, it may worth moving away from module imports + generating code that uses qualnames entirely. Instead, just create unique local names for each type parameter used in packing/unpacking and bind them in globals/local namespace used at compilation. This should theoretically be more resilient to whatever happens to python.

This would be a big change that is likely to affect performance. Some time ago I was tempted by eliminating qualnames, thinking that one name instead of "getattr chain" would speed things up, but to my surprise performance was getting the same or even worse. However, I admit that something might have changed since then and it's worth trying again.

It is a pretty big change either way. Maybe worth doing it gradually (e.g. for Mixin subclasses only first, then codecs).

I totally agree. I will review your changes soon.

@mishamsk
Copy link
Contributor Author

mishamsk commented Dec 4, 2023

I haven’t made changes across the codebase, wanted your thoughts on it first.

So we are narrowing the scope to mixins only.

Some time ago I was tempted by eliminating qualnames, thinking that one name instead of "getattr chain" would speed things up, but to my surprise performance was getting the same or even worse.

Interesting. I was expecting python to create a closure when it runs exec, so it should not matter how exactly particular object was bound before execution, rather closure will end up having direct pointers to the object…but this may not work this way indeed.

I guess I can start adding bindings for local objects anyways. To switch everything to locals or not may be a later stage decision…

@mishamsk mishamsk force-pushed the support-local-dataclasses branch from bf8ec15 to 4998a14 Compare December 17, 2023 04:05
@mishamsk
Copy link
Contributor Author

so this fixes #182

what I did since the last commit:

  • rebased on the latest master
  • went through all of the usages of type_name and added local check to most places. I used the following logic:
    • if it is only for an exception message - do nothing, will include "locals"
    • if a given helper function doesn't already expose access to builder - do not bother, leave as is
    • ignore unrealistic usage scenarios
    • all other places were handled, both in packer & unpacker
  • add more local types to the test (enum, literal, (generic)serialized type, pathlike)

I reverted and renamed the is_local_type to check the resolved name, since in some cases (like with Literal) the local type was wrapped in typing construct and thus to identify local type, would mean recreating the logic from type name.

Alternative would be to change type_name return to also produce a flag for local types, but that would yield so many changes in the code, I thought it is not worth it.

I'd vote to merge it this way, as it should cover 99% of real-life cases...but up to you. I am eagerly waiting for this;-)

@Fatal1ty
Copy link
Owner

Fatal1ty commented Dec 22, 2023

Sorry for late answer, the New Year's Eve fuss takes up a lot of my time.

Alternative would be to change type_name return to also produce a flag for local types, but that would yield so many changes in the code, I thought it is not worth it.

You made huge work here but I have a small suggestion about local type names. It seems like it would be practical if we encapsulate checking for the localness and adding to the "globals" in a new type_name (or better get_type_name) method in CodeBuilder (with the same arguments as type_name from mashumaro.core.meta.helpers has). This method will return normal type_name(...)'s result if it's not a local type and a cleaned type name otherwise. It must eliminate all these duplicates of the code that you most likely got bored with while writing.

if it is only for an exception message - do nothing, will include "locals"

Great, we're on the same page here. I also think that there is no need to change it there.

P.S. The next time I will be available in early January. Merry Christmas and a happy New Year, take more time off from the computer! 🎅

@mishamsk
Copy link
Contributor Author

Sorry for late answer, the New Year's Eve fuss takes up a lot of my time.

not surprised;-)

method in CodeBuilder

not a big fan of methods lately, but sure, that does make sense. applied the change. except that I named it get_type_name_identifier to relate the fact that the goal is to get a valid identifier, not just some name. Given the real usage I opted to not pass all of the type_name args to this method, seems unnecessary.

The next time I will be available in early January. Merry Christmas and a happy New Year

Thanks, same to you!

take more time off from the computer!

sounds like my wife told you something;-)

@Fatal1ty Fatal1ty merged commit f33a528 into Fatal1ty:master Jan 14, 2024
27 checks passed
@Fatal1ty
Copy link
Owner

I added usage of get_type_name_identifier for fields with Self in their type annotations and for annotated serializable types with Self in the _deserialize / _serialize methods. I guess, the Discriminator's code may still have problems with local types but let's leave it to the first case.

Thank you for this pull request. You did it exactly the way I would have done it.

@Fatal1ty
Copy link
Owner

I guess, the Discriminator's code may still have problems with local types but let's leave it to the first case.

Discriminators are fixed in #189.

@mishamsk mishamsk deleted the support-local-dataclasses branch July 3, 2024 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The types defined inside the function result in syntactically invalid generated code
3 participants