Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider mandatory explicit disambiguation #173

Closed
ariebovenberg opened this issue Oct 14, 2024 · 13 comments
Closed

Reconsider mandatory explicit disambiguation #173

ariebovenberg opened this issue Oct 14, 2024 · 13 comments
Labels
discussion Discussion is needed before proceeding

Comments

@ariebovenberg
Copy link
Owner

ariebovenberg commented Oct 14, 2024

Should forced explicit disambiguation remain the default? Currently, the disambiguate= kwarg is required in all relevant methods:

local_dt.assume_tz("US/Eastern", disambiguate="compatible")

A common complaint is this makes for a clunky/noisy API (e.g. python-pendulum/pendulum#590 (comment))

I see the following options:

  1. Keep the behavior.

    pro:

    • it forces users to confront complexities inherent in any local time conversion
    • no 'guessing in the face of ambiguity'

    con:

    • can be frustrating to type
    • it doesn't help a novice user understand what disambiguation is, or why it's needed.
  2. make disambiguate= optional, with "compatible" as the default.

    pro:

    • it is consistent with most other moderns datetime libraries
    • most users probably don't want to use non-"compatible" option anyway

    con:

    • doesn't discourage users from using local times, without knowing the downsides
  3. make disambiguate= optional, with "raise" as the default.

    pro:

    • no 'guessing in the face of ambiguity'
    • An exception provides a good way to inform users when ambiguous situations occur—i.e. a long description with link to the docs.
    • most users probably don't encounter ambiguous situations

    con:

    • Users may be unpleasantly surprised to get errors when their code has been working fine for months/years until this edge case was encountered.
  4. use an "optional" stub file that increases strictness. Users that want strictness can install it.

    pro:

    • enables configuring desired strictness. Everybody gets what they want
    • working in the REPL allows you to omit the disambiguation

    con:

    • it's unusual to do this kind of "magic" using stubs. It'll require some explanation, and may introduce unexpected type-checking issues
    • doesn't improve correctness for the majority of users, who won't use the 'stricter' stubs
  5. use warnings (NOTE: this option was introduced in a comment later, but included here for a good overview). Basically the same as option (2), but trigger a ImplicitDisambiguationWarning that links to the docs. This warning can be disabled globally, or—of course—convince the user to specify disambiguation.

    pro:

    • educates the user about the problems of implicit disambiguation,
    • doesn't force casual users to address the issue, if they don't want to

    con:

    • it only triggers at runtime. You could still be deploying 'broken' code
@ariebovenberg ariebovenberg added the discussion Discussion is needed before proceeding label Oct 14, 2024
@spacemanspiff2007
Copy link

I think the issue from the pendulum comment is that it's rather hard to go from a string containing a date to a corresponding ZonedDateTime object. I've come up with this but it actually is rather clunky.

date = whenever.Date.parse_common_iso('2021-01-01')
z = whenever.LocalDateTime(date.year, date.month, date.day).assume_tz("US/Eastern", disambiguate="compatible")
z = whenever.ZonedDateTime(date.year, date.month, date.day, tz="US/Eastern")

Maybe a from_date class method which automatically picks the first existing hour of the day?
Also maybe a to_tuple to get (year, month, day) to Date and maybe Time.

Note that you're not 100% consistent about the disambiguate usage because for ZonedDateTime you've assumed a default. However for transitioning it should always be required to make it clear that there might be some unexpected behavior.

On a side note:
My old scheduler has a bug which came from pendulum assuming behavior during dst transitions.
Only because I was forced to think about what was happening during the transitions I realized that there is a potential issue.
So I think it's worth keeping the behavior:
I'd rather have something a little bit more clunky which makes me do the right thing instead of something a little bit more elegant which leads to pitfalls in edge cases!

@ariebovenberg
Copy link
Owner Author

Maybe a from_date class method which automatically picks the first existing hour of the day?

There's Date.at:

>>> Date(2020, 3, 2).at(Time.MIDNIGHT)
LocalDateTime(2020-03-02 00:00:00)

although you can also strptime, which simply wraps datetime.strptime:

LocalDateTime.strptime("2020-03-03", "%Y-%m-%d")

Note that you're not 100% consistent about the disambiguate usage because for ZonedDateTime you've assumed a default.

Correct, although I cheated a bit by saying all methods require it 😁 . The reason for the constructor exemption is that it's usually called with literals, which are likely correct.

More on topic again:

I'd rather have something a little bit more clunky which makes me do the right thing instead of something a little bit more elegant which leads to pitfalls in edge cases!

Happy to hear it! What do you think of making disambiguate="raise" the default though? It used to be this way version <0.6. That way you get a clean API up until you have to handle this edge case. It's similar to ZeroDivisionError in a way.

@ariebovenberg
Copy link
Owner Author

For improving the parsing API, see #174

@mattsta
Copy link

mattsta commented Oct 16, 2024

Thanks for considering easier usability options!

Another minor note around disambiguate: it's a required parameter, but doesn't show up via completion/introspection probably because the module is an extension? Not sure if there's a way to add more method signature hints for tools to help discover parameters.

basic usage showing we can't see the parameters for completion or in-line documentation:

In [7]: f = whenever.LocalDateTime.from_py_datetime(dateutil.parser.parse("2021-01-01 01:01:01"))

In [8]: f
Out[8]: LocalDateTime(2021-01-01 01:01:01)

# but, no documentation or tab completion available on extra methods
In [9]: f.assume_tz?
Docstring: Assume the datetime is in a timezone
Type:      builtin_method

What do you think of making disambiguate="raise" the default though? It used to be this way version <0.6. That way you get a clean API up until you have to handle this edge case. It's similar to ZeroDivisionError in a way.

As you noted already, it's the "burden of correctness" problem.

It also depends on how "serious" the system is. If I'm just scheduling timestamps to change my RGB light strip modes, an extra 20+ characters of "correctness" on every time manipulation feels excessive.

but, for more serious systems, we don't want things exploding due to hidden assumptions. Do we want our systems to randomly crash every 5 years when we actually encounter a '60' leap second value? Or the popular "extra ISO week in the year" surprises too.

I think it would be nice to have some "default behavior" options (the evil ThingFactoryFactoryConstructor pattern) to use a consistent instance of the API where all operations conform to the user's preferred rounding/fallback by default. This would be the same as everybody writing custom wrappers around all operations themselves, but just provided library-side instead of user-side every time.

@spacemanspiff2007
Copy link

spacemanspiff2007 commented Oct 16, 2024

Not sure if there's a way to add more method signature hints for tools to help discover parameters.

Maybe a stub file? It's really annoying that the literal values don't show up!

It also depends on how "serious" the system is. If I'm just scheduling timestamps to change my RGB light strip modes, an extra 20+ characters of "correctness" on every time manipulation feels excessive.

But why don't you schedule them with Instant anyway?
There you don't have that problem and converting to Instant never requires an disambiguation.
It's the recommended and proper choice for scheduling.
It seems like the parameter again leads to better code! 😉

I think a good design makes correct things easy and wrong things hard.
If I have to wrap every call in a library then maybe I am doing the wrong things?

@mattsta
Copy link

mattsta commented Oct 16, 2024

But why don't you schedule them with Instant anyway?

Good point! I think it depends on the input structure and UI being consumed. Lots of parsers like returning no-timezone datetime objects then we just need to populate a timezone.

Some of our current usage is reading free-form time fields on behalf of users (command line arguments), then applying their time to a specific timezone configured elsewhere. So the user only provides "tomorrow at 4pm" and we schedule for "now + 1 day @ 4pm US/Eastern".

I converted a platform from pendulum last weekend and a lot of the changes were pendulum.parse(thing, tz="US/Eastern") to (more correct but also more difficult to type all at once) parse to datetime -> LocalTime to consume datetime -> ZonedDateTime convert tz with disambiguate.

Just usability decoration I guess, but the much better whenever type system (and speed) and having an actively improved project is definitely worth the cleanup everywhere.

@ariebovenberg
Copy link
Owner Author

it's a required parameter, but doesn't show up via completion/introspection probably because the module is an extension?

I suspect this is indeed the case 😢 . I investigated this before, and found that each editor/linter basically does something different. Although I did check the "major" editors worked:

✅ Neovim

image

✅ PyCharm

image

✅ VS code

image

I suspect the iPython/jupyter shell you've got doesn't support it. A quick google I found: ipython/ipython#10044. Looks like __text_signature__ could be the key here...I'll create an issue.


I think it would be nice to have some "default behavior" options

This would be great, although toggling defaults globally wouldn't work because you'd affect other libraries using whenever as well.

One interesting option could be a linter or alternate type stub file that would more strictly enforce these parameters—only if you explicitly enabled/installed it. This would be tricky to implement, but could be a "have your cake and eat it too" solution. Casual users have their "it just works" API, and responsible teams can set an extra check on their code. That said, I really like the idea of educating users through the API, although I empatise that's a limit to how mach 'annoyance' users can take 😄.

@ariebovenberg
Copy link
Owner Author

Another option could be to use the warnings module. It might just be perfect for this 🤔. Calling .assume_tz() without explicit disambiguate would be the same as disambiguate="compatible", but would also trigger ImplicitDisambiguationWarning with a clear explanation. You could then disable it globally with warnings.simplefilter("ignore", ImplicitDisambiguationWarning)or—of course—add the disambiguate parameter.

The downside of this is that this warning would only appear at runtime.

@ariebovenberg
Copy link
Owner Author

In any case, #178 at least improves the documentation of all methods involving disambiguation, so the explanation is "a click away" in most IDEs

@BurntSushi
Copy link

In Jiff, I followed in Temporal's footsteps here and just made "compatible" disambiguation the default without any need for explicit opt-in. You can still change the policy to "later", "earlier" or "reject," just like in Temporal.

I went this way generally for two reasons:

  1. My prior is to defer strongly to Temporal unless I have a really compelling reason to do otherwise.
  2. The "compatible" disambiguation default is unlikely to get folks oblivious to disambiguation concerns in the first place into trouble. And it's usually the sensible default that folks want. Needing to specify it every time can be a lot of friction, and this can turn users off from using the library in the first place. It's definitely a delicate balancing act.

@ariebovenberg
Copy link
Owner Author

ariebovenberg commented Dec 17, 2024

Ok, looks like there's enough reason to move towards a "compatible" default in an upcoming release. For the majority of users, the current behavior is simply causing too much friction, and the value of making the choice explicit is limited.

I'll also investigate how well it'd work to have optional "strict" stubs that do enforce this parameter—for those that do want it.

Important though: I'll need to double-check some APIs like replace() first, since I recall there being some weird edge cases...

@ariebovenberg
Copy link
Owner Author

ariebovenberg commented Dec 22, 2024

The change is almost ready, and there was indeed a potentially surprising edge case in replace().

The case occurs when dealing with repeated times.

Imagine with have the "second" occurrence of 2:15 AM in the day the clock goes backwards. Replacing the "minute" component with 14 shouldn't suddenly revert the datetime to the "first" (i.e. "compatible") occurrence, which would move the datetime back by 61 minutes (unexpected!). Temporal solved this by reusing the UTC offset if possible. We can also do this. The problem is that replace() also allows replacing tz, which muddles things a bit. To me, the most logical solution is to only reuse the UTC offset if tz stays the same, but this still has two downsides:

  1. It makes the disambiguation handling pretty complex (only for an edge case—but still)
  2. What is the "same" timezone, can be complicated. Of course you can always compare the ID directly (Amsterdam != Paris), but some IDs purposefully observe the same offsets. should second_2_15am_amsterdam.replace(minute=14, tz="Europe/Paris") still move by 61 minutes?

For now, I accept these slight weirdnesses—especially since it only affects very rare cases.

(note: Temporal solves this by not allowing the equivalent tz in their replace() method)

@ariebovenberg
Copy link
Owner Author

Ok, the change is out with version 0.6.16. I'll close this issue, and create a follow-up if needed given the edge case above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion is needed before proceeding
Projects
None yet
Development

No branches or pull requests

4 participants