-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cap some dagster dependencies to at most their current major version #11890
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ |
I think this a fine temporary solution but has a couple downsides:
There is a fancier, semi-automated version of that that could be built that would handle minor release as well. . Essentially we would pin dependency to "last known good" and then update that on a weekly basis along with every release, thus avoiding outages between releases. That script is where we would track known problematic releases. I think this change puts the project in a better place than it is now so I support it. |
I understand the impetus behind this idea, but I think it would be a big mistake. It's a practice that works well in JS, since you can have multiple versions of a package installed, but it does not work well in Python. It is almost guaranteed to cause many frustrating dependency headaches for users. We must understand: Users can always add their own pin to fix an underconstraint, but have no way of removing an overaggressive pin.
This is a problem, but I strongly believe it is a smaller problem than what would be introduced by this change. Our CI catches these incompatible new versions fast and lets us add a pin when necessary. A smart guy wrote a small novel on the pitfalls of this approach. Here's an excerpt:
Side note: I have actually been trying to get |
Here is a useful piece of experience - https://github.com/apache/airflow#approach-for-dependencies-for-airflow-core (maybe sqlalchemy should not be <2.0 but <1.5). Actually, Airflow team didn't follow their own rules having sqlalchemy>1.4 in the setup.cfg - apache/airflow#28725. But they found the problem by CI failures before sqlalchemy-2.0 has been released. I don't know how exactly, maybe by using the --pre flag, but wouldn't it cause too many false positives?.. |
Do you mean it only applicable to an official distribution like docker images and the dagster cloud infrastructure / or as a recommended practice for OSS deployments? Having dependencies pinned in a public library would prevent users from being able to control the environment of their pipelines. |
I was on the fence but after reading @smackesey 's links I think we should not cap at all. Capping minor versions will also prevent key bugfixes and security patches from being immediately available, until we move our pins. Capping major versions is only useful for some libraries like sqlalchemy and seems over aggressive for compatible libraries. I think locking and periodic upgrades is the right approach but has to be done by the end-user, not library authors. |
I'm inclined to move forward with this change despite @smackesey well-articulated argument @shalabhc's concurrence. Sean note that I think you can have this change in place and still philosophically be aligned with what you are saying. The only thing we need to make it consistent is some sort of process to ensure that we are not leave these pins in place permanently. The only goal here is prevent breakages that we know will happen and will break users. When a major release of a dependency happens, it should be our policy to remove that pin as soon as feasibly possible. I think these setup.py reflects the fact, accurately, that we are not pre-emptively testing again major upcoming releases of these libraries. At our scale I think that is the right call. This change is not a change of policy. It just accurately reflects that status quo, a status quo that I think we should maintain. |
@smackesey in particular to drive home the point, merging this PR does not represent an embrace of dbt's extremely aggressive pinning policy, which I believe is actually quite a hostile stance to take for a commonly used library in the Python ecosystem. This is just reflecting the reality that we do not test against upcoming major versions of new dependencies. |
Thanks for your reply @schrockn. Regardless of the final decision it will be useful to have the case for/against on the record, so below I go into more detail:
I think this is partially true, in that with perfect implementation of a bound-updating process, you can (mostly) eliminate negative impact for users of the latest version of Dagster. However, even a perfect process will cause pain for users of older Dagster versions (more on this below).
If that's the case, why not take the approach Schreiner suggests in the excerpt I posted: "Only add a cap if a dependency is known to be incompatible or there is a high (>75%) chance of it being incompatible in its next release." IIUC this PR is preemptively capping every first-order dependency. Here's an attempt at birds-eye cost-benefit analysis: In the limit case where new dep versions are tested immediately on release, with bound bumped and a new dagster release immediately issued if needed-- then upper bounds could only be good (for users of latest Dagster version), since they would only ever exclude nonexistent and truly incompatible versions. The question is, how close can we realistically get to that limit case? Assume (a) we keep our weekly release cadence; (b) new dep versions are released at a random time in the week. Then if new dep releases are perfectly monitored, it will take on average 3.5 days for the latest Dagster release to reflect the result of monitoring for a given dep. Scenario 1: We use preemptive upper bounds and have add new testing infrastructure to continually monitor PyPI for new dep releases. Immediately after a new dep release, it is tested and the upper bound bumped in master if tests pass. Scenario 2: Status quo. No upper bound pins except for known incompatibilities. Our CI immediately catches most incompatibilities in new dep releases (due to highest-allowed-version policy in constraint solving). Upper bounds need to be added to get CI to pass any incompatibility. In both scenarios, an erroneous bound (or lack thereof) will be present for 3.5 days on average. But Scenario 2 seems preferable because:
That covers errors for users installing the latest version of Dagster. But what about users on older versions? A preemptive pin policy, even the perfect "limit case" above, still requires users on older Dagster versions to upgrade Dagster to access compatible upgrades from other libraries (this becomes a much bigger issue if you are also bounding minor/patch releases). In contrast, the status quo leaves users on older versions free to bump their deps as they please. IMO this is the single biggest reason not to preemptively pin. |
This sounds like a great compromise/approach. @gibsondan want to apply that filter? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this less aggressive per @smackesey 's feedback. Only cap things that we have high certainty will break us.
In honor of pendulum upgrading to 3.0 over the weekend and breaking us, finally getting back around to this - did a much less aggressive pass, only on packages with known breaking changes in the past |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although less pure, given what happened over the weekend, I think this is the right approach.
In the spirit of centralizing information around this decision in this PR given all the existing excellent discussion, linking a case where the defensive caps have caused some friction for users #25398 |
one thing we could consider is adding more tests like this to proactively notify about new versions where we have pins to make sure they are still needed https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-k8s/dagster_k8s_tests/unit_tests/test_version.py#L38-L45 |
Summary:
The goal of this change would be to avoid automatically upgrade to new major versions, which often include breaking changes (e.g. sqlalchemy 2 which broke all dagster users yesterday who made a fresh install)
The downside of this change would be that on older versions, users could be potentially pinned to older versions of their dependencies.
Not strongly held as a good idea but wanted to put it out for discussion.
Resolves #11882
Summary & Motivation
How I Tested These Changes