
CI for Environments

Amy Wooding edited this page Jan 8, 2022 · 3 revisions

In brief

We need environments to be shareable, reproducible, and upgradeable for at least a 2-month window (ideally 6–12 months). We're thinking about this mainly from an easydata-workshop perspective, but the problems we want to solve also crop up when trying to get a team working with a shared conda environment. This is deceptively non-trivial.


Problems / what we want

  1. We want participants to be able to install and load up a working environment reliably and quickly. Ideally, we would do this with lock files.
  2. Environments need to be upgradeable. When we add a package, or when a package upgrades, we want to be able to update the environment for everyone.
  3. Building environments from lock files is platform- and architecture-specific, so you need a lock file for each combination.
  4. Conda lock files that properly capture the pip section of an environment don't really exist.
  5. Upgrades and resolving can be a giant headache (for all the reasons we've been dealing with the past couple of weeks). These issues and more are alluded to here: http://iscinumpy.dev/post/bound-version-constraints/
  6. To avoid this headache, we'd like to be able to test the solve in clean environments on multiple platforms, so we catch issues before we break the environment build. That way changes stay small and easily debugged, rather than accumulating into a giant snotball of changes that's hard to untangle.
  7. We need to be able to hand-pin versions to avoid bugs when they come up, but also keep track of when we can unpin again. It would be great to automate testing whether an upgrade breaks the environment build.
  8. Python environments grow huge and eventually stop resolving. We've moved to one environment per repo, and will likely need more than one environment per repo.
  9. What's cached locally affects the build.
  10. We want to be platform agnostic, so a Docker container isn't the answer for this.
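Point 3 above is why lock files multiply: each platform+architecture pair needs its own file. A minimal sketch of one way to pick the right lock file at install time — the naming convention here is our own (hypothetical), though the subdirectory strings mirror conda's platform names:

```python
import platform


def lock_file_name(base="conda", system=None, machine=None):
    """Return a platform+architecture-specific lock file name.

    Naming scheme is a hypothetical convention, e.g.
    'conda-linux-64.lock' or 'conda-osx-arm64.lock'.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    # Map (OS, architecture) to conda's platform subdir strings.
    subdirs = {
        ("Linux", "x86_64"): "linux-64",
        ("Darwin", "x86_64"): "osx-64",
        ("Darwin", "arm64"): "osx-arm64",
        ("Windows", "AMD64"): "win-64",
    }
    subdir = subdirs.get((system, machine))
    if subdir is None:
        raise ValueError(f"No lock file for {system}/{machine}")
    return f"{base}-{subdir}.lock"
```

An installer script can then do `conda create --name env --file <lock_file_name()>`, and the maintainer's job is to keep one lock file per supported combination up to date.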

Related Problems, but not mainline at the moment

  1. If I'm maintaining a library and associated notebooks as documentation, I'd like to be able to provide an environment (and even datasets) that works for running the notebooks, so I don't have to debug individual environment issues for users.
  2. If I'm maintaining a project, I'd like to know when my dependencies are shifting in a way that's incompatible with my project. I'd like to run CI against --dev versions of my dependencies so I know what's coming down the pipe: any breaking changes, and anything that breaks my tests. The tricky part is when an upgrade breaks my environment before it breaks my tests. It would be nice to have a helping hand on that step.
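One hedged sketch of the "--dev CI" idea in point 2, as a scheduled GitHub Actions job. This is a pip-based sketch (the file name, weekly cron, and Python version are assumptions; a conda project would rebuild from environment.yml instead):

```yaml
# .github/workflows/dev-deps.yml (hypothetical name)
name: test-against-dev-dependencies
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly; a failure here flags upcoming breakage
jobs:
  dev-deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      # --pre pulls in pre-release versions of dependencies, so a failing
      # install is "it broke my environment" and failing tests are
      # "it broke my code" -- caught separately, before a release does it.
      - run: pip install --pre --upgrade -r requirements.txt
      - run: pip install -e . && pytest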

What have we tried and things we've looked at

  1. make + conda env export
  2. conda lock
  3. conda lock + Poetry
  4. mamba solver vs. conda solver
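For reference, the conda-lock variant roughly looks like the following as Makefile targets. This is a sketch: the target names and environment name (easydata) are our own, and the --file/--platform flags are conda-lock's as of late 2021 — check against your installed version:

```make
# Hypothetical Makefile targets for a conda-lock workflow.
PLATFORMS = linux-64 osx-64 win-64

# Re-solve environment.yml into one explicit lock file per platform.
lock: environment.yml
	conda-lock --file environment.yml $(foreach p,$(PLATFORMS),--platform $(p))

# Build the local environment from the lock file for this platform.
# LOCKFILE must point at the right conda-<platform>.lock for your machine.
create:
	conda create --name easydata --file $(LOCKFILE)
```

The catch, per the problems above, is that the pip section of environment.yml is where this workflow gets shaky.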

What don't we know

  1. Does anyone else care? How do people try to work around this already? This is a maintainer problem, not a user problem.
    • For web-based applications, we've heard of pip lock files, and GitHub Actions that re-resolve dependencies and propose security patches as they become available. That idea should be usefully applicable here.
  2. What's the easiest/hackiest way to hand build an MVP that addresses the core issues? We need something that works for us for the next 2 months. We're willing to try something that is messy to do, but works.
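On the easiest/hackiest MVP: a CI matrix job that just attempts a clean solve of environment.yml on each OS would catch breakage while changes are still small. A sketch (the environment name is an assumption; conda-incubator/setup-miniconda is a real action, but verify its current inputs):

```yaml
# Hypothetical CI job: try a clean solve of environment.yml on each platform,
# so a breaking change is caught in a small PR, not a giant snotball.
name: env-solve-check
on: [pull_request]
jobs:
  solve:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: environment.yml
          activate-environment: easydata   # name is an assumption
      - shell: bash -l {0}
        run: conda list   # env solved and built; record what was installed
```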

Running Comments

We're trying some experiments here. In fact, we're trying to get around the whole thing in a totally different way at the moment: hosting our own conda repodata. It feels like a sledgehammer, and it will no doubt come with a whole slew of trade-offs different from the ones we've been wrestling with so far. We'll see.

References

  1. http://iscinumpy.dev/post/bound-version-constraints/
  2. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
  3. I wish conda-lock actually reliably worked like this: https://pythonspeed.com/articles/conda-dependency-management/