Have conda environments directly in workflow source/runs directories #52

Open
elliotfontaine opened this issue Jul 29, 2024 · 0 comments
Labels
type:feature New feature or request type:packaging making the project easier to install

elliotfontaine commented Jul 29, 2024

Ten simple rules and a template for creating workflows-as-applications #Rule 7

"When creating software environments, many workflow managers will save these within subfolders in the working directory. This facilitates reproducibility by keeping everything within a single subdirectory. As a result, every new analysis will generate a whole new set of environment files, which can be wasteful, especially if there are limits on the number of files and folders that can be created on a HPC cluster. Likewise, users may want to specify the installation locations, especially if databases or environments will take up a considerable amount of disk space. Having centralised locations for your environments and databases and allowing these locations to be customised by the user can alleviate this issue. Caution is advised, however, as specifying a location outside of the working or installation directories may have unforeseen consequences, such as files being moved or deleted. As such, many users will prefer to keep environments and databases in the working directory. In our templates, we use the installation directory of the command line tool as the default for both conda environments and databases as this represents the safest centralised location for these files, but users can specify the working directory if they prefer."

Right now, the workflow uses system-wide or user-wide conda environments that are installed before the workflow is ever run. This saves disk space, but leaves the environments open to external modification.

We could do a one-time installation of the conda environments in the workflow source directory and then copy them into each run directory.
A good compromise on disk space would be to have the runs look for environments in the source directory, so the environment state would be shared with other runs BUT NOT with other applications/users on the server.
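
To make the two options concrete, here is a rough sketch of how the environment location could be resolved. Everything in it is an assumption for discussion, not existing code in this project: the function names, the `envs/` subfolder, and the `per_run` switch are all hypothetical. The only conda-specific fact it relies on is that every conda environment prefix contains a `conda-meta/` directory.

```python
from pathlib import Path


def resolve_env_dir(source_dir, run_dir, env_name, per_run=False):
    """Return the directory where the named conda environment should live.

    Hypothetical layout: <base>/envs/<env_name>, where <base> is the
    workflow source directory by default (shared between runs) or the
    run directory when per_run=True (fully isolated, more disk usage).
    """
    base = Path(run_dir) if per_run else Path(source_dir)
    return base / "envs" / env_name


def needs_install(env_dir):
    """Return True if no conda environment exists at env_dir yet.

    conda creates a conda-meta/ directory inside every environment
    prefix, so its absence means the environment still has to be built.
    """
    return not (Path(env_dir) / "conda-meta").is_dir()
```

With something like this, a run would call `resolve_env_dir(...)`, and only when `needs_install(...)` is true would it build the environment at that path, for example with `conda env create --prefix <path> --file <environment file>`. Defaulting to the source directory gives the "shared between runs" behaviour, and `per_run=True` gives the copy-per-run behaviour.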
