
[Beta Test] Wave Containers: add support for the Spack package manager #3636

Merged on May 3, 2023 (174 commits)

Conversation

marcodelapierre
Member

@marcodelapierre marcodelapierre commented Feb 10, 2023

This PR adds support for Spack as a software provider for Wave containers.
It comes out of some very fruitful conversations Paolo and I have had recently.

It has the potential to pave the way for on-demand, architecture-optimised containers with Nextflow and Wave, although some work is yet to be completed. The key missing piece in this regard is the adoption of binary caches to speed up build times (coming later).

Some points:

@pditommaso @evanfloden have even more fun! :-)

@tgamblin @alalazo I hope this can be another useful contribution from the perspective of the Spack community, too!
@vsochat hopefully this will be merged soon enough .. watch this space ;)

marcodelapierre and others added 30 commits January 27, 2023 17:37
Signed-off-by: Marco De La Pierre <[email protected]>
nextflow-io#3508) [ci skip]

Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Llewellyn vd Berg <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Llewellyn vd Berg <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
This configuration option allows controlling how the
grid executor queries the batch scheduler for the
current job status.

When the setting executor.queueGlobalStatus is `false`
the executor will check the queue where the job execution
was submitted (default).

When the setting executor.queueGlobalStatus is `true`
the executor will check the queue status globally, i.e.
irrespective of the queue where the execution was submitted.
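
As a minimal sketch, the option described above could be set in nextflow.config as follows. The choice of the `slurm` executor is an assumption for illustration only and is not part of the commit.

```groovy
// nextflow.config (sketch)
// Assumption: jobs are submitted to a SLURM cluster.
process.executor = 'slurm'

// Query the job status across all queues rather than only the
// queue the job was submitted to (the default is false).
executor.queueGlobalStatus = true
```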

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
This commit adds support for rclone as a stageOutMode,
used by Nextflow to control how output files are copied
from the local scratch directory to the task work dir.

The use of rclone may provide more efficient copying of
large data files over network file systems.
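
A minimal sketch of how this mode might be enabled in nextflow.config, assuming that tasks run in a local scratch directory and that the rclone tool is available on the compute nodes (both are assumptions, not stated in the commit):

```groovy
// nextflow.config (sketch)
process {
    // Assumption: run each task in a node-local scratch directory.
    scratch = true
    // Copy outputs from scratch back to the task work dir with rclone.
    stageOutMode = 'rclone'
}
```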

Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Abhinav Sharma <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Marco De La Pierre <[email protected]>
@pditommaso
Member

Just checked, and it worked on first try 🤷‍♂️

@marcodelapierre
Member Author

Just checked, and it worked on first try 🤷‍♂️

Did you use my pipeline with -profile spackwave? I think it worked because the default image is still cached in Wave.

I think you can experiment with the Dockerfile, in WaveClient.groovy from line 579 onwards.
First, add a directive such as RUN echo "hello" >/world at the end of the Dockerfile: this only adds a new layer at the end, so it should be fine.
Then add it at the very beginning instead, which invalidates the cache.

@pditommaso
Member

Did you use my pipeline with -profile spackwave?

Yes. Let me try again after clearing the cache.

@pditommaso
Member

pditommaso commented May 1, 2023

Is this the error?

  Unable to find image 'wave.stage-tower.net/wt/2e46a336f730/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2' locally
  docker: Error response from daemon: manifest for wave.stage-tower.net/wt/2e46a336f730/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2 not found: manifest unknown: repository '195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2' not found.
  See 'docker run --help'.

@marcodelapierre
Member Author

Is this the error?

  Unable to find image 'wave.stage-tower.net/wt/2e46a336f730/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2' locally
  docker: Error response from daemon: manifest for wave.stage-tower.net/wt/2e46a336f730/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2 not found: manifest unknown: repository '195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build/stage:542bc2ddaeecfc5156ab07c0e1f6fcb2' not found.
  See 'docker run --help'.

No, I typically get one of these two:

Command error:
  Unable to find image 'wave.seqera.io/wt/455e45c14efa/wave/build:4a71f62a456a4231cb0f0640b406628e' locally
  docker: Error response from daemon: received unexpected HTTP status: 502 Bad Gateway.
Command error:
  Unable to find image 'wave.seqera.io/wt/c7e9f7bb170d/wave/build:593503d65400503e25f885163ecdc304' locally
  docker: Error response from daemon: missing or empty Content-Length header.
  See 'docker run --help'.

@pditommaso
Member

Weird, I cannot even find it in the logs. When it happens again, let me know.

@marcodelapierre
Member Author

Ok, so:

  • implemented all your feedback from the past few days
  • sorted out issue with defaultFusionUrl
  • dropped ARM32

Tomorrow I will refactor the Dockerfile templates, and then release this PR for your final review and merging.

@pditommaso
Member

Awesome!

@marcodelapierre marcodelapierre marked this pull request as ready for review May 3, 2023 07:09
@marcodelapierre
Member Author

@pditommaso I am releasing the PR for review and merge!

There are TWO outstanding issues, the first one being new:

  1. I have moved to Dockerfile templates, the code builds and the unit tests pass, but .. at runtime I get this error:
ERROR ~ Error executing process > 'sayHello (2)'

Caused by:
  Cannot invoke "java.net.URL.openConnection()" because "url" is null

e.g. with this trace:

java.lang.NullPointerException: Cannot invoke "java.net.URL.openConnection()" because "url" is null
        at org.codehaus.groovy.runtime.ResourceGroovyMethods.configuredInputStream(ResourceGroovyMethods.java:2199)
        at org.codehaus.groovy.runtime.ResourceGroovyMethods.newReader(ResourceGroovyMethods.java:2271)
        at io.seqera.wave.plugin.WaveClient.spackRecipeToDockerFile(WaveClient.groovy:528)

It must be something very silly that I am missing about the usage of these methods ... any tips?

  2. The backend pull errors. I will run a few more tests once the above is fixed. The Dockerfile template makes it much quicker to prototype variations of the Dockerfile, which can help troubleshoot this one, e.g. by altering the creation and caching of layers (number and size).

@pditommaso
Member

have moved to Dockerfile templates, the code builds and the unit tests pass, but .. at runtime I get this error

I believe this is happening because BashWrapperBuilder is in another module. Try using:

        final template = WaveClient.class.getResource('/dockerfile-spack-recipe.txt')
        try(final reader = template.newReader()) {
            final result = new BashTemplateEngine().render(reader, binding)
            return result
        }

The key is WaveClient.class.getResource. If it works, it would also be better to move the files under the path templates/spack.
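
For illustration only, the lookup above could be wrapped in a small guard so that a missing template fails with a clear message instead of a NullPointerException. This is a sketch, not the code that was merged: `openTemplate` is a hypothetical helper name, and the exception type is an arbitrary choice.

```groovy
// Sketch only: `openTemplate` is a hypothetical helper, not part of nf-wave.
Reader openTemplate(Class anchor, String path) {
    // Class.getResource resolves the path through the class loader of
    // `anchor`; it returns null when the resource is not visible from there.
    final URL url = anchor.getResource(path)
    if( url == null )
        throw new FileNotFoundException("Template not found on the classpath: ${path}")
    // URL.newReader() is the same Groovy method seen in the stack trace above.
    return url.newReader()
}
```

With a helper like this, the call site would pass the class that ships the template, e.g. `openTemplate(WaveClient, '/dockerfile-spack-recipe.txt')`.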

  2. I'll look into it once we merge this

@marcodelapierre
Member Author

Yep it works, thank you!

About to push

Signed-off-by: Marco De La Pierre <[email protected]>
@marcodelapierre
Member Author

marcodelapierre commented May 3, 2023

So the second-last man standing is:

the backend pull errors .. I will run a few other tests once the above is fixed. The dockerfile template makes it much quicker to prototype variations of the dockerfile, that can help troubleshooting this one, e.g. by altering the creation and caching of layers (number and size)

If you want to do some testing, add simple Dockerfile directives in this template: plugins/nf-wave/src/resources/dockerfile-spack-recipe.txt
And then run main.nf with -profile spackwave.

...And the last man is the setup of the binary cache, to render the functionality useful for regular production builds. We will chat about this in person in about a month!

Member

@pditommaso pditommaso left a comment

Thanks Marco! awesome job!

@pditommaso pditommaso merged commit b03cbe7 into nextflow-io:master May 3, 2023
@marcodelapierre
Member Author

Thank you Paolo, let's talk at the next PR...!

@marcodelapierre
Member Author

@tgamblin @alalazo @vsoch it is merged! :-)

The final bit will be setting up a binary cache of bioinformatics packages to make the service useful. I might ping you about this after my conference frenzy in May! Todd, Max, see you at ISC!

abhi18av pushed a commit to abhi18av/nextflow that referenced this pull request Oct 28, 2023
This commit adds the ability to create containers on demand using
Spack recipes via the Wave container provisioning service.

The Spack packages should be provided via the `spack` directive;
then the following snippet should be added to the nextflow.config file:

```
wave.enabled = true 
wave.strategy = ['spack']
```

The Wave service will take care of provisioning the container image using the 
packages provided. 
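
As a rough sketch of the process side, a `spack` directive might look like the following. The process name echoes the test pipeline mentioned earlier in the thread; the package (`samtools`) and the script body are assumptions chosen for illustration, not taken from the PR.

```groovy
// main.nf (sketch)
process sayHello {
    // Assumption: 'samtools' stands in for whatever Spack spec the task needs.
    spack 'samtools'

    script:
    """
    samtools --version
    """
}

workflow {
    sayHello()
}
```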

Signed-off-by: Marco De La Pierre <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
6 participants