Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support resource labels for Azure Batch #4178

Merged
merged 4 commits into from
Aug 17, 2023

Conversation

bentsherman
Copy link
Member

Close #3248

Based on discussions in the issue and with Adam, it seems cost analysis in Azure is tied to pools, and job metadata is not propagated to VMs. As a result, this PR applies resourceLabels to the pool when it is created, and the docs make clear that resource labels in Azure should be defined once for all processes, or else they will be non-deterministic.

Happy to be proven wrong if anyone knows better. But this feature is still useful for pipeline-level labels, like pipeline name, user, project, etc.

@adamrtalbot can you test it?

@netlify
Copy link

netlify bot commented Aug 11, 2023

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit ca8441f
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/64dbb576dfe7eb0008f5791a
😎 Deploy Preview https://deploy-preview-4178--nextflow-docs-staging.netlify.app/process
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

…rvice.groovy [ci fast]

Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso
Copy link
Member

This test could be updated

@adamrtalbot
Copy link
Collaborator

With config item:

process.resourceLabels = [user: "Adam Talbot", purpose: "test-resource-labels"]

Seems to work:
image

The metadata isn't applied to the job itself, but we expected this:
image

@adamrtalbot
Copy link
Collaborator

adamrtalbot commented Aug 15, 2023

Even better, I changed the config to this:

process.resourceLabels = { [ user: "Adam Talbot", purpose: "test-resource-labels", task: task.process ] }

And now I created a pool for every process in Nextflow, each one with metadata labels for each process:
image

image

This means you could tag everything for proper cost evaluation based on tasks.

@bentsherman
Copy link
Member Author

@adamrtalbot what do you think about creating different auto pools for different resource labels? This is how it currently works, if two processes have different resource labels, Nextflow will create two different pools for them.

This would allow the resource labels to be process-specific, but could potentially create lots of otherwise identical pools. I'm wondering if you think it's worth having.

@bentsherman
Copy link
Member Author

My concern is that it could lead to lower resource utilization, since each process would have its own pool rather than many processes sharing one or a few pools, which are more likely to be fully utilized the whole time.

@adamrtalbot
Copy link
Collaborator

@adamrtalbot what do you think about creating different auto pools for different resource labels? This is how it currently works, if two processes have different resource labels, Nextflow will create two different pools for them.

Seems perfect.

I will try grouping next, so half the processes go to one pool, half to another, using resourceLabels.

Leaving this setting on was a mistake, because now I have to clear up my mess manually:

deletePoolsOnCompletion   = false

@adamrtalbot
Copy link
Collaborator

With new config (pipeline is https://github.com/adamrtalbot/nf-canary/):

process {
    withName: 'TEST_CREATE_FOLDER|TEST_PUBLISH_FILE|TEST_IGNORED_FAIL|TEST_PUBLISH_FOLDER|TEST_CREATE_FILE' {
        resourceLabels = [ user: "Adam Talbot", purpose: "test-resource-labels", group: "A" ]
    }
    withName: 'TEST_SUCCESS|TEST_BIN_SCRIPT|TEST_INPUT|TEST_PASS_FILE|TEST_PASS_FOLDER' {
        resourceLabels = [ user: "Adam Talbot", purpose: "test-resource-labels", group: "B" ]
    }
}

image

image

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman bentsherman requested a review from pditommaso August 15, 2023 17:20
@adamrtalbot
Copy link
Collaborator

To expand on why it's perfect, you can effectively use resourceLabels as cost tags now for filtering and management. Let Nextflow handle all the Azure Batch pool management stuff for you and you can check how much you're spending (e.g. per process).

@bentsherman
Copy link
Member Author

Wonderful, then this PR is ready to go

@pditommaso pditommaso merged commit 7b5e50a into master Aug 17, 2023
@pditommaso pditommaso deleted the 3248-azure-batch-tags-propagation branch August 17, 2023 14:23
pditommaso added a commit that referenced this pull request Aug 23, 2023
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
abhi18av pushed a commit to abhi18av/nextflow that referenced this pull request Oct 28, 2023
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure metadata tags (resource labels) propagation for child jobs
3 participants