Job arrays #3892
Conversation
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
If a task fails and is retried with increased resources, it will be batched with other tasks that may still be on their first attempt. In that case, the array job resources will depend on whichever task happens to be first in the batch. One solution is to take the max value of cpus, memory, time, etc. across all tasks in an array job. That would be "safe" but likely much more expensive -- if a single task requests twice the resources, suddenly the entire array job does as well. Another solution is to further separate batches by configuration, to ensure that they are uniform. We could go crazy and separate batches by the tuple of …
We could also just provide config options for these things:
The point of these config options is that there is a trade-off between bandwidth and latency when batching tasks like this, so users should ideally be able to manage that trade-off in a way that best fits their use case. If someone doesn't use retry with dynamic resources, then they don't need to group by attempt, and vice versa.
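A hedged sketch of what such config options might look like in a Nextflow config file. Only `process.array` comes from this PR; the `executor.*` option names below are invented for illustration and are not real Nextflow settings:

```
// Real directive from this PR: max number of tasks per array job
process.array = 10

// Hypothetical knobs (invented names, for illustration only):
// keep retried tasks out of batches with first-attempt tasks
executor.arrayGroupByAttempt = true
// submit a partial batch after this delay instead of waiting to fill it
executor.arrayFlushInterval = '5 min'
```

The two hypothetical options correspond to the two sides of the trade-off above: grouping by attempt trades batching efficiency for uniform resources, and a flush interval trades batching efficiency for latency.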
Signed-off-by: Ben Sherman <[email protected]>
modules/nextflow/src/main/groovy/nextflow/executor/ArrayExecutor.groovy
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchExecutor.groovy
plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchExecutor.groovy
Signed-off-by: Paolo Di Tommaso <[email protected]>
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy
Signed-off-by: Ben Sherman <[email protected]>
umm, this still shows:
Signed-off-by: Ben Sherman <[email protected]>
Yeah, I didn't think the volatile would help. Really, there is no point in checking whether the current thread is interrupted...
That's the pattern recommended by the Java Concurrency in Practice bible.
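For context, a minimal sketch of the cooperative-interruption pattern that book recommends (this is illustrative standalone code, not the Nextflow implementation): a long-running loop polls its own thread's interrupt flag so that `interrupt()` from another thread makes it exit promptly.

```java
// Sketch of the cooperative-interruption pattern: a worker loop checks
// Thread.currentThread().isInterrupted() so that interrupt() from
// another thread stops it promptly, without forcibly killing it.
public class InterruptDemo {
    // Counts up to `limit`, stopping early if the current thread
    // has been interrupted.
    static int countUntilInterrupted(int limit) {
        int n = 0;
        while (n < limit && !Thread.currentThread().isInterrupted()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            int n = countUntilInterrupted(Integer.MAX_VALUE);
            System.out.println("stopped after " + n + " iterations");
        });
        worker.start();
        Thread.sleep(100);
        worker.interrupt();  // sets the flag; the loop observes it and exits
        worker.join();
    }
}
```

The point of checking one's own interrupt status is exactly this: the flag is only a request, and code that never polls it will never notice the interruption.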
This reverts commit 247b721.
Signed-off-by: Paolo Di Tommaso <[email protected]>
@bentsherman all solved with Google Batch logs and child ids?
Google Batch logs are working. Why would you check whether the thread you are currently in is interrupted? If it were interrupted, you wouldn't get the chance to check; you would just be interrupted...
Anyhow, it is worth covering in a separate PR.
Should be solved with:
Think so.
Fascinating, the dir content when using a job array (only for nerds :))
Signed-off-by: Paolo Di Tommaso <[email protected]>
Reverted some renaming on the launch <> submit methods, because there were still some inconsistencies and to prevent breaking the xpack deps.
Think we are finally ready to merge this. Great effort 👏 👏
Job array is a capability provided by some batch schedulers that allows spawning the execution of multiple copies of the same job in an efficient manner. Nextflow allows the use of this capability by setting the process directive `array <some value>`, which determines the (max) number of jobs in the array. For example:

```
process foo {
    array 10

    '''
    your_task
    '''
}
```

or in the Nextflow config file:

```
process.array = 10
```

Currently this feature is supported by the following executors:

* Slurm
* SGE
* PBS
* PBS Pro
* LSF
* AWS Batch
* Google Batch

Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Mahesh Binzer-Panchal <[email protected]>
Signed-off-by: Herman Singh <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Abhinav Sharma <[email protected]>
Co-authored-by: Mahesh Binzer-Panchal <[email protected]>
Co-authored-by: Herman Singh <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
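To illustrate the general mechanism (not the actual generated script), here is a hedged shell sketch of how a child job in an array can select its own work directory by index. The paths are hypothetical, and `SLURM_ARRAY_TASK_ID` is the index variable Slurm sets for each child job; other schedulers use their own equivalent.

```shell
#!/bin/bash
# Sketch of an array job launcher: each child job picks its work dir
# by array index and runs the task there. Paths are made up for the
# example; SLURM_ARRAY_TASK_ID is Slurm's per-child index variable.
work_dirs=( /work/aa/0001 /work/bb/0002 /work/cc/0003 )
index=${SLURM_ARRAY_TASK_ID:-0}
task_dir=${work_dirs[$index]}
echo "launching child task in $task_dir"
```

This is why a single array submission can stand in for many individual jobs: the scheduler fans out one script with a different index per child.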
Closes #1477 (and possibly #1427)
Summary of changes:

* Adds the `array` directive to submit tasks as array jobs of a given size.
* `TaskArrayCollector` collects tasks into arrays and submits each array job to the underlying executor when it is ready. The executor must implement the `TaskArrayAware` interface. Each process has its own array job collector. When all input channels to a process have received the "poison pill", the process is "closed" and the array job collector is notified so that it can submit any remaining tasks. All subsequent tasks (e.g. retries) will be submitted as individual tasks.
* `TaskArray` is a special type of `TaskRun` for an array job that holds the list of child task handlers. For an executor that supports array jobs, the task handler can check whether its task is a `TaskArray` in order to apply job-specific behavior.
* `TaskHandler` has a few more methods, which the array job collector uses to create the array job script. This script simply defines the list of child work directories, selects a work dir based on the index, and launches the child task using an executor-specific launch command.
* `TaskPollingMonitor` has been modified to handle both array jobs and regular tasks. The array job is handled like any other task, but is discarded once it has been submitted. The task stats reported by Nextflow are the same with or without array jobs -- array jobs are not included in the task stats.

Here's the pipeline I'm using as the e2e test:

TODO: