Changes:
- The grouping strategy for sharded data has been improved.
- The expected number of BAM files per sample is now calculated by grouping the split FASTQ files by sample ID and counting how many FASTQ files were actually created.
- This has to wait for all FASTQ files to be produced by FASTP, but it is more reliable.
- After alignment, that FASTQ count is used as the expected group size when the per-chunk BAM files are grouped back together (see the sketch below).
Fixes #1357
Description of the bug
When analysing a sample that has been sequenced on multiple lanes, if there is a large discrepancy in read counts between the lanes, the grouping key waits for n lanes * n split chunks for that sample, a total that may never be reached. Here's an example:
sample1, lane 1: 10 million reads
sample1, lane 2: 1 million reads
the reads are split into 1-million-read chunks by FASTP
the groupKey here expects 2 lanes * 10 chunks, for a total of 20
only 12 ever appear, so the sample is dropped out of the pipeline
The pipeline finishes with a green tick because nothing 'fails' (a sketch of this failure mode follows below).
I've attached a samplesheet using the test data from the test and test_full profiles which recreates the problem.
Setting splitFastq to size 0 is a workaround for this problem.
Command used and terminal output
nextflow run nf-core/sarek -r v3.4.0 --splitFastq 5000000 --input sarek-uneven-size-error.csv --outdir results/
Relevant files
sarek-uneven-size-error.csv
System information
All