Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupKey Logic for multiple lanes with different sizes based on line splitting #853

Closed
achristofferson-bbi opened this issue Nov 17, 2022 · 5 comments
Labels
bug Something isn't working
Milestone

Comments

@achristofferson-bbi
Copy link

Will this logic work for samples that have been run across multiple lanes and split_fastq turned on?
If BAM_MARKDUPLICATES_SPARK is the next step and 1 lane of fastqs was split into 2 and the other was split into 3 then BAM_MARKDUPLICATES_SPARK won't run at all or it will only run with 4 out of the 5 aligned bams.

Am I missing something?

sarek/workflows/sarek.nf

Lines 500 to 522 in 0b7f0e2

ch_bam_mapped = FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP.out.bam.map{ meta, bam ->
numLanes = meta.numLanes ?: 1
size = meta.size ?: 1
// update ID to be based on the sample name
// update data_type
// remove no longer necessary fields:
// read_group: Now in the BAM header
// numLanes: Was only needed for mapping
// size: Was only needed for mapping
new_meta = [
id:meta.sample,
data_type:"bam",
patient:meta.patient,
sample:meta.sample,
sex:meta.sex,
status:meta.status,
]
// Use groupKey to make sure that the correct group can advance as soon as it is complete
// and not stall the workflow until all reads from all channels are mapped
[ groupKey(new_meta, numLanes * size), bam]
}.groupTuple()

@maxulysse
Copy link
Member

@achristofferson-bbi we reworked that part now, hopefully it should be clearer now

@FriederikeHanssen
Copy link
Contributor

re-opening this issue. as discussed on slack the updates in #896 don't actually address this.

I believe the fix would be to add instead of multiply, but I am not quite sure yet how we can do this from a technical perspective, since we won't have access to all meta.maps in the channel mapping in parallel...... One workaround could be to force the user to supply total number of lane (or compute it on samplesheet parsing), then at least we are ride of one variable part. However, even then I don't see a straight forward way right now to make sure to group them properly without causing any form of bottleneck, which here would not be acceptable.

@FriederikeHanssen
Copy link
Contributor

🦆 moment: maybe multiple levels of grouping can get us there

@maxulysse maxulysse added this to the 3.2 milestone Feb 21, 2023
@FriederikeHanssen FriederikeHanssen added the bug Something isn't working label Mar 21, 2023
@github-project-automation github-project-automation bot moved this to Todo (Hackathon General) in nf-core Hackathon March 2023 Mar 21, 2023
@maxulysse
Copy link
Member

@FriederikeHanssen do you think this is now finally fixed?

@FriederikeHanssen
Copy link
Contributor

Fixed in th elinked PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

No branches or pull requests

3 participants