Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if flowcell id matches for paired samples #1664

Merged
merged 6 commits into from
Oct 30, 2024

Conversation

pmoris
Copy link
Contributor

@pmoris pmoris commented Sep 26, 2024

I noticed this comment about checking the flowcell ID for paired samples while constructing GATK read groups. I was adapting the read group code for a custom pipeline and attempted a quick fix, so I thought I'd contribute it back to sarek.

While constructing the read group from paired fastq samples, perform a check to ensure that the id is the same for (the first reads) in fastq_1 and fastq_2. Exit out with an error otherwise and report the problematic sample and file paths.

Incidentally, while researching read groups I came across the following recommendations: https://support.sentieon.com/appnotes/read_groups/. Would it be worth updating some of the fields to match these guidelines?

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • => Only tested this manually, but happy to add a proper test if you could give me a starting point. Is there already an existing test for samplesheet validation that I can add this too? I guess I will need to add "corrupt" fastq files to the nf-core test repo?
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
    • => will do this after submitting the PR so that I can link to it.
  • README.md is updated (including new tool citations and authors/contributors).
    • => should I do this even for such a minor contribution?

While constructing the read group from paired fastq samples,
perform a check to ensure that the id is the same for
(the first reads) in fastq_1 and fastq_2. Exit out with an
error otherwise and report the problematic sample and file paths.
Copy link

github-actions bot commented Sep 26, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 82615ad

+| ✅ 215 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-30 09:16:39

workflows/sarek/main.nf Outdated Show resolved Hide resolved
@maxulysse
Copy link
Member

@nf-core-bot fix linting 🙏 pretty please 🙏

@maxulysse
Copy link
Member

@pmoris I updated your PR with the latest update in this function.
No need to check for paired samples as sarek only handles paired samples

@maxulysse
Copy link
Member

Can you update the CHANGELOG

pmoris and others added 3 commits October 8, 2024 10:03
No need to check for paired samples as sarek only handles paired samples

Co-authored-by: Maxime U Garcia <[email protected]>
Check if flowcell id matches for read pair should throw error when flowcell id is found (instead of not found) AND it differs for the reads.
@pmoris
Copy link
Contributor Author

pmoris commented Oct 8, 2024

Changelog is updated!

I also fixed the conditional (by removing the meta.single_end check, it accidentally moved the negation to the flowcell variable, causing the check to not trigger at the right time).

Lastly, what are your thoughts on updating the PU field to flowcell.lane rather than just lane (as recommended here: https://support.sentieon.com/appnotes/read_groups/)?

@pmoris
Copy link
Contributor Author

pmoris commented Oct 10, 2024

Why is the linter complaining? There is no trailing whitespace or non-multiple-of-4 padding as far as I can tell...

@maxulysse maxulysse enabled auto-merge (squash) October 30, 2024 09:20
@maxulysse maxulysse disabled auto-merge October 30, 2024 10:27
@maxulysse maxulysse merged commit 74db9d3 into nf-core:dev Oct 30, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants