Fix GenomicsDB bug with mismatched intervals, remove duplicated variants from VQSR vcfs, add VQSR CI test #1173

Merged
merged 27 commits into from
Aug 16, 2023

@FriederikeHanssen (Contributor) commented Aug 5, 2023

List of changes:

  1. Add at least a stub test for VQSR (see Pytest for VQSR-flow #1027). Attempts to get some actual, tiny test data to run through it failed.

  2. Remove duplicate entries in the joint germline VQSR VCF by following the instructions described here, fixing "Joint Genotype Calling Recalibrated VCFs Duplicate Entries; VEP is (Intentionally) Eating Variants" #966 and "For joint variant calling, produce a final VCF file with filtered variants (those that PASS filters)" #1102. I opted for this approach for now because it reduces the number of computational steps that need to be run.

  3. Group on the actual interval files and add the relevant meta information to fix occasional grouping mismatches (GenomicsDBImport : Mismatching intervals for input-vcf-files and bed-interval-file #1137).

  4. Some channel renaming (Rename channels genotype_intervals #1042).

  5. Refactor single-sample filtering (Refactor joint germline calling #1053).
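
To illustrate the idea behind item 3, here is a minimal Python sketch. It is not the actual change (sarek is a Nextflow pipeline); it only shows the general principle of keying per-sample GVCFs on the interval file they were called on, so each group can only ever be paired with its own BED file, and attaching a count as meta information. The `GvcfRecord` type, its fields, `group_by_interval`, and the `num_gvcfs` key are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GvcfRecord:
    # Hypothetical record for one per-sample, per-interval GVCF (not a sarek type).
    sample: str
    interval_file: str  # path of the BED file this GVCF was called on
    gvcf: str


def group_by_interval(records: Iterable[GvcfRecord]) -> dict:
    """Group GVCFs by the interval file they were produced from.

    Keying on the interval file itself, rather than on emission order,
    guarantees that each group is paired with the BED file its GVCFs
    were actually called on.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.interval_file].append(rec.gvcf)
    # Attach a simple piece of meta information (number of GVCFs per group),
    # which a downstream step could use to check that a group is complete.
    return {
        bed: {"gvcfs": sorted(gvcfs), "num_gvcfs": len(gvcfs)}
        for bed, gvcfs in groups.items()
    }
```

For example, two samples called on `chr1_1-1000.bed` end up in one group with `num_gvcfs` equal to 2, regardless of the order in which their GVCFs arrive.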

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions bot commented Aug 5, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 89163ce

✅ 151 tests passed
❔ 9 tests were ignored
❗ 2 tests had warnings

Run details

  • nf-core/tools version 2.9
  • Run at 2023-08-16 15:21:48

@FriederikeHanssen marked this pull request as ready for review on August 5, 2023 16:52
@asp8200 (Contributor) commented Aug 5, 2023

Should some of these improvements perhaps be "mirrored" or "copied" over into the sentieon-subworkflows?

@FriederikeHanssen (Contributor, Author)

Sure, but in a separate PR imo. And not all of it is relevant due to the different structure.

@FriederikeHanssen (Contributor, Author)

Tests are failing because of the generated CSV file. We should decide which VCF files go in there and, by extension, get annotated.

This is the current state for the joint_germline track: no GVCFs, but the output of GenotypeGVCFs and then the one following recalibration.

patient,sample,variantcaller,vcf
all_samples,joint_variant_calling,haplotypecaller,results/variant_calling/haplotypecaller/joint_variant_calling/joint_germline.vcf.gz
all_samples,recalibrated_joint_variant_calling,haplotypecaller,results/variant_calling/haplotypecaller/recalibrated_joint_variant_calling/joint_germline_recalibrated.vcf.gz
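
One way to pin down the decision above is a small check on the generated samplesheet. The following is a minimal Python sketch, not the pipeline's actual test: it parses the CSV shown above with the standard `csv` module and rejects any GVCF entries. The function name `vcfs_to_annotate` and the assumption that GVCFs end in `.g.vcf.gz` are illustrative choices, not taken from sarek.

```python
import csv
import io

# Samplesheet contents for the joint_germline track, copied from above.
SAMPLESHEET = """patient,sample,variantcaller,vcf
all_samples,joint_variant_calling,haplotypecaller,results/variant_calling/haplotypecaller/joint_variant_calling/joint_germline.vcf.gz
all_samples,recalibrated_joint_variant_calling,haplotypecaller,results/variant_calling/haplotypecaller/recalibrated_joint_variant_calling/joint_germline_recalibrated.vcf.gz
"""


def vcfs_to_annotate(text: str) -> list:
    """Return the VCF paths listed in a samplesheet, rejecting any GVCFs."""
    rows = list(csv.DictReader(io.StringIO(text)))
    paths = [row["vcf"] for row in rows]
    # Assumption: per-sample GVCFs are named *.g.vcf.gz and must not be annotated.
    gvcfs = [p for p in paths if p.endswith(".g.vcf.gz")]
    if gvcfs:
        raise ValueError(f"GVCFs found in samplesheet: {gvcfs}")
    return paths
```

Applied to the samplesheet above, this returns the two joint-calling VCFs (the GenotypeGVCFs output and the recalibrated one).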

@maxulysse (Member)

Code is looking good, but I still think we need to fix some tests.

@adamrtalbot (Contributor)

Agree with @maxulysse, the code is an improvement. Anything we can help with re: tests?

@FriederikeHanssen (Contributor, Author)

Fixing the samplesheet stuff at the moment. That should fix the tests, plus I added a sneaky Docker tag in one. Let me push things and get back to you on the tests.

@FriederikeHanssen changed the title from "Refactor Haplotypecaller" to "Fix GenomicsDB bug with mismatched intervals, remove duplicated variants from VQSR vcfs, add VQSR CI test" on Aug 16, 2023
@FriederikeHanssen (Contributor, Author)

@nf-core-bot fix linting

Projects
Status: Done

5 participants