Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UMI reads processing capability #145

Merged
merged 26 commits into from
Jul 22, 2020
Merged

Add UMI reads processing capability #145

merged 26 commits into from
Jul 22, 2020

Conversation

lescai
Copy link
Contributor

@lescai lescai commented Mar 5, 2020

nf-core/sarek pull request

Many thanks for contributing to nf-core/sarek!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

PR checklist

  • This pull request introduces a chunk of code to process reads containing UMIs. Unique Molecular Indices are very important particularly for somatic workflows aiming at detecting very low allele-fraction variants (MRD, Liquid Biopsy). The chosen workflow adopts the FGBIO tools, which create a consensus read within the same UMI-groups, and a robust method for identification of the groups. See blog and ref.
    The approach ensures downstream compatibility with the workflow: the result of the UMI process is a uBam, which can then be fed into MappingReads and downstream in both HaplotypeCaller and more importantly Mutect2 or Strelka.

  • Tests are work in progress: datasets have been identified from 2 different UMI types (QIAseq and Illumina TSO), but cannot complete them on laptop

  • As indicated above, the reads will be uploaded at nf-core/sarek branch on the nf-core/test-datasets repo

  • The code has passed lints (nf-core lint .).

  • Documentation in docs has been updated

  • CHANGELOG.md is not been updated yet

  • README.md has not been updated yet (not sure if this is relevant)

@lescai lescai requested a review from maxulysse as a code owner March 5, 2020 11:42
main.nf Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
Copy link
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks amazing.
We just need the test data to do some CI.
Can you update the CHANGELOG as well?

docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
@maxulysse
Copy link
Member

maxulysse commented Mar 5, 2020

Made a couple of suggestions, if you accept them, you can batch commit them.

lescai and others added 3 commits March 5, 2020 12:52
Co-Authored-By: Maxime Garcia <[email protected]>
Co-Authored-By: Maxime Garcia <[email protected]>
main.nf Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@lescai lescai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with suggestions, reviewed and made more explicit

@maxulysse maxulysse added this to the 3.0 milestone Mar 6, 2020
@maxulysse maxulysse mentioned this pull request Mar 6, 2020
@maxulysse maxulysse changed the title Added UMI reads processing capability Add UMI reads processing capability May 18, 2020
@maxulysse maxulysse added enhancement New feature or request and removed work in progress labels May 18, 2020
@chelauk
Copy link
Contributor

chelauk commented Jul 20, 2020

Hi any updates on adding umi to variant calling? Is it working? Otherwise I will build a new pipeline.

@lescai
Copy link
Contributor Author

lescai commented Jul 20, 2020

Hi @chelauk not sure what's holding the pull request at this stage, I did test everything at the March hackathon using the test data here
https://github.com/nibscles/test-datasets/tree/sarek-umi
And it's working.
We are using our standalone code at NIBSC, if you don't want to build something new: it's undocumented, but it will give you an unmapped BAM, which you can then feed into Sarek if you like.
https://github.com/nibscbioinformatics/core/tree/master/workflows/umi
The important thing is to pass the correct UMI structure (see documentation for this pull request)
https://github.com/nibscles/sarek/blob/umi/docs/usage.md#--umi

@maxulysse
Copy link
Member

Hi @chelauk @nibscles
I wanted to close this one during the hackathon, but did not find the time.
I'll make sure merge it by the end of the week.
We're planning a minor release soonish as well

main.nf Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
@maxulysse maxulysse merged commit f28a546 into nf-core:dev Jul 22, 2020
@maxulysse
Copy link
Member

@chelauk @nibscles
PR is finally merged.
I'll be making another PR regarding UMI, some tiny code polishing and adding test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants