dockstore_workflow_snps

This repository contains the CWL files needed to run four SNP callers in a single workflow, snp_callers_workflow.cwl.
The workflow is available on Dockstore as dockstore_workflow_snps.

Inputs

Input to the workflow is a JSON format file (see example.json) with paths to the following:

A genome in fasta format with a samtools index (.fai) and a GATK .dict file (see below) in the same directory
A tumor sample in bam format with a samtools index (.bai) in the same directory
A normal sample from the same patient in bam format with a samtools index (.bai) in the same directory
A bed format file with the centromere locations of the genome. hg38.centromere.bed contains centromeres for hg38/GRCh38
A Cosmic vcf format file with known cancer mutations, with a tabix index (.tbi), see below
A dbSNP vcf format file, with a tabix index. See for example ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/common_all_20170710.vcf.gz
The output file, which will be in .tgz format

.dict

To create a .dict file, install picard-tools and run

java -jar picard.jar CreateSequenceDictionary REFERENCE=<my_genome>.fa OUTPUT=<my_genome>.dict

Note that while .fai and .bai extensions are appended to the original filename (normal.bam.bai), the .dict extension replaces the .fa extension.
Warning: make sure the genome filename contains no other periods. The workflow currently cannot find the .dict file if it does.
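
The naming rules above can be sketched in Python. This is a minimal illustration and not part of the workflow: the helper names are hypothetical, and how the workflow itself derives the .dict path is not shown in this repository's README.

```python
from pathlib import Path


def appended_index(path: str, ext: str) -> Path:
    """Index whose extension is appended, e.g. normal.bam -> normal.bam.bai."""
    return Path(path + ext)


def dict_path(fasta: str) -> Path:
    """The .dict extension replaces the last one, e.g. genome.fa -> genome.dict."""
    return Path(fasta).with_suffix(".dict")


def has_extra_periods(fasta: str) -> bool:
    """True if the filename has periods other than the one before the extension."""
    return Path(fasta).name.count(".") > 1
```

Running has_extra_periods on the genome path before launching is a cheap way to catch the filename problem described in the warning.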

.tbi

To create .tbi files, first use bgzip to compress your file (you may have to gunzip first), then run

tabix -p vcf cosmic.vcf.gz

Note that you can download a .tbi file directly from the NCBI ftp site for the dbSNP vcf file.
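
Because tabix needs bgzip (BGZF) compression rather than plain gzip, it can help to check which one a .vcf.gz file uses before indexing. The sketch below inspects the first 16 bytes of the file. The function names are hypothetical, and the check assumes the BGZF "BC" subfield is the first gzip extra subfield, which is how bgzip writes it.

```python
def looks_like_bgzf(header: bytes) -> bool:
    """Return True if the first 16 bytes look like a BGZF block header."""
    return (
        len(header) >= 16
        and header[:3] == b"\x1f\x8b\x08"    # gzip magic + deflate method
        and bool(header[3] & 0x04)           # FEXTRA flag must be set
        and header[12:16] == b"BC\x02\x00"   # BGZF subfield id and length
    )


def file_is_bgzf(path: str) -> bool:
    """Read the start of a file and test it with looks_like_bgzf."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(16))
```

If file_is_bgzf returns False for a .gz file, the file is probably plain gzip and needs the gunzip-then-bgzip step described above.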

Outputs

Output will be tarred, gzipped, and copied to the path you listed in your JSON file. It will unpack into the following files:
muse.filtered.vcf
mutect.vcf
somatic_sniper.vcf
pindel.vcf
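
To confirm that a finished run produced all four call sets, the archive can be inspected without unpacking it. This is a minimal sketch using Python's standard tarfile module. The function name is hypothetical, and it assumes the member names end in the filenames listed above, whether or not they sit inside a subdirectory.

```python
import tarfile
from pathlib import PurePosixPath

EXPECTED = {
    "muse.filtered.vcf",
    "mutect.vcf",
    "somatic_sniper.vcf",
    "pindel.vcf",
}


def missing_outputs(tgz_path: str) -> set:
    """Return the expected VCF names that are absent from the .tgz archive."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        present = {
            PurePosixPath(member.name).name
            for member in tar.getmembers()
            if member.isfile()
        }
    return EXPECTED - present
```

An empty set means every expected file is present; any names returned point to a caller whose output is missing.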

Run

Provided you have dockstore and docker installed on your system, run

dockstore workflow launch --entry github.com/BD2KGenomics/dockstore_workflow_snps:master --json my.json

NOTE: This may take more than a day. Use as many processors as possible to speed up the run, and avoid having unnecessary sequences in your genome fasta (chrUn, chr_random).
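
One way to spot unnecessary sequences is to scan the genome's .fai index, whose first tab-separated column is the sequence name. The sketch below is an illustration only: the function name is hypothetical, and the name patterns (a chrUn prefix or a _random suffix) assume UCSC-style sequence names.

```python
def extra_contigs(fai_lines):
    """Return sequence names from .fai lines that look unplaced or random."""
    names = (line.split("\t", 1)[0] for line in fai_lines if line.strip())
    return [
        name
        for name in names
        if name.startswith("chrUn") or name.endswith("_random")
    ]
```

For example, passing the lines of a .fai file opened with open() returns the names that could be dropped from the fasta before the run.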

Details

The workflow calls Docker containers maintained by opengenomics and hosted on Docker Hub.
More information about the individual tools can be found by clicking on these links:
MuSE
MuTect
SomaticSniper
Pindel
