Simplified version of Hello-GATK #376

Merged
merged 12 commits into from
Sep 25, 2024
430 changes: 191 additions & 239 deletions docs/hello_nextflow/03_hello_gatk.md
vdauwera marked this conversation as resolved.

Large diffs are not rendered by default.

Binary file modified hello-nextflow/data/bam/reads_father.bam
Binary file not shown.
Binary file modified hello-nextflow/data/bam/reads_mother.bam
Binary file not shown.
Binary file modified hello-nextflow/data/bam/reads_son.bam
Binary file not shown.
4 changes: 0 additions & 4 deletions hello-nextflow/data/intervals.list

This file was deleted.

Binary file removed hello-nextflow/data/ref.tar.gz
Binary file not shown.
3 changes: 3 additions & 0 deletions hello-nextflow/data/ref/intervals.bed
@@ -0,0 +1,3 @@
20_10037292_10066351 3276 5495
20_10037292_10066351 7535 9859
20_10037292_10066351 12911 14737
2 changes: 2 additions & 0 deletions hello-nextflow/data/ref/ref.dict
@@ -0,0 +1,2 @@
@HD VN:1.6
@SQ SN:20_10037292_10066351 LN:29059 M5:d9a7bb8816cea7d1e5c55e00d1faeeb4 UR:reads_father_20_10037292_10066351.region.fasta
2 changes: 2 additions & 0 deletions hello-nextflow/data/ref/ref.fasta

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions hello-nextflow/data/ref/ref.fasta.fai
@@ -0,0 +1 @@
20_10037292_10066351 29059 22 29059 29060
4 changes: 0 additions & 4 deletions hello-nextflow/data/samplesheet.csv

This file was deleted.

6 changes: 4 additions & 2 deletions hello-nextflow/hello-gatk.nf
@@ -2,8 +2,6 @@
* Pipeline parameters
*/

// Execution environment setup

// Primary input

/*
@@ -13,18 +11,22 @@ process SAMTOOLS_INDEX {

container

publishDir

input:

output:

"""

"""

}

workflow {

// Create input channel

// Create index file for input BAM file

}
17 changes: 8 additions & 9 deletions hello-nextflow/scripts/hello-gatk-1.nf
@@ -2,10 +2,6 @@
* Pipeline parameters
*/

// Execution environment setup
params.projectDir = "/workspace/gitpod/hello-nextflow"
$projectDir = params.projectDir

// Primary input
params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"

@@ -14,7 +10,9 @@ params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"
*/
process SAMTOOLS_INDEX {

container 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1'
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results', mode: 'copy'

input:
path input_bam
@@ -24,15 +22,16 @@ process SAMTOOLS_INDEX {

"""
samtools index '$input_bam'

"""
}


workflow {

// Create input channel
reads_ch = Channel.of(params.reads_bam)
// Create input channel (single file via CLI parameter)
reads_ch = Channel.fromPath(params.reads_bam)

// Create index file for input BAM file
SAMTOOLS_INDEX(reads_ch)
}

}
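
Note on the `Channel.of` to `Channel.fromPath` change in this file: the two factories emit different kinds of items for the same parameter value. The following is a minimal sketch rather than code from this PR; `params.example_bam` is a hypothetical parameter name introduced only for illustration, and the path is assumed to follow the repository layout shown above.

```groovy
// Hypothetical parameter, for illustration only (not part of this PR)
params.example_bam = "${projectDir}/data/bam/reads_mother.bam"

workflow {

    // Channel.of emits the value unchanged, so the item is still a string-like value
    Channel.of(params.example_bam)
        .view { item -> "of:       is a Path? ${item instanceof java.nio.file.Path}" }

    // Channel.fromPath converts the string into a file path object
    Channel.fromPath(params.example_bam)
        .view { item -> "fromPath: is a Path? ${item instanceof java.nio.file.Path}" }
}
```

Emitting path objects keeps the channel contents consistent with the `path input_bam` declaration in `SAMTOOLS_INDEX`, which is presumably the motivation for the change.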
43 changes: 24 additions & 19 deletions hello-nextflow/scripts/hello-gatk-2.nf
@@ -2,25 +2,23 @@
* Pipeline parameters
*/

// Execution environment setup
params.projectDir = "/workspace/gitpod/hello-nextflow"
$projectDir = params.projectDir

// Primary input
params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"

// Accessory files
params.genome_reference = "${projectDir}/data/ref/ref.fasta"
params.genome_reference_index = "${projectDir}/data/ref/ref.fasta.fai"
params.genome_reference_dict = "${projectDir}/data/ref/ref.dict"
params.calling_intervals = "${projectDir}/data/intervals.list"
params.reference = "${workflow.projectDir}/data/ref/ref.fasta"
params.reference_index = "${workflow.projectDir}/data/ref/ref.fasta.fai"
params.reference_dict = "${workflow.projectDir}/data/ref/ref.dict"
params.intervals = "${workflow.projectDir}/data/ref/intervals.bed"

/*
* Generate BAM index file
*/
process SAMTOOLS_INDEX {

container 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1'
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results', mode: 'copy'

input:
path input_bam
@@ -30,16 +28,17 @@ process SAMTOOLS_INDEX {

"""
samtools index '$input_bam'

"""
}

/*
* Call variants with GATK HapolotypeCaller in GVCF mode
* Call variants with GATK HaplotypeCaller in GVCF mode
*/
process GATK_HAPLOTYPECALLER {

container "docker.io/broadinstitute/gatk:4.5.0.0"
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results', mode: 'copy'

input:
path input_bam
@@ -65,8 +64,14 @@ process GATK_HAPLOTYPECALLER {

workflow {

// Create input channel
reads_ch = Channel.of(params.reads_bam)
// Create input channel (single file via CLI parameter)
reads_ch = Channel.fromPath(params.reads_bam)

// Create channels for the accessory files (reference and intervals)
ref_file = file(params.reference)
ref_index_file = file(params.reference_index)
ref_dict_file = file(params.reference_dict)
intervals_file = file(params.intervals)

// Create index file for input BAM file
SAMTOOLS_INDEX(reads_ch)
@@ -75,9 +80,9 @@ workflow {
GATK_HAPLOTYPECALLER(
reads_ch,
SAMTOOLS_INDEX.out,
params.genome_reference,
params.genome_reference_index,
params.genome_reference_dict,
params.calling_intervals
ref_file,
ref_index_file,
ref_dict_file,
intervals_file
)
}
}
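
Note on the accessory files in this file: they are loaded with `file()`, which returns a single path object rather than a queue channel, so the same reference can be handed to every `GATK_HAPLOTYPECALLER` task. As a hedged variant that is not part of this PR, the `checkIfExists` option of `file()` could make a missing reference fail at startup instead of inside a task. `params.example_reference` is a hypothetical name used only for this sketch.

```groovy
// Hypothetical parameter, for illustration only (not part of this PR)
params.example_reference = "${projectDir}/data/ref/ref.fasta"

workflow {

    // file() returns one path object; with checkIfExists: true it raises an
    // error right away if the file is missing, before any process is launched
    ref_file = file(params.example_reference, checkIfExists: true)

    // A plain value like this can be passed to a process call as-is and is
    // reused for every item arriving on the other input channels
    println "Reference: ${ref_file.name}"
}
```

Whether the early check is worth adding depends on how the training material wants missing-file errors to surface; the PR itself uses plain `file()` calls.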
53 changes: 30 additions & 23 deletions hello-nextflow/scripts/hello-gatk-3.nf
@@ -2,27 +2,27 @@
* Pipeline parameters
*/

// Execution environment setup
params.projectDir = "/workspace/gitpod/hello-nextflow"
$projectDir = params.projectDir

// Primary input
params.reads_bam = ["${projectDir}/data/bam/reads_mother.bam",
"${projectDir}/data/bam/reads_father.bam",
"${projectDir}/data/bam/reads_son.bam"]
// Primary input (list of three samples)
params.reads_bam = [
"${projectDir}/data/bam/reads_mother.bam",
"${projectDir}/data/bam/reads_father.bam",
"${projectDir}/data/bam/reads_son.bam"
]

// Accessory files
params.genome_reference = "${projectDir}/data/ref/ref.fasta"
params.genome_reference_index = "${projectDir}/data/ref/ref.fasta.fai"
params.genome_reference_dict = "${projectDir}/data/ref/ref.dict"
params.calling_intervals = "${projectDir}/data/intervals.list"
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
params.reference_dict = "${projectDir}/data/ref/ref.dict"
params.intervals = "${projectDir}/data/ref/intervals.bed"

/*
* Generate BAM index file
*/
process SAMTOOLS_INDEX {

container 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1'
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results', mode: 'copy'

input:
path input_bam
@@ -32,16 +32,17 @@ process SAMTOOLS_INDEX {

"""
samtools index '$input_bam'

"""
}

/*
* Call variants with GATK HapolotypeCaller in GVCF mode
* Call variants with GATK HaplotypeCaller in GVCF mode
*/
process GATK_HAPLOTYPECALLER {

container "docker.io/broadinstitute/gatk:4.5.0.0"
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results', mode: 'copy'

input:
tuple path(input_bam), path(input_bam_index)
@@ -66,18 +67,24 @@ process GATK_HAPLOTYPECALLER {

workflow {

// Create input channel
reads_ch = Channel.of(params.reads_bam)
// Create input channel (single file via CLI parameter)
reads_ch = Channel.fromPath(params.reads_bam)

// Create channels for the accessory files (reference and intervals)
ref_file = file(params.reference)
ref_index_file = file(params.reference_index)
ref_dict_file = file(params.reference_dict)
intervals_file = file(params.intervals)

// Create index file for input BAM file
SAMTOOLS_INDEX(reads_ch)

// Call variants from the indexed BAM file
GATK_HAPLOTYPECALLER(
SAMTOOLS_INDEX.out,
params.genome_reference,
params.genome_reference_index,
params.genome_reference_dict,
params.calling_intervals
ref_file,
ref_index_file,
ref_dict_file,
intervals_file
)
}
}
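
Note on the input channel in this file: `params.reads_bam` is now a list of three paths, and `Channel.fromPath` accepts a list as well as a single path, emitting one item per file (the "single file" wording in the workflow comment appears to be carried over from the earlier scripts). A minimal sketch of that behaviour, using a hypothetical parameter name `params.example_bams` that is not part of this PR:

```groovy
// Hypothetical parameter holding a list of paths, for illustration only
params.example_bams = [
    "${projectDir}/data/bam/reads_mother.bam",
    "${projectDir}/data/bam/reads_father.bam",
    "${projectDir}/data/bam/reads_son.bam"
]

workflow {

    // Each path in the list becomes its own channel item, so a downstream
    // process such as SAMTOOLS_INDEX would run once per BAM file
    Channel.fromPath(params.example_bams)
        .view { bam -> "Queued: ${bam.name}" }
}
```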
41 changes: 23 additions & 18 deletions hello-nextflow/scripts/hello-gatk-4.nf
@@ -2,25 +2,23 @@
* Pipeline parameters
*/

// Execution environment setup
params.projectDir = "/workspace/gitpod/hello-nextflow"
$projectDir = params.projectDir

// Primary input (list of input files, one per line)
params.reads_bam = "${projectDir}/data/sample_bams.txt"

// Accessory files
params.genome_reference = "${projectDir}/data/ref/ref.fasta"
params.genome_reference_index = "${projectDir}/data/ref/ref.fasta.fai"
params.genome_reference_dict = "${projectDir}/data/ref/ref.dict"
params.calling_intervals = "${projectDir}/data/intervals.list"
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
params.reference_dict = "${projectDir}/data/ref/ref.dict"
params.intervals = "${projectDir}/data/ref/intervals.bed"

/*
* Generate BAM index file
*/
process SAMTOOLS_INDEX {

container 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1'
container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'

publishDir 'results', mode: 'copy'

input:
path input_bam
@@ -30,16 +28,17 @@ process SAMTOOLS_INDEX {

"""
samtools index '$input_bam'

"""
}

/*
* Call variants with GATK HapolotypeCaller in GVCF mode
* Call variants with GATK HaplotypeCaller in GVCF mode
*/
process GATK_HAPLOTYPECALLER {

container "docker.io/broadinstitute/gatk:4.5.0.0"
container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"

publishDir 'results', mode: 'copy'

input:
tuple path(input_bam), path(input_bam_index)
@@ -64,18 +63,24 @@ process GATK_HAPLOTYPECALLER {

workflow {

// Create input channel from list of input files in plain text
// Create input channel from list of input files in plain text
reads_ch = Channel.fromPath(params.reads_bam).splitText()

// Create channels for the accessory files (reference and intervals)
ref_file = file(params.reference)
ref_index_file = file(params.reference_index)
ref_dict_file = file(params.reference_dict)
intervals_file = file(params.intervals)

// Create index file for input BAM file
SAMTOOLS_INDEX(reads_ch)

// Call variants from the indexed BAM file
GATK_HAPLOTYPECALLER(
SAMTOOLS_INDEX.out,
params.genome_reference,
params.genome_reference_index,
params.genome_reference_dict,
params.calling_intervals
ref_file,
ref_index_file,
ref_dict_file,
intervals_file
)
}
}
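
Note on the text-file input in this file: `splitText()` emits one item per line of `sample_bams.txt`, and to my understanding each item keeps its line ending. The PR passes those items straight to `SAMTOOLS_INDEX`. A more defensive variant, sketched here under the assumption that the list file exists and holds one BAM path per line, might trim and convert each line first. `params.example_bam_list` is a hypothetical name, not part of this PR.

```groovy
// Hypothetical parameter: a text file listing one BAM path per line
params.example_bam_list = "${projectDir}/data/sample_bams.txt"

workflow {

    Channel.fromPath(params.example_bam_list)
        .splitText()                     // one item per line, line ending included
        .map { line -> line.trim() }     // drop the trailing newline and stray spaces
        .filter { line -> line }         // skip blank lines (empty strings are falsy)
        .map { line -> file(line) }      // turn each remaining line into a path object
        .view { bam -> "Queued: ${bam.name}" }
}
```

The extra steps are optional; they mainly guard against a trailing blank line in the list file.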