2). Overview
Duncan Berger edited this page Nov 11, 2024 · 5 revisions

The pipeline will perform the following steps:
- Assess pre-QC read quality using NanoPlot and Seqtk.
- Remove adaptors from reads using Porechop.
- Filter long reads by quality using Filtlong.

- Read-based taxonomic classification using Kraken2.
- Read-based taxonomic classification using Centrifuger.
- Bayesian re-estimation of abundance from Kraken2 and Centrifuger outputs using Bracken.
- Standardise taxonomic profiles output by Kraken2, Centrifuger and Bracken using taxpasta.
- Read-based taxonomic classification using Sylph.
- Merge taxonomic classification results into summary reports.

- Identify host-contaminant reads using Kraken2 and/or Minimap2 against a host reference or genome, respectively.
- Remove host-contaminant reads using SeqKit.
- Assess post-QC read quality using NanoPlot and Seqtk.
- Merge pre- and post-QC read metrics into a summary report.

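
The host-removal step above amounts to dropping every read whose ID was flagged as host. The pipeline does this with SeqKit; the pure-Python sketch below only illustrates the idea. It assumes unwrapped 4-line FASTQ records and a ready-made set of host read IDs (how those IDs are pulled out of the Kraken2 or Minimap2 output is not shown), and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    block = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield block[0], block[1], block[2], block[3]
            block = []


def read_id(header: str) -> str:
    """Return the read ID: the header without '@' and without any description."""
    return header[1:].split()[0]


def drop_host_reads(
    records: Iterable[FastqRecord], host_ids: Set[str]
) -> Iterator[FastqRecord]:
    """Keep only the records whose ID is not in the host ID set."""
    for record in records:
        if read_id(record[0]) not in host_ids:
            yield record
```

Because both functions are generators, reads are streamed one record at a time, so memory use is dominated by the set of host IDs rather than by the FASTQ file.
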
- Assemble post-QC, host-decontaminated reads using Flye.
- Map reads to the full metagenome assembly using Minimap2 and polish (contig error correction) with Racon (0-4 rounds of correction).
- Polish the full metagenome assembly using Medaka.

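
The Racon polishing loop described above is iterative: each round maps the reads to the current assembly and then corrects it, and the corrected assembly becomes the input for the next round. The sketch below only builds the command lines for a given number of rounds; it does not run them. The file names and the function name are hypothetical, and the `map-ont` preset is an assumption about the read type, not a setting taken from the pipeline.

```python
from typing import List, Tuple

# (argv, path of the file that should capture the command's stdout)
Command = Tuple[List[str], str]


def racon_round_commands(assembly: str, reads: str, rounds: int) -> List[Command]:
    """Build minimap2 + racon commands for 0-4 rounds of polishing."""
    if not 0 <= rounds <= 4:
        raise ValueError("rounds must be between 0 and 4")
    commands: List[Command] = []
    current = assembly  # assembly to be polished in this round
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"         # hypothetical overlap file name
        polished = f"round{i}.fasta"  # hypothetical output file name
        # Map reads to the current assembly (minimap2 writes PAF to stdout).
        commands.append((["minimap2", "-x", "map-ont", current, reads], paf))
        # racon takes <sequences> <overlaps> <target sequences>.
        commands.append((["racon", reads, paf, current], polished))
        current = polished  # the next round polishes this round's output
    return commands
```

Each returned pair could then be executed with `subprocess.run(argv, stdout=handle, check=True)`, where `handle` is the output file opened for writing. With `rounds=0` the list is empty and the Flye assembly would go straight to Medaka.
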
- Identify viral, proviral and plasmid sequences across all contigs using geNomad.
- Identify the closest taxonomic hits to each contig using skani.
- Calculate contig length and GC content using SeqKit.
- Calculate read depth (coverage) using Samtools.

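
For readers unfamiliar with the per-contig metrics above, contig length and GC content can be computed directly from a FASTA file. The pipeline uses SeqKit for this; the sketch below is a plain-Python illustration with a hypothetical function name. It counts only `G` and `C` over the full sequence length, which may differ slightly from SeqKit's handling of ambiguous bases.

```python
from typing import Dict, Iterable, List, Tuple


def contig_stats(fasta_lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {contig_id: (length, GC percent)} for FASTA-formatted lines."""
    sequences: Dict[str, List[str]] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig ID is the header up to the first whitespace.
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line.upper())

    stats: Dict[str, Tuple[int, float]] = {}
    for name, chunks in sequences.items():
        seq = "".join(chunks)
        gc_count = seq.count("G") + seq.count("C")
        gc_percent = 100.0 * gc_count / len(seq) if seq else 0.0
        stats[name] = (len(seq), gc_percent)
    return stats
```
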
- Classify contigs as archaea, bacteria, prokarya, eukarya, organelle (mitochondria, plastid) or unknown using Tiara.
- Perform metagenomic binning (separating contigs into individual metagenome-assembled genomes, MAGs) using both SemiBin2 and MetaDecoder.
- Generate consensus bins from the SemiBin2 and MetaDecoder outputs using DAS Tool.
- Assess the quality of genome bins using CheckM.
- Assign taxonomic classifications to each MAG using GTDB-Tk.
- Calculate assembly statistics using assembly_stats.py.
- Merge bin QC and contig QC results into summary reports.

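
This page does not list which statistics assembly_stats.py reports. As an illustration of a typical assembly statistic, and purely as an assumption about what such a script might compute, here is a minimal N50 calculation over a list of contig lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs supplied
```

For example, contigs of 2, 3, 4, 5 and 6 kb total 20 kb; the two longest (6 + 5 = 11 kb) already cover half of that, so the N50 is 5 kb.
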
Subset MAGs of interest (target species) using filter_gtdbtk.py and, depending on their taxonomic classification, pass them to individual subworkflows (run per MAG).

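
The routing described above can be pictured as a lookup from each MAG's classification to a subworkflow. The sketch below is not the logic of filter_gtdbtk.py, which is not shown on this page. It assumes GTDB-style lineage strings (`d__...;p__...;g__Genus;s__Species`), and the genus-to-subworkflow table and subworkflow names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical mapping of target genera to subworkflow names.
SUBWORKFLOWS: Dict[str, str] = {
    "Listeria": "listeria_typing",
    "Salmonella": "salmonella_typing",
    "Escherichia": "ecoli_shigella_typing",
}


def genus_of(lineage: str) -> Optional[str]:
    """Extract the genus from a GTDB-style lineage string, if present."""
    for rank in lineage.split(";"):
        if rank.startswith("g__"):
            return rank[3:] or None
    return None


def route_mags(classifications: Dict[str, str]) -> Dict[str, List[str]]:
    """Group MAG IDs by the subworkflow their genus maps to.

    MAGs with no genus, or a genus outside the table, are left out.
    """
    routed: Dict[str, List[str]] = {}
    for mag_id, lineage in classifications.items():
        workflow = SUBWORKFLOWS.get(genus_of(lineage))
        if workflow is not None:
            routed.setdefault(workflow, []).append(mag_id)
    return routed
```

Keeping the mapping in a single table means a new target organism could be added without touching the routing code.
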
- Identify virulence genes using VirulenceFinder.
- Assign sequence type to MAGs using MLST.
- Assign sequence type to MAG-specific read sets using Krocus.
- Screen for genes of interest using BLASTN.
- Screen for genes of interest using Genefinder.
- Identify plasmids using PlasmidFinder.

- Perform in silico serogroup prediction using LisSero.

- Perform in silico Salmonella serotyping using SeqSero2.
- Predict serovar, antigen gene and cgMLST alleles using SISTR.
- Identify antimicrobial resistance genes and lineages of S. Typhi and S. Paratyphi B using Mykrobe.

- Differentiate Shigella from enteroinvasive Escherichia coli and identify serotype using ShigEIFinder.
- Determine Shigella serotype using ShigaTyper.
- Determine E. coli serotype using ECTyper.
- Identify the serotype of Shiga toxin-producing E. coli using STECFinder.
- Identify antimicrobial resistance genes and lineages of S. sonnei using Mykrobe.

- Identify AMR genes (incl. point mutations) and virulence/stress resistance genes using AMRFinderPlus.
- Identify AMR and virulence genes across multiple databases using ABRicate.
- Identify AMR genes (incl. point mutations) using RGI.
- Identify AMR genes using ResFinder.
- Identify AMR-conferring point mutations using PointFinder.
- Merge the AMR results into a summary report.

Finally, all major results are merged into a single summary report, which includes the per-bin typing information produced in step 8).
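
Conceptually, the final merge is a join of several per-bin tables on the bin ID. The sketch below shows that idea with plain dictionaries. The report layout, field names and function name are hypothetical and do not reflect the pipeline's actual report format.

```python
from typing import Dict

# {bin_id: {field_name: value}}
Report = Dict[str, Dict[str, str]]


def merge_reports(*reports: Report) -> Report:
    """Combine per-bin reports into one table keyed by bin ID.

    If two reports share a field name, the later report wins.
    """
    merged: Report = {}
    for report in reports:
        for bin_id, fields in report.items():
            merged.setdefault(bin_id, {}).update(fields)
    return merged
```

A bin that is missing from one report, for example a MAG that was not passed to any typing subworkflow, simply lacks those fields in the merged table.
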