
2). Overview

Duncan Berger edited this page Nov 11, 2024 · 5 revisions

Simplified schematic overview

[Image: simplified schematic of the LOMA pipeline]

Description

The pipeline will perform the following steps:

1). Read quality control

2). Read-based taxonomic annotation

3). Host read removal

  • Identify host-contaminant reads using Kraken2 and/or Minimap2 against a host reference database or host genome, respectively.
  • Remove host-contaminant reads using SeqKit.
  • Assess post-QC read quality using NanoPlot and Seqtk.
  • Merge pre- and post-QC read metrics into a summary report.
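
The identify-then-remove logic of this step can be illustrated in plain Python. This is only a sketch: the pipeline itself uses SeqKit for the removal, and the function names `parse_fastq` and `remove_host_reads`, as well as the idea of passing host read IDs as a set, are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    # Drop blank lines, then consume the stream four lines at a time.
    cleaned = (line.rstrip("\n") for line in lines if line.strip())
    for header, seq, _plus, qual in zip(cleaned, cleaned, cleaned, cleaned):
        yield header, seq, qual


def remove_host_reads(
    records: Iterable[FastqRecord], host_ids: Set[str]
) -> Iterator[FastqRecord]:
    """Keep only records whose read ID was not flagged as host (hypothetical helper)."""
    for header, seq, qual in records:
        read_id = header[1:].split()[0]  # strip '@' and any description
        if read_id not in host_ids:
            yield header, seq, qual


if __name__ == "__main__":
    fastq = ["@r1 desc\n", "ACGT\n", "+\n", "IIII\n",
             "@r2\n", "GGCC\n", "+\n", "IIII\n"]
    for record in remove_host_reads(parse_fastq(fastq), {"r2"}):
        print(record)
```

In practice the set of host IDs would come from the Kraken2 and/or Minimap2 results; how the pipeline combines the two is not described here.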

4). Assembly

  • Assemble post-QC host-decontaminated reads using Flye.
  • Map reads to the full metagenome assembly using Minimap2 and polish (contig error correction) with Racon (0-4 rounds of correction).
  • Polish full metagenome assembly using Medaka.
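
The iterative map-then-polish loop with Racon can be sketched as a small planner that only builds the command strings for each round. The file names, the `plan_racon_rounds` helper, the thread count and the `map-ont` Minimap2 preset are assumptions for illustration; the pipeline's actual parameters may differ.

```python
from typing import List, Tuple


def plan_racon_rounds(
    reads: str, assembly: str, rounds: int, threads: int = 4
) -> Tuple[List[str], str]:
    """Return the shell commands for N polishing rounds and the final FASTA name.

    Each round maps the reads to the current assembly, then corrects it with
    Racon; the output of one round is the input of the next.
    """
    if not 0 <= rounds <= 4:
        raise ValueError("rounds must be between 0 and 4")
    commands: List[str] = []
    current = assembly
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"          # hypothetical intermediate file names
        polished = f"round{i}.fasta"
        # 'map-ont' assumes Nanopore reads.
        commands.append(
            f"minimap2 -x map-ont -t {threads} {current} {reads} > {paf}"
        )
        # Racon takes reads, overlaps and target sequences, in that order.
        commands.append(
            f"racon -t {threads} {reads} {paf} {current} > {polished}"
        )
        current = polished
    return commands, current


if __name__ == "__main__":
    cmds, final = plan_racon_rounds("reads.fastq", "flye.fasta", rounds=2)
    print("\n".join(cmds))
    print("input for Medaka:", final)
```

With zero rounds the planner returns no commands and the Flye assembly is passed on unchanged.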

5). Contig analysis

  • Identify viral, proviral and plasmid sequences across all contigs using geNomad.
  • Identify closest taxonomic hits to each contig using skani.
  • Calculate contig length and GC content using SeqKit.
  • Calculate read depth (coverage) using Samtools.
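
As an illustration of the per-contig metrics, contig length and GC content can be computed from a FASTA file in a few lines of Python. The pipeline uses SeqKit for this; the `contig_stats` function below is a hypothetical stand-in that shows what the two numbers mean.

```python
from typing import Dict, Iterable, List, Tuple


def contig_stats(fasta_lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {contig_id: (length, GC percent)} for a FASTA stream."""
    sequences: Dict[str, List[str]] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line.upper())

    stats: Dict[str, Tuple[int, float]] = {}
    for contig, chunks in sequences.items():
        seq = "".join(chunks)
        gc = seq.count("G") + seq.count("C")
        gc_percent = round(100.0 * gc / len(seq), 2) if seq else 0.0
        stats[contig] = (len(seq), gc_percent)
    return stats


if __name__ == "__main__":
    fasta = [">contig_1 example\n", "ACGT\n", "GG\n", ">contig_2\n", "AAAA\n"]
    for contig, (length, gc) in contig_stats(fasta).items():
        print(contig, length, gc)
```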

6). Assembly binning

  • Classify contigs into classes: archaea, bacteria, prokarya, eukarya, organelle (mitochondria, plastid) or unknown using Tiara.
  • Perform metagenomic binning, separating contigs into individual metagenome-assembled genomes (MAGs), using both SemiBin2 and MetaDecoder.
  • Generate consensus bins from the SemiBin2 and MetaDecoder outputs using DAS Tool.

7). Bin quality control

  • Assess the quality of genome bins using CheckM.
  • Assign taxonomic classifications to each MAG using GTDB-Tk.
  • Calculate assembly statistics using assembly_stats.py.
  • Merge bin QC and contig QC results into summary reports.
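
The exact metrics reported by assembly_stats.py are not listed on this page. As a hedged example of the kind of per-bin statistic such a script typically computes, the sketch below derives contig count, total length, longest contig and N50 from a list of contig lengths; the function names are hypothetical.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarise_bin(lengths: List[int]) -> Dict[str, int]:
    """Collect basic assembly statistics for one bin (illustrative only)."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest_contig": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    print(summarise_bin([10, 8, 5, 3, 2]))
```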

8). Typing

Subset MAGs of interest (target species) using filter_gtdbtk.py and, depending on their taxonomic classification, pass them to the individual subworkflows below (run per MAG).

8a). Bacteria
  • Identify virulence genes using VirulenceFinder.
  • Assign sequence type to MAGs using MLST.
  • Assign sequence type to MAG-specific read sets using Krocus.
  • Screen for genes of interest using BLASTN.
  • Screen for genes of interest using Genefinder.
  • Identify plasmids using PlasmidFinder.
8b). Listeria monocytogenes
  • Perform in silico serogroup prediction using LisSero.
8c). Salmonella
  • Perform in silico Salmonella serotyping using SeqSero2.
  • Predict serovar, antigen gene and cgMLST alleles using SISTR.
  • Identify antimicrobial resistance genes and lineages of S. Typhi and S. Paratyphi B using Mykrobe.
8d). Escherichia coli / Shigella spp.
  • Differentiate Shigella/Enteroinvasive Escherichia coli and identify serotype using ShigEIFinder.
  • Determine Shigella serotype using ShigaTyper.
  • Determine E. coli serotype using ECTyper.
  • Identify serotype of Shiga toxin-producing E. coli using STECFinder.
  • Identify antimicrobial resistance genes and lineages of S. sonnei using Mykrobe.
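
The routing of MAGs to subworkflows by taxonomic classification can be sketched as follows. This is not the logic of filter_gtdbtk.py, which is not shown on this page; it assumes a GTDB-style lineage string (rank prefixes such as `d__`, `g__`, `s__` separated by semicolons), hypothetical subworkflow labels, and that every bacterial MAG also passes through the generic bacterial subworkflow.

```python
from typing import Dict, List


def parse_lineage(classification: str) -> Dict[str, str]:
    """Split 'd__Bacteria;...;s__Salmonella enterica' into {rank prefix: name}."""
    ranks: Dict[str, str] = {}
    for field in classification.split(";"):
        prefix, _, name = field.strip().partition("__")
        ranks[prefix] = name
    return ranks


def select_subworkflows(classification: str) -> List[str]:
    """Return the (hypothetical) subworkflow labels that apply to one MAG."""
    ranks = parse_lineage(classification)
    selected: List[str] = []
    if ranks.get("d") == "Bacteria":
        selected.append("bacteria")  # assumption: generic step for all bacteria
    if ranks.get("s") == "Listeria monocytogenes":
        selected.append("listeria_monocytogenes")
    if ranks.get("g") == "Salmonella":
        selected.append("salmonella")
    if ranks.get("g") in {"Escherichia", "Shigella"}:
        selected.append("escherichia_shigella")
    return selected


if __name__ == "__main__":
    lineage = ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
               "o__Enterobacterales;f__Enterobacteriaceae;"
               "g__Salmonella;s__Salmonella enterica")
    print(select_subworkflows(lineage))
```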

9). Antimicrobial resistance

  • Identify AMR genes (incl. point mutations) and virulence/stress resistance genes using AMRFinderPlus.
  • Identify AMR and virulence genes across multiple databases using ABRicate.
  • Identify AMR genes (incl. point mutations) using RGI.
  • Identify AMR genes using ResFinder.
  • Identify AMR-conferring point mutations using PointFinder.
  • Merge the AMR results into a summary report.

Finally, all major results are merged into a single summary report, which includes the per-bin typing information produced in step 8).
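
Merging AMR results from several tools can be pictured as grouping hits by bin and gene and recording which tools reported each one. The sketch below assumes each tool's output has already been reduced to `(tool, bin_id, gene)` tuples and that gene names are comparable across tools, which in practice requires harmonisation; the `merge_amr_hits` function and the example bin IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, str]  # (tool, bin_id, gene)


def merge_amr_hits(hits: Iterable[Hit]) -> List[Tuple[str, str, str]]:
    """Return sorted (bin_id, gene, comma-separated tools) summary rows."""
    merged: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for tool, bin_id, gene in hits:
        merged[bin_id][gene].add(tool)

    rows: List[Tuple[str, str, str]] = []
    for bin_id in sorted(merged):
        for gene in sorted(merged[bin_id]):
            tools = ",".join(sorted(merged[bin_id][gene]))
            rows.append((bin_id, gene, tools))
    return rows


if __name__ == "__main__":
    example_hits = [
        ("AMRFinderPlus", "bin.1", "blaTEM-1"),
        ("ABRicate", "bin.1", "blaTEM-1"),
        ("ResFinder", "bin.2", "tet(A)"),
    ]
    for row in merge_amr_hits(example_hits):
        print("\t".join(row))
```

A gene reported by several tools ends up on a single row, which makes agreement between tools easy to see in the summary.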