Protocol for detecting Gubaphage genomes among a set of predicted viral sequences

alexmsalmeida/gubascreen

GubaScreen - Detect Gubaphage in viral genomes

This repo contains instructions on how to screen a set of predicted viral sequences (nucleotide FASTA file) for the presence of Gubaphage lineages. For a formal description of this clade, see Camarillo-Guerrero et al. Cell 2021.

Background

The Gubaphage is a recently discovered clade of bacteriophages highly prevalent in the gut microbiome of diverse human populations. Understanding its global distribution is important to uncover its potential role(s) in the gut ecosystem.

To enable targeted detection of the Gubaphage, I performed a pan-genome analysis of all known Gubaphage genomes retrieved from Camarillo-Guerrero et al. Cell 2021, which identified a set of 6 core genes present in >90% of the genomes. For each core gene, HMMER was used to determine the optimal alignment bit score threshold (the one giving the maximum F1 score, calculated with scripts/hmm-thresholds.R) for discriminating between Gubaphage and non-Gubaphage sequences. The resulting HMM models, alongside their score thresholds, can be found in hmm_models/.
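To illustrate the idea of choosing a bit score cutoff by maximum F1, here is a minimal sketch. It is not the repo's scripts/hmm-thresholds.R, whose exact procedure may differ. The function name `best_f1_threshold` and the two-column input layout (label, bit score) are assumptions made for this example only.

```shell
# Hypothetical helper, not part of gubascreen.
# Input: tab-separated file with two columns per hit:
#   column 1 = label (1 = Gubaphage, 0 = non-Gubaphage), column 2 = bit score
# Output: the cutoff with the highest F1 and that F1 value, tab-separated.
best_f1_threshold() {
    awk -F '\t' '
        { label[NR] = $1; score[NR] = $2 + 0 }
        END {
            if (NR == 0) exit 1                  # nothing to evaluate
            best_f1 = -1
            for (i = 1; i <= NR; i++) {          # each observed score is a candidate cutoff
                t = score[i]; tp = 0; fp = 0; fn = 0
                for (j = 1; j <= NR; j++) {
                    if (score[j] >= t) { if (label[j] == 1) tp++; else fp++ }
                    else if (label[j] == 1) fn++
                }
                denom = 2 * tp + fp + fn
                f1 = (denom > 0) ? 2 * tp / denom : 0
                # on ties, prefer the higher (stricter) cutoff
                if (f1 > best_f1 || (f1 == best_f1 && t > best_t)) { best_f1 = f1; best_t = t }
            }
            printf "%s\t%.3f\n", best_t, best_f1
        }
    ' "$1"
}
```

Called as `best_f1_threshold labelled_scores.tsv` (a hypothetical file name), it prints one line with the chosen cutoff and its F1. The quadratic scan is kept for readability and is adequate for a few thousand hits.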

Installation

  1. Install the following dependencies:
  2. Clone the repo.
git clone https://github.com/alexmsalmeida/gubascreen.git
  3. Add the scripts/ directory to your $PATH environment variable.
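One way to add scripts/ to $PATH is sketched below. It assumes the repo was cloned into the current working directory as ./gubascreen; the variable name `GUBASCREEN_DIR` is chosen here for illustration and is not defined by the repo.

```shell
# Assumption: the clone lives at ./gubascreen relative to where this is run.
GUBASCREEN_DIR="$(pwd)/gubascreen"
export PATH="${GUBASCREEN_DIR}/scripts:${PATH}"

# Optional sanity check that the tree-building script is now resolvable.
if command -v hmmer2tree.sh >/dev/null 2>&1; then
    echo "hmmer2tree.sh found on PATH"
else
    echo "hmmer2tree.sh not found; check the clone location" >&2
fi
```

The change only lasts for the current shell session. To make it persistent, the `export` line can be added to a shell startup file such as ~/.bashrc, with the path written out in full.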

How to run

  1. Predict protein sequences from your nucleotide FASTA file (input.fa).
prodigal -i input.fa -a proteins.faa -p meta
  2. Run HMMER to detect Gubaphage marker genes using the pre-defined thresholds.
hmmsearch --cpu {threads} --cut_ga --tblout guba_hmmer.tsv --noali hmm_models/guba_core.hmm proteins.faa
  3. Build a phylogenetic tree from the HMMER output (iqtree can be replaced with fasttree for a faster, albeit less accurate, analysis).
hmmer2tree.sh -t {threads} -i guba_hmmer.tsv -p proteins.faa -m iqtree -o phylo_tree

The main output file is a phylogenetic tree (phylo_tree/concat_alignment.aln.treefile) containing your input sequences and reference Gubaphage genomes.
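Before inspecting the tree, it can help to see how many distinct marker genes were hit in each input sequence. The sketch below tallies this from guba_hmmer.tsv. It is a hypothetical helper, not part of the repo and not necessarily how hmmer2tree.sh processes the file. It relies on two assumptions: the standard hmmsearch --tblout layout (target name in column 1, query HMM name in column 3, comment lines starting with #) and Prodigal's protein naming convention of `<contig>_<index>`.

```shell
# Hypothetical helper, not part of gubascreen.
# Prints "<contig><TAB><number of distinct marker HMMs hit>", most hits first.
count_marker_hits() {
    awk '
        /^#/ { next }                        # skip tblout comment/header lines
        {
            contig = $1
            sub(/_[0-9]+$/, "", contig)      # strip the Prodigal protein index
            key = contig SUBSEP $3           # $3 = query (HMM) name
            if (!(key in seen)) { seen[key] = 1; n[contig]++ }
        }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | LC_ALL=C sort -k2,2nr -k1,1
}
```

Running `count_marker_hits guba_hmmer.tsv` lists each contig with its marker count. Sequences with hits to most of the 6 core genes are the strongest Gubaphage candidates, although placement in the tree remains the primary output of the protocol.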
