Tutorial

MrTomRod edited this page Dec 10, 2023 · 4 revisions

This tutorial shows how to generate Scoary2-compatible orthogene data with OrthoFinder and how to run Scoary2 itself.

Prepare data and run OrthoFinder

# download protein FASTAs
wget https://cloud.bioinformatics.unibe.ch/index.php/s/ozbmq4AoDbWHWHA/download/scoary-2-tutorial.zip
unzip scoary-2-tutorial.zip && rm scoary-2-tutorial.zip

# run OrthoFinder (if you use Docker, replace 'podman' with 'docker')
podman run -it -v .:/input:Z davidemms/orthofinder:2.5.4 \
  orthofinder -f /input/fastas

# Get the OrthoFinder subdir name as a variable
ORTHOFINDER_RUN=$(basename fastas/OrthoFinder/Results_*)  # e.g. Results_Jan1

# create virtual environment
python3 -m venv ./venv
source venv/bin/activate  # command to leave the virtual env: "deactivate"
pip install -U pip setuptools wheel
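
The ORTHOFINDER_RUN assignment above relies on the glob matching exactly one directory. If OrthoFinder has been run more than once, several Results_* directories may exist and basename would receive multiple arguments. A minimal sketch of a stricter lookup follows; resolve_orthofinder_run is a hypothetical helper written for this tutorial, not part of OrthoFinder.

```shell
# Sketch: resolve the OrthoFinder results directory and refuse ambiguous matches.
# resolve_orthofinder_run is a hypothetical helper, not an OrthoFinder command.
resolve_orthofinder_run() {
  local base="${1:-fastas/OrthoFinder}"
  local count=0
  local found=""
  local d
  for d in "$base"/Results_*/; do
    # an unmatched glob stays literal, so test that the path really is a directory
    if [ -d "$d" ]; then
      count=$((count + 1))
      found="$d"
    fi
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one Results_* directory in $base, found $count" >&2
    return 1
  fi
  # print only the directory name, e.g. Results_Jan1
  basename "$found"
}
```

It can replace the assignment above, for instance as ORTHOFINDER_RUN=$(resolve_orthofinder_run fastas/OrthoFinder).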

Optional: Create a gene-info file

# install orthofinder-tools
pip install orthofinder-tools

annotate_orthogroups \
  --orthogroups_tsv fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
  --fasta_dir fastas \
  --file_endings faa \
  --hog True \
  --header True \
  --out N0_best_names.tsv
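
Before moving on to Scoary2, it may help to confirm that every input file referenced in the next section exists and is non-empty. The sketch below is one way to do that; require_files is a hypothetical helper, and the example call simply lists the paths used in this tutorial.

```shell
# Sketch: report every listed file that is missing or empty.
# require_files is a hypothetical helper, not part of Scoary2 or OrthoFinder.
require_files() {
  local missing=0
  local f
  for f in "$@"; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # succeed only if no file was reported
  [ "$missing" -eq 0 ]
}

# Example call with the paths used in this tutorial:
# require_files \
#   "fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv" \
#   N0_best_names.tsv traits.tsv trait_info.tsv isolate_info.tsv
```

If the optional gene-info step was skipped, leave N0_best_names.tsv out of the list.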

Run Scoary2

Option 1: Via Python

# install Scoary2
pip install scoary-2

# run Scoary2
scoary2 \
  --genes fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
  --gene-data-type 'gene-list:\t' \
  --gene-info N0_best_names.tsv \
  --traits traits.tsv \
  --trait-data-type 'binary:\t' \
  --trait-info trait_info.tsv \
  --isolate-info isolate_info.tsv \
  --n-permut 200 \
  --n-cpus 1 \
  --outdir out \
  --multiple_testing native:0.99  # force some output by setting a high p-value threshold

Option 2: Via podman/docker

# if you use Docker, replace 'podman' with 'docker'
podman run --rm -v ./:/data:Z troder/scoary-2 \
  scoary2 \
    --genes /data/fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
    --gene-data-type 'gene-list:\t' \
    --gene-info /data/N0_best_names.tsv \
    --traits /data/traits.tsv \
    --trait-data-type 'binary:\t' \
    --trait-info /data/trait_info.tsv \
    --isolate-info /data/isolate_info.tsv \
    --n-permut 200 \
    --n-cpus 1 \
    --outdir /data/out \
    --multiple_testing native:0.99  # force some output by setting a high p-value threshold
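
Both invocations above combine a gene table and a trait table, so mismatched isolate names are an easy mistake to make. The following sketch lists isolates that occur in the traits file but not in the N0.tsv header. It rests on assumptions that this tutorial does not state: that the first three N0.tsv columns are HOG, OG and Gene Tree Parent Clade, and that traits.tsv has a header row with isolate names in its first column. check_isolates is a hypothetical helper, not a Scoary2 command.

```shell
# Sketch: find isolates named in the traits table but absent from the gene table.
# Assumptions: isolate columns of N0.tsv start at column 4; traits.tsv has a header
# row and isolate names in column 1. check_isolates is a hypothetical helper.
check_isolates() {
  local genes_tsv="$1"
  local traits_tsv="$2"
  local gene_isolates
  local missing
  # isolate names from the gene table header, one per line
  gene_isolates=$(head -n 1 "$genes_tsv" | cut -f 4- | tr -d '\r' | tr '\t' '\n')
  # keep trait isolates that match none of the gene isolates (fixed string, whole line)
  missing=$(tail -n +2 "$traits_tsv" | cut -f 1 | tr -d '\r' |
    grep -vxF -e "$gene_isolates" || true)
  if [ -n "$missing" ]; then
    echo "isolates in $traits_tsv without a column in $genes_tsv:" >&2
    echo "$missing" >&2
    return 1
  fi
}
```

A call such as check_isolates "fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv" traits.tsv prints nothing and returns success when every trait isolate has a matching gene column.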

Look at the results

This dummy dataset contains only 4 isolates, so no trait is significant and the results themselves are not informative. The high p-value threshold used above merely forces Scoary2 to write some output.

Potential next steps:

  • Inspect the log file (out/logs/scoary-2.log)
  • Inspect the trait overview file (out/summary.tsv)
  • Inspect specific traits manually (out/traits/*)
  • Inspect the output via the app:
# run a simple server from the directory that contains out/
python -m http.server --cgi 8080
# open http://localhost:8080/out/overview.html
# open http://localhost:8080/out/trait.html?trait=REAL_TRAIT_NAME_HERE
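
To avoid typing trait names by hand, the trait URLs can be derived from the output folder. This is a minimal sketch that assumes each trait has its own subdirectory under out/traits/ and that the server above is reachable at http://localhost:8080. list_trait_urls is a hypothetical helper, and trait names are not URL-encoded, so names with spaces or special characters would need extra handling.

```shell
# Sketch: print one viewer URL per trait directory found under <outdir>/traits/.
# list_trait_urls is a hypothetical helper; trait names are used as-is (no URL encoding).
list_trait_urls() {
  local outdir="${1:-out}"
  local base_url="${2:-http://localhost:8080}"
  local d
  for d in "$outdir"/traits/*/; do
    # skip the literal pattern if the glob matched nothing
    [ -d "$d" ] || continue
    echo "$base_url/$outdir/trait.html?trait=$(basename "$d")"
  done
}
```

Run it from the directory that contains out/, for example as list_trait_urls out, so that the relative path in each URL matches what the server exposes.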