Tutorial
MrTomRod edited this page Dec 10, 2023 · 4 revisions
This tutorial shows how to generate Scoary2-compatible orthogene data with OrthoFinder and how to run Scoary2 itself.
Prepare data and run OrthoFinder
# download protein FASTAs
wget https://cloud.bioinformatics.unibe.ch/index.php/s/ozbmq4AoDbWHWHA/download/scoary-2-tutorial.zip
unzip scoary-2-tutorial.zip && rm scoary-2-tutorial.zip
# run OrthoFinder (using docker? simply replace 'podman' with 'docker')
podman run -it -v .:/input:Z davidemms/orthofinder:2.5.4 \
orthofinder -f /input/fastas
# Get the OrthoFinder subdir name as a variable
ORTHOFINDER_RUN=$(basename fastas/OrthoFinder/Results_*) # e.g. Results_Jan1
# create virtual environment
python3 -m venv ./venv
source venv/bin/activate # command to leave the virtual env: "deactivate"
pip install -U pip setuptools wheel
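The `ORTHOFINDER_RUN` line above works when `fastas/OrthoFinder` contains exactly one `Results_*` directory. If OrthoFinder has been run more than once, the glob matches several directories and `basename` no longer returns a single name. The following is a minimal sketch of one way to handle that case by picking the most recently modified directory. The function name `latest_orthofinder_run` is hypothetical and not part of OrthoFinder or Scoary2.

```shell
# Hypothetical helper (not part of OrthoFinder or Scoary2): print the name of
# the most recently modified Results_* directory inside the given folder.
latest_orthofinder_run() {
    local base="$1" newest="" dir
    for dir in "$base"/Results_*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        dir="${dir%/}"                     # drop the trailing slash
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest="$dir"
        fi
    done
    [ -n "$newest" ] || return 1           # no results directory found
    basename "$newest"
}

# Only runs when the tutorial data is present in the current directory
if [ -d fastas/OrthoFinder ]; then
    ORTHOFINDER_RUN=$(latest_orthofinder_run fastas/OrthoFinder)
    echo "Using OrthoFinder run: $ORTHOFINDER_RUN"
fi
```

Selecting by modification time is only one possible policy. Deleting old `Results_*` directories before rerunning OrthoFinder avoids the ambiguity altogether.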
Optional: Create a gene-info file
# install orthofinder-tools
pip install orthofinder-tools
annotate_orthogroups \
--orthogroups_tsv fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
--fasta_dir fastas \
--file_endings faa \
--hog True \
--header True \
--out N0_best_names.tsv
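Both `annotate_orthogroups` and Scoary2 read the same `N0.tsv` file, so it can help to look at its structure before moving on. The sketch below counts, for each hierarchical orthogroup, how many isolates carry at least one gene. It assumes the usual OrthoFinder layout: three leading columns (`HOG`, `OG`, `Gene Tree Parent Clade`) followed by one column per isolate, each cell holding a comma-separated gene list. The function name and the toy IDs are hypothetical and are not taken from the tutorial data.

```shell
# Count, for every hierarchical orthogroup, how many isolates carry >= 1 gene.
# Assumed layout: columns 1-3 are HOG, OG and Gene Tree Parent Clade; every
# further column is one isolate (an empty cell means the gene is absent).
count_isolates_per_hog() {
    awk -F'\t' '
        NR == 1 { next }                   # skip the header row
        {
            present = 0
            for (i = 4; i <= NF; i++) if ($i != "") present++
            print $1 "\t" present
        }
    ' "$1"
}

# Toy input with two isolates (hypothetical IDs, not the tutorial data)
demo=$(mktemp)
printf 'HOG\tOG\tGene Tree Parent Clade\tisolateA\tisolateB\n' > "$demo"
printf 'N0.HOG0000000\tOG0000000\tn0\tgeneA1, geneA2\tgeneB1\n' >> "$demo"
printf 'N0.HOG0000001\tOG0000001\tn1\t\tgeneB2\n' >> "$demo"
count_isolates_per_hog "$demo"
```

On the toy input this prints a count of 2 for the first orthogroup and 1 for the second. To run it on the real data, pass the path to `N0.tsv` instead of `$demo`.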
Run Scoary2
Option 1: Via Python
# install Scoary2
pip install scoary-2
# run Scoary2
scoary2 \
--genes fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
--gene-data-type 'gene-list:\t' \
--gene-info N0_best_names.tsv \
--traits traits.tsv \
--trait-data-type 'binary:\t' \
--trait-info trait_info.tsv \
--isolate-info isolate_info.tsv \
--n-permut 200 \
--n-cpus 1 \
--outdir out \
--multiple_testing native:0.99 # force some output by setting a high p-value threshold
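Scoary2 has to match the isolates in the gene file to those in the traits file, so a quick consistency check before a long run can save time. The following is a sketch of such a check, not a Scoary2 feature. It assumes that isolates are columns 4 and onward of the `N0.tsv` header and that `traits.tsv` lists isolate names in its first column below a header row. The function name `check_isolates` is hypothetical.

```shell
# Hypothetical pre-flight check: list isolates that appear in only one file.
# Assumptions: isolates are columns 4+ of the N0.tsv header and the first
# column of traits.tsv (below its header row).
check_isolates() {
    local genes="$1" traits="$2" a b
    a=$(mktemp) b=$(mktemp)
    head -n 1 "$genes" | tr '\t' '\n' | tail -n +4 | sort > "$a"
    tail -n +2 "$traits" | cut -f 1 | sort > "$b"
    # Unindented lines: only in the gene file; tab-indented: only in the traits file
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Only runs when the tutorial files are present
if [ -f traits.tsv ] && [ -n "${ORTHOFINDER_RUN:-}" ]; then
    check_isolates \
        "fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv" \
        traits.tsv
fi
```

Empty output means both files name the same set of isolates. Any printed name is worth resolving before starting the analysis.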
Option 2: Via podman/docker
# using docker? simply replace 'podman' with 'docker'
podman run --rm -v ./:/data:Z troder/scoary-2 \
scoary2 \
--genes /data/fastas/OrthoFinder/$ORTHOFINDER_RUN/Phylogenetic_Hierarchical_Orthogroups/N0.tsv \
--gene-data-type 'gene-list:\t' \
--gene-info /data/N0_best_names.tsv \
--traits /data/traits.tsv \
--trait-data-type 'binary:\t' \
--trait-info /data/trait_info.tsv \
--isolate-info /data/isolate_info.tsv \
--n-permut 200 \
--n-cpus 1 \
--outdir /data/out \
--multiple_testing native:0.99 # force some output by setting a high p-value threshold
Look at the results
This dummy dataset contains only 4 isolates, so no trait reaches significance and there is little to look at in the results.
Potential next steps:
- Inspect the log file (out/logs/scoary-2.log)
- Inspect the trait overview file (out/summary.tsv)
- Inspect specific traits manually (out/traits/*)
- Inspect the output via the app:
# run a simple server
python -m http.server --cgi 8080
# open http://0.0.0.0:8080/out/overview.html
# open http://0.0.0.0:8080/out/trait.html?trait=REAL_TRAIT_NAME_HERE
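To avoid typing trait names into the URL by hand, the links can be generated from the output directory. The sketch below assumes that every subdirectory of `out/traits` is named after a trait and that the server was started as shown above. The function name `list_trait_urls` is hypothetical, and trait names are inserted as-is, so names containing spaces or special characters would still need URL encoding.

```shell
# Print one trait.html URL per subdirectory of OUTDIR/traits.
# Assumption: every subdirectory of OUTDIR/traits is named after a trait.
# Usage: list_trait_urls OUTDIR [BASE_URL]
list_trait_urls() {
    local outdir="$1" base_url="${2:-http://0.0.0.0:8080/out}" dir
    for dir in "$outdir"/traits/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        echo "${base_url}/trait.html?trait=$(basename "$dir")"
    done
}

# Only runs when Scoary2 output is present in the current directory
if [ -d out/traits ]; then
    list_trait_urls out
fi
```

Each printed line can be opened in a browser while the server is running.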