Merge pull request #4 from calico/revision-upd-2
Revision update
johli authored Oct 8, 2024
2 parents 7f76005 + 692613b commit b107b4f
Showing 64 changed files with 4,228 additions and 97 deletions.
14 changes: 12 additions & 2 deletions README.md
@@ -1,4 +1,14 @@
# Borzoi Model Evaluation & Analyses
This repository contains shell scripts, notebooks, commands, etc. related to the analyses performed in the [Borzoi manuscript](https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1). These analyses invoke functionality from both the [borzoi repository](https://github.com/calico/borzoi.git) and the [baskerville repository](https://github.com/calico/baskerville.git). Visit those links for general install instructions.
# Borzoi Model Training & Evaluation

This repository contains shell scripts, notebooks, commands, etc. related to the analyses performed in the [Borzoi paper](https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1), including data processing, model training, and evaluation. These analyses invoke functionality from the [borzoi](https://github.com/calico/borzoi.git), [baskerville](https://github.com/calico/baskerville.git), and [westminster](https://github.com/calico/westminster.git) repositories. Visit those links for general install instructions.

*Tip*: When executing .sh scripts found in this directory structure, we recommend first navigating in the terminal to the 'borzoi/examples' directory from the [borzoi repository](https://github.com/calico/borzoi), since all file paths are relative to this root directory.

For example, assuming *borzoi-paper* and *borzoi* are cloned to your home folder, issue commands of the form:
```sh
conda activate <my_conda_env>
cd ~/borzoi/examples
. ~/borzoi-paper/analysis/<some_folder>/<some_script>.sh
```
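
The tip above can be wrapped in a small helper. This is a minimal sketch, not part of either repository: the function name `run_from_examples` and the `BORZOI_EXAMPLES` variable are hypothetical, and the default path assumes the clone locations described above.

```sh
# Hypothetical helper: source an analysis script from inside the borzoi
# examples directory, so that its relative file paths resolve.
# BORZOI_EXAMPLES is an assumed override; the default matches the layout above.
run_from_examples() {
    examples_dir="${BORZOI_EXAMPLES:-$HOME/borzoi/examples}"
    script="$1"

    # Resolve a relative script path before changing directory.
    case "$script" in
        /*) ;;
        *) script="$PWD/$script" ;;
    esac

    if [ ! -d "$examples_dir" ]; then
        echo "error: examples directory not found: $examples_dir" >&2
        return 1
    fi
    if [ ! -f "$script" ]; then
        echo "error: script not found: $script" >&2
        return 1
    fi

    # Use a subshell so the caller's working directory is left unchanged.
    ( cd "$examples_dir" && . "$script" )
}
```

After activating the conda environment, a call would look like `run_from_examples ~/borzoi-paper/analysis/setup_data.sh`. Unlike sourcing the script directly, the subshell discards any variables the script sets, which may or may not matter for a given analysis.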

Contact *drk (at) calicolabs.com* or *jlinder (at) calicolabs.com* for questions.
30 changes: 30 additions & 0 deletions analysis/README.md
@@ -0,0 +1,30 @@
## Analyses

This directory contains model evaluation scripts and other downstream analyses.

*Notes*:
- Run the script 'setup_data.sh' to organize the multi-fold hg38 and mm10 data folders, which are required in order to run some evaluations. The hg38 and mm10 data must first be downloaded from the Borzoi training data bucket [here](https://storage.googleapis.com/borzoi-paper/data/) (GCP).
- Some scripts require the QTL data, which can be downloaded [here](https://storage.googleapis.com/borzoi-paper/qtl/) (GCP).
<br/>

As an example, to evaluate the model on gene-level test set predictions, issue the following commands:
```sh
conda activate borzoi_py310
cd ~/borzoi/examples
. ~/borzoi-paper/analysis/setup_data.sh
. ~/borzoi-paper/analysis/test_expression/testg.sh
```

As another example, to evaluate the model on sQTL variant effect predictions, issue these commands:
```sh
conda activate borzoi_py310
cd ~/borzoi/examples
. ~/borzoi-paper/analysis/sqtl/bench_sqtl.sh
```

The examples assume that you have
- installed a conda environment named 'borzoi_py310',
- cloned the 'borzoi' and 'borzoi-paper' repositories to your home folder,
- downloaded the borzoi training data to '~/borzoi/examples/data',
- downloaded the QTL data to '~/borzoi/examples/data/qtl_cat',
- and configured the borzoi repository ([instructions](https://github.com/calico/borzoi?tab=readme-ov-file#installation)).
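
The directory assumptions listed above can be checked before launching an evaluation. The following is a rough sketch only: `check_borzoi_setup` is a hypothetical function, not something the repositories provide, and it verifies only that the folders exist. It does not check the conda environment, the borzoi configuration, or the contents of the data folders.

```sh
# Hypothetical preflight check: report which of the assumed folders are
# missing. The optional first argument replaces $HOME as the base folder.
check_borzoi_setup() {
    home_dir="${1:-$HOME}"
    missing=0
    for dir in \
        "$home_dir/borzoi/examples" \
        "$home_dir/borzoi-paper/analysis" \
        "$home_dir/borzoi/examples/data" \
        "$home_dir/borzoi/examples/data/qtl_cat"
    do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every folder was found.
    [ "$missing" -eq 0 ]
}
```

Running `check_borzoi_setup && . ~/borzoi-paper/analysis/setup_data.sh` from `~/borzoi/examples` would then skip the setup script if any folder is absent.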
2 changes: 1 addition & 1 deletion analysis/crispr/flowfish/run_gradients_flowfish.sh
100644 → 100755
@@ -1,3 +1,3 @@
#!/bin/sh

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_k562_undo_clip -f 0,1,2,3 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/borzoi_v2/targets_k562.txt /home/jlinder/borzoi_v2/params_pred.json /home/jlinder/borzoi_v2 /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o saved_models/flowfish_k562 -f 3 -c 0,1,2,3 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t targets_k562.txt params_pred.json saved_models flowfish/crispr_genes.gtf
18 changes: 9 additions & 9 deletions analysis/crispr/flowfish/run_gradients_flowfish_miborzoi_ablations.sh
100644 → 100755
@@ -1,19 +1,19 @@
#!/bin/sh

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_k562_all_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/k562_all/targets_k562_subset.txt /home/jlinder/mini_borzois_v2/k562_all/params_pred.json /home/jlinder/mini_borzois_v2/k562_all /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_k562_all -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/k562_all/targets_k562_subset.txt mini_borzois_v2/k562_all/params_pred.json mini_borzois_v2/k562_all flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_k562_dnase_atac_rna_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/k562_dnase_atac_rna/targets_k562_dnase_atac_rna_subset.txt /home/jlinder/mini_borzois_v2/k562_dnase_atac_rna/params_pred.json /home/jlinder/mini_borzois_v2/k562_dnase_atac_rna /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_k562_dnase_atac_rna -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/k562_dnase_atac_rna/targets_k562_dnase_atac_rna_subset.txt mini_borzois_v2/k562_dnase_atac_rna/params_pred.json mini_borzois_v2/k562_dnase_atac_rna flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_k562_rna_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/k562_rna/targets_k562_rna_subset.txt /home/jlinder/mini_borzois_v2/k562_rna/params_pred.json /home/jlinder/mini_borzois_v2/k562_rna /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_k562_rna -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/k562_rna/targets_k562_rna_subset.txt mini_borzois_v2/k562_rna/params_pred.json mini_borzois_v2/k562_rna flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_baseline_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/baseline/targets_subset.txt /home/jlinder/mini_borzois_v2/baseline/params_pred.json /home/jlinder/mini_borzois_v2/baseline /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_baseline -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/baseline/targets_subset.txt mini_borzois_v2/baseline/params_pred.json mini_borzois_v2/baseline flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_human_all_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/human_all/targets_subset.txt /home/jlinder/mini_borzois_v2/human_all/params_pred.json /home/jlinder/mini_borzois_v2/human_all /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_human_all -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/human_all/targets_subset.txt mini_borzois_v2/human_all/params_pred.json mini_borzois_v2/human_all flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_human_dnase_atac_rna_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/human_dnase_atac_rna/targets_human_dnase_atac_rna_subset.txt /home/jlinder/mini_borzois_v2/human_dnase_atac_rna/params_pred.json /home/jlinder/mini_borzois_v2/human_dnase_atac_rna /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_human_dnase_atac_rna -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/human_dnase_atac_rna/targets_human_dnase_atac_rna_subset.txt mini_borzois_v2/human_dnase_atac_rna/params_pred.json mini_borzois_v2/human_dnase_atac_rna flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_multisp_dnase_atac_rna_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/multispecies_dnase_atac_rna/targets_human_dnase_atac_rna_subset.txt /home/jlinder/mini_borzois_v2/multispecies_dnase_atac_rna/params_pred.json /home/jlinder/mini_borzois_v2/multispecies_dnase_atac_rna /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_multisp_dnase_atac_rna -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/multispecies_dnase_atac_rna/targets_human_dnase_atac_rna_subset.txt mini_borzois_v2/multispecies_dnase_atac_rna/params_pred.json mini_borzois_v2/multispecies_dnase_atac_rna flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_multisp_rna_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/multispecies_rna/targets_human_rna_subset.txt /home/jlinder/mini_borzois_v2/multispecies_rna/params_pred.json /home/jlinder/mini_borzois_v2/multispecies_rna /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_multisp_rna -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/multispecies_rna/targets_human_rna_subset.txt mini_borzois_v2/multispecies_rna/params_pred.json mini_borzois_v2/multispecies_rna flowfish/crispr_genes.gtf

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu.py -o flowfish_miborzoi_multisp_no_unet_undo_clip -f 0,1 --rc 1 --shifts 0 --span 0 --smoothgrad 0 --clip_soft 384.0 -t /home/jlinder/mini_borzois_v2/multispecies_no_unet/targets_subset.txt /home/jlinder/mini_borzois_v2/multispecies_no_unet/params_pred.json /home/jlinder/mini_borzois_v2/multispecies_no_unet /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene.py -o mini_borzois_v2/flowfish_miborzoi_multisp_no_unet -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 -t mini_borzois_v2/multispecies_no_unet/targets_subset.txt mini_borzois_v2/multispecies_no_unet/params_pred.json mini_borzois_v2/multispecies_no_unet flowfish/crispr_genes.gtf
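
The nine updated `borzoi_satg_gene.py` invocations in this script differ only in the model folder, the output suffix, and the targets file. As an illustration (not the form used in the commit), they could be written as a loop. The function name `run_flowfish_ablations` and the `RUN` dry-run variable are hypothetical; the flags and paths are copied from the commands above.

```sh
#!/bin/sh
# Sketch only: the same commands as above, expressed as a loop.
# Each entry is "<model folder>:<output suffix>:<targets file>".
# Set RUN=echo to print the commands instead of executing them.
run_flowfish_ablations() {
    for entry in \
        "k562_all:k562_all:targets_k562_subset.txt" \
        "k562_dnase_atac_rna:k562_dnase_atac_rna:targets_k562_dnase_atac_rna_subset.txt" \
        "k562_rna:k562_rna:targets_k562_rna_subset.txt" \
        "baseline:baseline:targets_subset.txt" \
        "human_all:human_all:targets_subset.txt" \
        "human_dnase_atac_rna:human_dnase_atac_rna:targets_human_dnase_atac_rna_subset.txt" \
        "multispecies_dnase_atac_rna:multisp_dnase_atac_rna:targets_human_dnase_atac_rna_subset.txt" \
        "multispecies_rna:multisp_rna:targets_human_rna_subset.txt" \
        "multispecies_no_unet:multisp_no_unet:targets_subset.txt"
    do
        # Split the entry on ':' using POSIX parameter expansion.
        model=${entry%%:*}
        rest=${entry#*:}
        out=${rest%%:*}
        targets=${rest#*:}

        ${RUN:-} borzoi_satg_gene.py -o "mini_borzois_v2/flowfish_miborzoi_$out" \
            -f 0,1 -c 0 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 \
            --clip_soft 384.0 -t "mini_borzois_v2/$model/$targets" \
            "mini_borzois_v2/$model/params_pred.json" "mini_borzois_v2/$model" \
            flowfish/crispr_genes.gtf
    done
}
```

Calling `RUN=echo run_flowfish_ablations` prints the nine command lines for inspection without running them. As in the original script, a failing command does not stop the remaining ones.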
2 changes: 1 addition & 1 deletion analysis/crispr/flowfish/run_ism_shuffle_flowfish.sh
100644 → 100755
@@ -1,3 +1,3 @@
#!/bin/sh

python /home/jlinder/basenji/bin/borzoi_satg_gene_gpu_crispr_ism_shuffle.py -o flowfish_k562_ism_shuffle_undo_clip -f 0,1,2,3 --rc 1 --shifts 0 --span 0 --clip_soft 384.0 --aggregate_tracks 10 --ism_size 1 --window_size 2048 --n_samples 16 --mononuc_shuffle 0 --dinuc_shuffle 1 --crispr_file /home/jlinder/flowfish/crispr_table.tsv -t /home/jlinder/borzoi_v2/targets_k562.txt /home/jlinder/borzoi_v2/params_pred.json /home/jlinder/borzoi_v2 /home/jlinder/flowfish/crispr_genes.gtf
borzoi_satg_gene_crispr_ism_shuffle.py -o saved_models/flowfish_k562_ism_shuffle -f 3 -c 0,1,2,3 --rc --untransform_old --track_scale 0.3 --track_transform 0.75 --clip_soft 384.0 --aggregate_tracks 10 --ism_size 1 --window_size 2048 --n_samples 16 --dinuc_shuffle --crispr_file flowfish/crispr_table.tsv -t targets_k562.txt params_pred.json saved_models flowfish/crispr_genes.gtf