What training and test folds were used for models in paper #33

Open
BDEvan5 opened this issue Dec 6, 2024 · 0 comments
BDEvan5 commented Dec 6, 2024

Hello Borzoi Team

Firstly, I am impressed with the quality of the repositories and tutorials. I was able to generate the data (both to replicate the paper and using the new 393kbp sequences) and to train the micro, mini, and full models. It is rare, and thus very impressive, to find open-source repositories that work out of the box. Thank you for your hard work and effort in making replication easy.

I am confused regarding which train/test folds were used to produce the models in the paper. The paper states:

We trained four models, each with distinct held out test and validation folds. (Page 14, Methods/Data)

However, the repo README, supported by the answer to issue #11 (Clarification of model fold data splits), says:

We trained a total of 4 model replicates with identical train, validation and test splits (test = fold3, validation = fold4 from sequences_human.bed.gz).

This appears to be contradictory, unless you trained four models per fold set and only released the models for (test = fold3, validation = fold4). If that is the case, which models did you use for the results reported in the paper?
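For context, the fold labels referenced above can be tallied directly from sequences_human.bed.gz. This is a minimal sketch, assuming the fold label (e.g. 'fold3') is the fourth BED column; the column index may need adjusting:

```python
import gzip
from collections import Counter

# Tally sequences per fold in sequences_human.bed.gz; this assumes the
# fold label (e.g. 'fold3') is the fourth whitespace-separated column.
counts = Counter()
with gzip.open("sequences_human.bed.gz", "rt") as bed:
    for line in bed:
        counts[line.split()[3]] += 1

print(dict(sorted(counts.items())))
```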

I downloaded the four models (links from the README, e.g. https://storage.googleapis.com/seqnn-share/borzoi/f0/model0_best.h5). I tested on the K562 RNA-seq tracks (ENCSR000AEL, plus and minus strands), processed with the Makefile for 524kbp sequences from the borzoi-paper repo.
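For reproducibility, this is roughly how I fetched the four replicates (a minimal sketch; the f0 URL is from the README, and the same path pattern is assumed for f1–f3):

```python
import urllib.request

# Download the four released model replicates. The f0 link is taken from
# the README; the same path pattern is assumed for f1-f3.
for rep in ["f0", "f1", "f2", "f3"]:
    url = f"https://storage.googleapis.com/seqnn-share/borzoi/{rep}/model0_best.h5"
    urllib.request.urlretrieve(url, f"{rep}_model0_best.h5")
```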

The image below shows my results from testing each model on each fold. I measure a Pearson correlation above 0.83 on every fold except folds 3 and 4, where the scores drop to roughly 0.6–0.7. This indicates that the README is correct: the four models were all trained on the same train/validation/test split.

[Image: Pearson correlation for each of the four model replicates, evaluated on each fold]
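For reference, the per-fold score was computed along these lines (a sketch with placeholder arrays; `preds` and `targets` stand in for one model's flattened predictions and the measured coverage for the two ENCSR000AEL tracks on one fold):

```python
import numpy as np
from scipy.stats import pearsonr

def fold_pearson(preds: np.ndarray, targets: np.ndarray) -> float:
    """Pearson r between predicted and measured coverage, flattened
    across sequences, bins, and the plus/minus tracks."""
    r, _ = pearsonr(preds.ravel(), targets.ravel())
    return r

# Placeholder arrays standing in for one model's predictions and the
# measured coverage on one fold (shape: sequences x bins x tracks).
rng = np.random.default_rng(0)
targets = rng.random((128, 6144, 2))
preds = targets + 0.1 * rng.standard_normal(targets.shape)

print(f"Pearson r: {fold_pearson(preds, targets):.3f}")
```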

Could you please help me understand which folds were used?
