Hello Borzoi Team
Firstly, I am impressed with the quality of the repositories and tutorials. I was able to generate the data (both to replicate the paper and using the new 393kbp sequences) and I could train the micro, mini and full models. It is rare, and thus very impressive, to find open source repositories that work out of the box. Thank you for your hard work and effort in making replication easy.
I am confused regarding which train/test folds were used to produce the models in the paper. The paper states:
"We trained four models, each with distinct held out test and validation folds." (Page 14, Methods/Data)
However, the repo readme, supported by the answer to issue #11 (Clarification of model fold data splits) says:
"We trained a total of 4 model replicates with identical train, validation and test splits (test = fold3, validation = fold4 from sequences_human.bed.gz)."
This appears to be contradictory, unless you trained 4 models per fold set and only released the models for (test = fold3, validation = fold4). If this is the case, which models did you use in the results reported in the paper?
I downloaded the four models (links from the readme, e.g. https://storage.googleapis.com/seqnn-share/borzoi/f0/model0_best.h5). I tested them on the K562 RNA-seq tracks (ENCSR000AEL, plus and minus), processed with the Makefile for 524kbp sequences from the borzoi-paper repo.
The image below shows my results from testing each model on each fold. For every model, I measure a Pearson correlation above 0.83 on each of the folds except folds 3 and 4, where the scores are around 0.6/0.7.
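This would indicate that the repo is correct: the lower scores on folds 3 and 4 for every model are what I would expect if all four models were trained on the same train/test split, with those two folds held out.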
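For reference, the per-fold comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the evaluation code from the Borzoi repositories: the function names `pearson_by_fold` and `likely_held_out`, the array layout (one row per sequence, with a matching fold label per row), and the `margin` threshold are all assumptions made for the example.

```python
import numpy as np


def pearson_by_fold(preds, targets, folds):
    """Return {fold: Pearson r} between flattened predictions and targets.

    preds, targets: arrays whose first axis indexes sequences (assumed layout).
    folds: 1-D integer array giving the fold label of each sequence.
    """
    scores = {}
    for fold in np.unique(folds):
        mask = folds == fold
        x = preds[mask].ravel()
        y = targets[mask].ravel()
        scores[int(fold)] = float(np.corrcoef(x, y)[0, 1])
    return scores


def likely_held_out(scores, margin=0.1):
    """Flag folds scoring more than `margin` below the median fold score.

    The margin is an arbitrary, hypothetical threshold for illustration.
    """
    median = float(np.median(list(scores.values())))
    return sorted(f for f, r in scores.items() if r < median - margin)
```

Running `pearson_by_fold` once per model and passing each result to `likely_held_out` gives a quick check of whether the same folds stand out for every model, which is the pattern that would point to a shared train/test split.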
Could you please help me understand which folds were used for the results reported in the paper?