Reproduce the Enformer's input sequences split #190

sararb · 2024-03-05T18:23:28Z

I would like to regenerate the input sequences for Enformer/Basenji2 (using basenji_data.py), and for this purpose, I am using the following command line:

python basenji_data.py -g hg38.gaps.bed -u umap_k36_t10_l32_hg38.bed -b hg38.blacklist.rep.bed -l 131072 -crop_bp 8192 -break_t 786432 -s 65599 -t .1 -v .1 -w 128 -o data/input_mseqs -p 8 targets.txt

However, I am observing differences when compared to the sequences.bed file stored here.

Can you please confirm if I am using the right options to generate the same sequence split?
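
To make "differences" concrete, it can help to report which intervals are missing, extra, or assigned to a different split. The following is a minimal sketch, not part of basenji: it assumes both files are tab-separated BED files with four columns (chrom, start, end, split label such as train/valid/test), and the helper names `read_bed` and `compare_splits` are hypothetical. The in-memory data at the bottom is made up for illustration.

```python
import csv
import io


def read_bed(handle):
    """Map (chrom, start, end) -> split label for a tab-separated BED stream.

    Assumes the fourth column holds the split label; it is left empty
    if the column is absent.
    """
    intervals = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        key = (row[0], int(row[1]), int(row[2]))
        intervals[key] = row[3] if len(row) > 3 else ""
    return intervals


def compare_splits(mine, reference):
    """Report intervals unique to each file and shared intervals whose label differs."""
    shared = mine.keys() & reference.keys()
    return {
        "only_mine": sorted(mine.keys() - reference.keys()),
        "only_reference": sorted(reference.keys() - mine.keys()),
        "relabelled": sorted(k for k in shared if mine[k] != reference[k]),
    }


# Tiny made-up example; real use would pass open("sequences.bed") handles instead.
mine = read_bed(io.StringIO("chr1\t0\t100\ttrain\nchr1\t100\t200\tvalid\n"))
reference = read_bed(
    io.StringIO("chr1\t0\t100\ttrain\nchr1\t100\t200\ttest\nchr2\t0\t100\ttrain\n")
)
report = compare_splits(mine, reference)
for name, items in report.items():
    print(name, len(items))
```

If most intervals appear in both files but with different labels, the difference is likely in the train/valid/test assignment (for example the random seed); if the coordinates themselves differ, the contig filtering or stride options are the more likely cause.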


davek44 · 2024-03-09T01:15:29Z

Hi Sara, can you say a little more about your goal? It'll influence how I can best help. It'd be a little tricky for me to track down the exact parameters, and basenji_data.py has changed over the years. Is it OK if the recipe is equivalent in quality, but different due to minor tweaks and random number seeds?
