minced doesn't find a repeat if it is at the start of the fasta file #35

Open
Alan-Collins opened this issue Aug 18, 2022 · 1 comment


@Alan-Collins

Hi,

It seems that MinCED is unable to identify a repeat when it sits right at the beginning of a sequence in the FASTA file. Adding any nucleotide before the first repeat fixes this.

Attached are two fasta files with the same array. Repeats are lowercase and spacers uppercase. example_array_plusA.txt has a single A added to the beginning of the file.

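Using minced 0.4.2 installed from bioconda on the first file (example_array.txt), 7 repeats are found; the repeat at the very start of the sequence is missed and the array is reported as starting at position 84: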

$ minced example_array.fna
Sequence 'array' (571 bp)

CRISPR 1   Range: 84 - 571
POSITION        REPEAT                          SPACER
--------        -------------------------------------   --------------------------------------
84              GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   TATGTGCCGTGACTTCGATGCTGAGTTCAAACAT      [ 37, 34 ]
155             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   TTACCCTGTCCGACGCTGACCTGTCCGGCGCTGATCTGTC        [ 37, 40 ]
232             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   CGCCGGCGAATCGTTCATGCTCACCCGCGCGGATT     [ 37, 35 ]
304             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   AAGCGTCTTTACGGGAGTCGTGGACGACCTGGTCCCGACC        [ 37, 40 ]
381             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   CTGACGTGATACCGACCGACATCCTCATGGCGATTCCC  [ 37, 38 ]
456             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   ACGCCGCAAAACAGTGCGCCTATAAAGACGATTTTCGTCCCG      [ 37, 42 ]
535             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC
--------        -------------------------------------   --------------------------------------
Repeats: 7      Average Length: 37              Average Length: 38

Time to find repeats: 2 ms

With a single nucleotide added before the first repeat, 8 repeats are found, including the one at the start (now at position 2):

$ minced example_array_plusA.fna
Sequence 'array' (572 bp)

CRISPR 1   Range: 2 - 572
POSITION        REPEAT                          SPACER
--------        -------------------------------------   ---------------------------------------
2               GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   AAACAATACAAACTACATCTACTGTAACACTTTCACTTGATAGCAA  [ 37, 46 ]
85              GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   TATGTGCCGTGACTTCGATGCTGAGTTCAAACAT      [ 37, 34 ]
156             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   TTACCCTGTCCGACGCTGACCTGTCCGGCGCTGATCTGTC        [ 37, 40 ]
233             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   CGCCGGCGAATCGTTCATGCTCACCCGCGCGGATT     [ 37, 35 ]
305             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   AAGCGTCTTTACGGGAGTCGTGGACGACCTGGTCCCGACC        [ 37, 40 ]
382             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   CTGACGTGATACCGACCGACATCCTCATGGCGATTCCC  [ 37, 38 ]
457             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC   ACGCCGCAAAACAGTGCGCCTATAAAGACGATTTTCGTCCCG      [ 37, 42 ]
536             GTCTCAATCCCCCTTACTCAATCGGGTCTGTCTACAC
--------        -------------------------------------   ---------------------------------------
Repeats: 8      Average Length: 37              Average Length: 39

Time to find repeats: 2 ms
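
Until this is fixed, the workaround described above (adding a nucleotide before the first repeat) can be scripted. The following is only a minimal sketch and not part of MinCED: `pad_fasta` and `original_position` are hypothetical helper names, and the default pad of a single `A` just mirrors example_array_plusA.txt. It assumes that padding shifts every reported position by the pad length, which is what the two outputs above show (84 becomes 85).

```python
def pad_fasta(text: str, pad: str = "A") -> str:
    """Prepend `pad` to the first sequence line of every FASTA record.

    Hypothetical helper for the workaround in this issue; the choice of
    "A" as the pad is an assumption taken from the example file.
    """
    out = []
    awaiting_sequence = False  # True after a header, until its first sequence line
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(line)
            awaiting_sequence = True
        elif awaiting_sequence and line.strip():
            out.append(pad + line)
            awaiting_sequence = False
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


def original_position(reported: int, pad: str = "A") -> int:
    """Map a position reported on the padded sequence back to the original."""
    return reported - len(pad)


if __name__ == "__main__":
    example = ">array\ngtctcaatcc\nAAACAATACA\n"
    print(pad_fasta(example), end="")
    print(original_position(85))  # position 85 in the padded run is 84 originally
```

Writing the padded text to a new file and running `minced` on that file should then include the leading repeat, as in the second output above; subtracting the pad length converts the reported coordinates back to the original sequence.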

Thanks!
Alan

example_array.txt
example_array_plusA.txt

@ctSkennerton
Owner

Thank you for the bug report; this is indeed a problem. Unfortunately, I don't have time to look into this issue right now.
