Unfortunately, machine learning algorithms are never going to be perfect. The only way to guarantee a known gene gets found is through a database search.
Prodigal collects a variety of signals for each gene candidate. In your case, the wrong gene has a better coding score but a bad start site (GTG with no RBS), while the real gene has a terrible coding score but a much better start site (ATG with a 3-base RBS). So the short answer is that Prodigal would somehow have to get better at recognizing this sequence as coding. Its low coding score means it uses unusual codons relative to the rest of the organism.
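To make the trade-off concrete, here is a minimal Python sketch that parses the two candidate rows in question and prints their scores side by side. The field names are an assumption based on the header Prodigal writes to its start file (the rows in this thread were pasted without one), and which row is "wanted" versus "chosen" is inferred from the description above (ATG with an RBS versus GTG with none).

```python
# Field names are assumed to match the header Prodigal writes to its
# start file; the rows in this thread were pasted without one.
FIELDS = ["beg", "end", "strand", "total", "codpot", "strtsc", "codon",
          "rbs_motif", "spacer", "rbsscr", "upsscr", "typescr", "gccont"]
NUMERIC = {"total", "codpot", "strtsc", "rbsscr", "upsscr", "typescr", "gccont"}


def parse_candidate(line):
    """Turn one whitespace-separated candidate row into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = dict(zip(FIELDS, values))
    row["beg"], row["end"] = int(row["beg"]), int(row["end"])
    for name in NUMERIC:
        row[name] = float(row[name])
    return row


# Rows copied from the thread; the labels are inferred, not stated by Prodigal.
wanted = parse_candidate(
    "10099 10335 - -9.63 -14.81 5.18 ATG GGA/GAG/AGG 5-10bp -1.73 3.41 4.00 0.549")
chosen = parse_candidate(
    "10185 10385 - 3.51 11.90 -8.39 GTG None None -2.10 -3.69 -2.60 0.532")

for label, row in (("wanted", wanted), ("chosen", chosen)):
    print(f"{label}: coding={row['codpot']:+.2f} "
          f"start={row['strtsc']:+.2f} total={row['total']:+.2f}")
```

In these two rows the total is the coding score plus the start score (up to rounding), so the large coding penalty on the wanted candidate outweighs its better start site.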
One thing I've thought about is an option to search candidates against a database when the candidates in a region fall in more than one reading frame (missing the start site is less of a problem than calling a gene in the wrong frame), but only if they are the best gene in their region along at least one axis (i.e. best coding score or best start score).
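As a rough illustration of that idea (this is not an existing Prodigal feature), the sketch below shortlists candidates for a database search. The `shortlist` function and its rules are hypothetical: it only fires when the region has candidates in more than one reading frame, and it keeps the candidate with the best coding score and the one with the best start score. The column positions are assumed from Prodigal's start-file layout, and the frame is derived from the stop coordinate, which all candidates of one ORF share.

```python
# The nine candidate rows from this thread (all on the minus strand).
ROWS = """\
10099 10188 - -36.36 0.79 -37.15 TTG None None -4.77 -0.93 -31.45 0.544
10099 10248 - -15.28 -4.36 -10.93 GTG GGAG/GAGG 5-10bp 3.13 -10.05 -3.50 0.567
10099 10260 - -9.23 -4.61 -4.62 ATG None None -2.61 -4.28 2.77 0.562
10099 10335 - -9.63 -14.81 5.18 ATG GGA/GAG/AGG 5-10bp -1.73 3.41 4.00 0.549
10185 10289 - -48.46 -17.73 -30.74 TTG None None -4.07 0.66 -26.82 0.571
10185 10301 - -45.41 -13.93 -31.48 TTG None None -3.64 -3.34 -24.00 0.573
10185 10319 - -12.03 -7.63 -4.40 GTG GGA/GAG/AGG 5-10bp -3.01 3.01 -3.90 0.548
10185 10382 - -18.21 10.49 -28.70 TTG None None -2.13 -12.54 -14.03 0.525
10185 10385 - 3.51 11.90 -8.39 GTG None None -2.10 -3.69 -2.60 0.532
"""


def parse(line):
    """Keep only the columns this sketch needs (positions are assumed)."""
    f = line.split()
    return {"beg": int(f[0]), "end": int(f[1]), "strand": f[2],
            "codpot": float(f[4]), "strtsc": float(f[5])}


def frame(c):
    """Reading frame key: strand plus stop coordinate modulo 3."""
    stop = c["beg"] if c["strand"] == "-" else c["end"]
    return (c["strand"], stop % 3)


def shortlist(candidates):
    """Hypothetical rule: best coding and best start, only if frames conflict."""
    if len({frame(c) for c in candidates}) < 2:
        return []
    best_coding = max(candidates, key=lambda c: c["codpot"])
    best_start = max(candidates, key=lambda c: c["strtsc"])
    picks = [best_coding]
    if best_start is not best_coding:
        picks.append(best_start)
    return picks


candidates = [parse(line) for line in ROWS.splitlines()]
for c in shortlist(candidates):
    print(c["beg"], c["end"], c["codpot"], c["strtsc"])
```

On the rows from this thread, the shortlist contains both the 10185..10385 candidate (best coding score) and the 10099..10335 candidate (best start score), so a rule like this would send both to a database search instead of deciding on total score alone.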
Thanks for your rapid answer! Your remark on unusual codons is actually really insightful :)
This database option is not part of the current Prodigal, right?
In short, should I give up on Prodigal and use a different tool to predict this gene?
Hi,
I am interested in a particular gene in my genome sequences, and unfortunately Prodigal doesn't predict it. It predicts a shorter version that does not code for the right protein. When I look at the potential genes Prodigal considers, my gene is present (in bold), but the program chooses another gene (in bold and italic):
Beg End Std Total CodPot StrtSc Codon RBSMotif Spacer RBSScr UpsScr TypeScr GCCont
10099 10188 - -36.36 0.79 -37.15 TTG None None -4.77 -0.93 -31.45 0.544
10099 10248 - -15.28 -4.36 -10.93 GTG GGAG/GAGG 5-10bp 3.13 -10.05 -3.50 0.567
10099 10260 - -9.23 -4.61 -4.62 ATG None None -2.61 -4.28 2.77 0.562
10099 10335 - -9.63 -14.81 5.18 ATG GGA/GAG/AGG 5-10bp -1.73 3.41 4.00 0.549
10185 10289 - -48.46 -17.73 -30.74 TTG None None -4.07 0.66 -26.82 0.571
10185 10301 - -45.41 -13.93 -31.48 TTG None None -3.64 -3.34 -24.00 0.573
10185 10319 - -12.03 -7.63 -4.40 GTG GGA/GAG/AGG 5-10bp -3.01 3.01 -3.90 0.548
10185 10382 - -18.21 10.49 -28.70 TTG None None -2.13 -12.54 -14.03 0.525
10185 10385 - 3.51 11.90 -8.39 GTG None None -2.10 -3.69 -2.60 0.532
How can I make sure my gene gets predicted?
Gene of interest:
Predicted gene: