Hi, I'm Jaebeom Kim, and I'm developing Metabuli, a metagenomic classifier.
I use Prodigal to predict genes in thousands of prokaryotic genomes,
so I have been working on making Prodigal faster.
Here are the modifications I made.
Test environment: MacBook Pro, 1.4 GHz quad-core Intel Core i5, 16GB 2133 MHz LPDDR3
Rapid mode.
In the function 'dprog', I found a part that looks suspicious.
I think lines 52 and 53 are slowing the program down. When i==999 and MAX_NODE_DIST==500, 'min' becomes 0 at line 54, so 'score_connection' is called 999 times. I'm not sure whether this is intended, so I added a 'Rapid mode' that skips those lines.
When I tested with an E. coli genome (GCF_000008865.2), Rapid mode reduced the running time (training + prediction) from ~9.9 sec to ~6.5 sec while producing the same results.
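To make the effect concrete, here is a minimal sketch of how many 'score_connection' calls one node triggers with and without Rapid mode. It is not Prodigal's code: the helper name `scan_count` and the `rapid` flag are hypothetical, and it assumes that the skipped lines push 'min' back by MAX_NODE_DIST a second time, which is consistent with the i==999 case described above.

```c
#include <assert.h>

/* Hypothetical helper, not part of Prodigal.
 * Returns how many earlier nodes j (min <= j < i) would be scored against
 * node i, i.e. the number of score_connection calls for that node. */
static int scan_count(int i, int max_node_dist, int rapid) {
    assert(i >= 0 && max_node_dist > 0);

    /* First distance constraint: look back at most max_node_dist nodes. */
    int min = (i < max_node_dist) ? 0 : i - max_node_dist;

    if (!rapid) {
        /* Assumed behaviour of the skipped lines: widen the window by
         * another max_node_dist nodes, clamping at 0. */
        min = (min < max_node_dist) ? 0 : min - max_node_dist;
    }

    /* The inner loop runs j = min .. i-1. */
    return i - min;
}
```

Under these assumptions, `scan_count(999, 500, 0)` gives 999 calls, matching the case above, while `scan_count(999, 500, 1)` gives 500.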
MAX_NODE_DIST
While reviewing the source code, I found that reducing MAX_NODE_DIST in dprog.h also decreases the running time. I lowered it from 500 to 300 and tested it with the same E. coli genome.
The running time (training + prediction) decreased from ~9.9 sec to ~7.2 sec while producing the same results. However, it may lower prediction quality in other cases, so I made it an option for users who want results faster.
When Rapid mode is combined with MAX_NODE_DIST 300,
the same predictions were produced in ~5.5 seconds, which is about a 1.8x speedup over the original ~9.9 sec.
I tried to follow the code style :)
If you like the changes, please merge this PR and update the conda package as well.