During a gather run, @taylorreiter and I noticed `build_new_signature` taking a long time. When I checked the implementation, I saw that it creates a new minhash and adds all the hashes again:

https://github.com/dib-lab/sourmash/blob/4aab62f65fb08044e9a43ec49331b65be8b5ae15/sourmash/search.py#L135-L140

An alternative is adding a `remove` method to the minhashes. This makes a lot of sense for gather, because the query tends to be a big metagenomic minhash and we remove only a small number of hashes at each hit.

(And https://github.com/benfred/py-spy is great for finding these things! I didn't see this before because it didn't show up clearly in profiling runs, since other functions would dominate the runtime... But with py-spy you can take a snapshot while the program is running. Thanks @benfred!)
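
To make the difference concrete, here is a rough sketch of the two approaches. `ToyMinHash`, `rebuild_without`, and `remove_in_place` are hypothetical names made up for illustration: the class is a plain set-backed stand-in, not sourmash's actual MinHash API, and the rebuild function only approximates what `build_new_signature` appears to do.

```python
class ToyMinHash:
    """Hypothetical stand-in for a MinHash sketch (not sourmash's API)."""

    def __init__(self, hashes=()):
        self._hashes = set(hashes)

    def add_many(self, hashes):
        self._hashes.update(hashes)

    def remove_many(self, hashes):
        # Proposed operation: drop hashes in place.
        # Work is proportional to the number of hashes removed.
        self._hashes.difference_update(hashes)

    def hashes(self):
        # Return a copy so callers cannot mutate internal state.
        return set(self._hashes)


def rebuild_without(query, found):
    """Rebuild approach: make a new sketch and re-add every surviving hash."""
    remaining = query.hashes() - set(found)
    new_query = ToyMinHash()
    new_query.add_many(remaining)  # touches every remaining hash
    return new_query


def remove_in_place(query, found):
    """Remove approach: only touch the hashes that matched the hit."""
    query.remove_many(found)
    return query


# A large query with a small hit, as in a typical gather iteration.
big_query = ToyMinHash(range(100_000))
hit_hashes = range(10)

rebuilt = rebuild_without(big_query, hit_hashes)
remove_in_place(big_query, hit_hashes)

# Both approaches leave the same set of hashes behind.
assert rebuilt.hashes() == big_query.hashes()
```

In this toy version, the rebuild does work proportional to the size of the whole query at every hit, while the in-place removal does work proportional to the size of the hit. A real implementation would also have to keep any abundance tracking and the sketch's ordering consistent, which this sketch ignores.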