Add a remove method to minhashes #573

Closed
luizirber opened this issue Dec 12, 2018 · 3 comments

Comments

@luizirber
Member

luizirber commented Dec 12, 2018

During a gather run, @taylorreiter and I noticed that build_new_signature was taking a long time. When I checked the implementation, I saw that it creates a new minhash and adds all the hashes again.

https://github.com/dib-lab/sourmash/blob/4aab62f65fb08044e9a43ec49331b65be8b5ae15/sourmash/search.py#L135-L140

An alternative is to add a remove method to the minhashes. This makes a lot of sense for gather, because the query tends to be a big metagenomic minhash and we remove only a small number of hashes at each hit.
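To illustrate the difference, here is a minimal sketch that contrasts rebuilding with removing in place. It is not the sourmash implementation: `ToyMinHash`, `add_many`, `remove_many`, `rebuild_without`, and `remove_in_place` are hypothetical names, and the sketch is modeled as a plain Python set of integer hashes.

```python
class ToyMinHash:
    """Hypothetical stand-in for a MinHash sketch: a plain set of integer hashes."""

    def __init__(self, hashes=()):
        self.hashes = set(hashes)

    def add_many(self, hashes):
        self.hashes.update(hashes)

    def remove_many(self, hashes):
        # Hypothetical method: work scales with the number of hashes removed,
        # not with the size of the (large) query sketch.
        self.hashes.difference_update(hashes)


def rebuild_without(query, found):
    """Rebuild approach: visit every hash in the query and re-add the survivors."""
    new = ToyMinHash()
    new.add_many(h for h in query.hashes if h not in found)
    return new


def remove_in_place(query, found):
    """Removal approach: touch only the hashes found in the current hit."""
    query.remove_many(found)
    return query


# Toy data: a large query and a small set of hashes matched by one hit.
big_query = ToyMinHash(range(100_000))
hit_hashes = {10, 20, 30}

rebuilt = rebuild_without(big_query, hit_hashes)
removed = remove_in_place(ToyMinHash(range(100_000)), hit_hashes)
```

Both functions produce the same set of remaining hashes. The rebuild touches every hash in the query at each gather iteration, while the removal only touches the hashes in the hit.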

(and https://github.com/benfred/py-spy is great for finding these things! I didn't see this before because it didn't show up clearly on profiling runs, since other functions would dominate the runtime... But with py-spy you can take a snapshot while the program is running. Thanks @benfred!)

@luizirber
Member Author

(also: move it outside gather_databases so it is (test|benchmark)-able)

@ctb
Contributor

ctb commented Dec 13, 2018

relevant to #431

@ctb
Contributor

ctb commented Jan 13, 2019

Closed by #615.

ctb closed this as completed Jan 13, 2019