sourmash gather is very slow on large, diverse metagenomes #300

Closed
ctb opened this issue Aug 22, 2017 · 4 comments

ctb commented Aug 22, 2017

Not entirely sure what's going on, but some thoughts:

  • the top levels of the SBT are probably over-full; we should look at larger SBT tables;
  • maybe support diagnostic output (number of tables searched, etc.) in some output file;
  • we should mention the use of gather --scaled directly in the docs somewhere, if we don't already;
  • support progressive output (see #258, "have gather output stuff progressively to disk") so cancelling/killing isn't so painful;
  • support adaptive downsampling with gather (see #222, "speed up sbt_gather by trial downsampling?");
  • maybe provide a smaller/lower-scaled SBT (--scaled 100000) for people to use.
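
To make the --scaled and downsampling points concrete, here is a minimal plain-Python sketch of what moving a sketch to a coarser scaled value could look like. It does not use the sourmash API: the helper names `max_hash_for_scaled` and `downsample` are hypothetical, and the rule "keep hashes below 2^64 / scaled" is an assumption about how scaled sketches are thresholded (sourmash's exact rounding may differ).

```python
from typing import Set

# Illustrative model only, not the sourmash API.
# Assumption: hashes are 64-bit, and a sketch at a given `scaled` value
# keeps every hash below 2**64 / scaled.
HASH_SPACE = 2**64


def max_hash_for_scaled(scaled: int) -> int:
    """Return the (exclusive) hash cutoff assumed for a given scaled value."""
    return HASH_SPACE // scaled


def downsample(hashes: Set[int], new_scaled: int) -> Set[int]:
    """Keep only the hashes that would survive at the coarser scaled value."""
    limit = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h < limit}


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    # Simulate a sketch built at scaled=1000, then coarsen it to scaled=100000.
    fine_limit = max_hash_for_scaled(1000)
    sketch = {rng.randrange(fine_limit) for _ in range(50_000)}
    coarse = downsample(sketch, 100_000)
    print(len(sketch), "hashes at scaled=1000 ->", len(coarse), "at scaled=100000")
```

Under this model a coarser scaled value shrinks the query roughly in proportion (about 100x fewer hashes going from 1000 to 100000), which is why a lower-resolution search can be much cheaper, at the cost of sensitivity to small matches.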

cc @ljcohen

mytluo commented Aug 20, 2018

Hi @ctb,

Would it be possible to parallelize sourmash gather? That is, is there an option to break up the output of compute into multiple .sig files and then run sourmash gather on each .sig file in parallel? We're not sure whether this is feasible in any way, but we were curious.

ctb commented Sep 5, 2018

Hi @mytluo, alas, it's not particularly amenable to parallelization. We have a few more ideas up our sleeves to make it faster, but we're not sure if any of them will work.
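
One way to see why a search like this resists simple parallelization is a toy greedy loop. The sketch below assumes gather behaves like a greedy "best match, subtract, repeat" search; that model, the function name `greedy_gather`, and the dictionary-of-sets database are illustrative assumptions, not sourmash's implementation.

```python
from typing import Dict, List, Set, Tuple


def greedy_gather(
    query: Set[int], database: Dict[str, Set[int]]
) -> List[Tuple[str, int]]:
    """Toy greedy search: repeatedly take the reference that overlaps the
    remaining query hashes the most, then remove those hashes."""
    remaining = set(query)
    results: List[Tuple[str, int]] = []
    while remaining:
        best_name = None
        best_overlap: Set[int] = set()
        # Sorted iteration makes tie-breaking deterministic.
        for name, ref in sorted(database.items()):
            overlap = remaining & ref
            if len(overlap) > len(best_overlap):
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # nothing in the database matches what is left
        results.append((best_name, len(best_overlap)))
        # The next round depends on this subtraction, so rounds are sequential.
        remaining -= best_overlap
    return results


if __name__ == "__main__":
    query = set(range(1, 11))
    database = {"A": {1, 2, 3, 4, 5, 6}, "B": {5, 6, 7, 8}, "C": {9, 100}}
    print(greedy_gather(query, database))
```

In this toy model, reference B shares four hashes with the full query but is credited with only two, because A claimed the shared ones first. Splitting the query into independent pieces would change which reference ranks first in each piece, so the per-piece results would not simply add up to the whole-query answer.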

ctb commented Sep 5, 2018

See #519 for a list of our failures, along with one success :)

ctb commented Jan 11, 2019

See #615 for some significant performance improvements; more coming.

ctb closed this as completed Jan 11, 2019