support LCA/revindex database creation in sourmash index? #949

Closed
ctb opened this issue Apr 18, 2020 · 4 comments

Comments

ctb commented Apr 18, 2020

Right now sourmash index only supports SBTs, but between #925 (new localized SBTs) and #946 (more Index-compliant LCA/revindex database interface) we could pretty easily update the code to support more generalized database types.

Could also be part of or include better/nicer generic index loading support, e.g. a single load_index function.
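One way to picture a single load_index function is a dispatcher that tries each known database loader in turn. The following is only a sketch under assumptions: `load_index`, `LOADERS`, `IndexLoadError`, and the two stub loaders are hypothetical names for illustration, not sourmash's actual API, and the stubs recognize files purely by suffix rather than by inspecting their contents.

```python
from typing import Callable, List, Optional


class IndexLoadError(Exception):
    """Raised when no registered loader recognizes the file."""


# Hypothetical stub loaders: each returns a description of what it "loaded",
# or None if it does not recognize the file. Real loaders would open the file.
def _load_sbt(path: str) -> Optional[str]:
    return f"SBT({path})" if path.endswith((".sbt.zip", ".sbt.json")) else None


def _load_lca(path: str) -> Optional[str]:
    return f"LCA({path})" if path.endswith(".lca.json") else None


# Order matters: the first loader that recognizes the file wins.
LOADERS: List[Callable[[str], Optional[str]]] = [_load_sbt, _load_lca]


def load_index(path: str) -> str:
    """Try each loader in order and return the first successful result."""
    for loader in LOADERS:
        result = loader(path)
        if result is not None:
            return result
    raise IndexLoadError(f"no loader recognized {path!r}")
```

With this shape, supporting a new database type would mean appending one more loader to the list rather than adding another top-level loading function.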

ctb commented Apr 29, 2020

The UX here could rapidly become terrible. I think at the command line we should support "opinionated options" primarily, so that

sourmash index -t lowmem

produces an SBT, while

sourmash index -t smalldisk

produces a reverse index (or some such).

That way we avoid too many "sourmash index -t sbt --option non-localized -s zip --blarg zorkle"-style command lines.

(Ooof, especially if we start adding lineage databases.)
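The "opinionated options" idea could be sketched with argparse as a small table of presets, where each user-facing name expands to an internal index type. This is an illustration only: the `PRESETS` table, the `index-sketch` program name, and the `resolve_index_type` helper are hypothetical, and the lowmem/smalldisk mapping simply follows the tentative pairing suggested above.

```python
import argparse

# Hypothetical presets: each opinionated name maps to an internal index type.
PRESETS = {
    "lowmem": "sbt",
    "smalldisk": "revindex",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the opinionated -t choices."""
    parser = argparse.ArgumentParser(prog="index-sketch")
    parser.add_argument(
        "-t", "--type",
        choices=sorted(PRESETS),
        default="lowmem",
        help="opinionated preset controlling which index type is built",
    )
    parser.add_argument("signatures", nargs="*", help="input signature files")
    return parser


def resolve_index_type(argv) -> str:
    """Parse argv and return the internal index type for the chosen preset."""
    args = build_parser().parse_args(argv)
    return PRESETS[args.type]
```

Because argparse rejects anything outside `choices`, the fine-grained knobs stay out of the command line, and new presets are a one-line addition to the table.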

ctb commented Apr 16, 2022

The stuff in #1808 advances the internal code so that this kind of thing is easier and more obvious, but it's not clear to me that the LCA_Database code/approach is conducive to inclusion in index; we'd still have multiple completely different implementations of index underneath.

I do kind of like the idea of changing lca index to switch to using already-validated lineage databases a la sourmash tax, though, as that would consolidate code and tests in a nice way.

ctb commented Apr 24, 2022

@mr-eyes points out in his review of #1808 that we now have three different types of indexed database in sourmash, and that there are at least three different ways of creating them!

Some options -

  • support -o .sbt.zip to create an SBT with default parameters.
  • switch construction of LCA index over to sourmash index (the original suggestion of this issue) with an optional -t/--taxonomy for taxonomy, as well as switching the default save format to SQL; that could include implementing .insert on LCA_SqliteDatabase and/or supporting on-disk/lowmem lca index (#1964).
  • deprecate lca index and JSON save/load format (although there may be no good reason to remove it completely? not sure, should measure in memory consumption of JSON db with sqlite :memory: storage)
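The first option, choosing the database type from the output filename, might look roughly like the sketch below. It is written under assumptions: only the .sbt.zip suffix comes from the list above; the other suffixes, the type names, and the `index_type_for_output` helper are hypothetical placeholders.

```python
# Hypothetical suffix table. Only ".sbt.zip" is taken from the discussion;
# the remaining entries are placeholders for illustration.
SUFFIX_TO_TYPE = {
    ".sbt.zip": "sbt",
    ".lca.json": "lca-json",
    ".sqldb": "lca-sql",
}


def index_type_for_output(filename: str) -> str:
    """Infer which kind of index to build from the output filename suffix."""
    for suffix, index_type in SUFFIX_TO_TYPE.items():
        if filename.endswith(suffix):
            return index_type
    raise ValueError(f"cannot infer index type from {filename!r}")
```

A dispatch like this keeps the default parameters implicit in the suffix, so the common case needs no extra flags.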

Somewhere I think there is an issue about storing taxonomy/lineage in .zip files. Seems relevant if only tangentially so.

ctb commented Sep 23, 2023

I think this is now a situation where plugins are a better option?

We have save and load plugins that could be combined with new indexing approaches via CLI plugins. See https://github.com/sourmash-bio/pyo3_branchwater for an example.

ctb closed this as completed Sep 23, 2023