Implement more/better revindex functionality on top of LCA databases. #581
Comments
more thoughts, based on prodding from @bluegenes to think about renaming "LCA databases" :) I think the right medium term thing to do is:
but I don't want to delay 2.0 for this; I think it's a 3.0-ish kind of thing.
Some more thoughts! If we make taxinfo databases include a taxonomic hierarchy as well (see |
Digging into the revindex code, it looks like the three main pieces of functionality we'd want in
we might also be interested in incorporating code from https://github.com/ctb/2017-sourmash-revindex/blob/master/classify-common-hashes.py to do some kind of full-database classification.
An initial way of doing this would be storing (long term research project: go super fancy and implement a REINDEER index, but not today =]) |
#1013 adds protein, dayhoff, and hp signature indexing in LCA databases.
#1015 adds abundance tracking into LCA databases. (CLOSED / NOT MERGED)
it would be easy to support storing abundances in
this idea lives on in mastiff, and also we've moved away from recommending LCA-based approaches for taxonomy. So I'm closing. Ref #2760 for arguments in favor of using gather+tax rather than LCA.
With the merge of #533, the LCA databases (should) now contain the full set of hashes that SBTs do. Previously, LCA DBs were buggy and contained a somewhat random collection of hashes that were connected with taxonomic IDs; now they contain every hash, whether or not it has a tax ID. There's a lot of opportunity to connect back to the revindex code that @halexand and I have been using for various things (see https://github.com/ctb/2017-sourmash-revindex), and to make that code better, more complete, more usable, and integrated into sourmash, either as an extension or directly in sourmash.
Just a thought :)
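For anyone skimming, here is a rough sketch of the general idea: a reverse index maps each hash back to the signatures that contain it, and a hash's taxonomy can be summarized as the lowest common ancestor of those signatures' lineages. This is plain Python for illustration only; `ToyRevIndex` and all of its method names are hypothetical and are not the sourmash or 2017-sourmash-revindex API, and it assumes lineages are simple tuples ordered from the highest rank down.

```python
from collections import defaultdict


class ToyRevIndex:
    """Hypothetical toy reverse index: hash -> signature ids, plus optional lineages."""

    def __init__(self):
        self.hash_to_ids = defaultdict(set)  # hash value -> ids of signatures containing it
        self.id_to_lineage = {}              # signature id -> lineage tuple (optional)

    def insert(self, sig_id, hashes, lineage=None):
        # Every hash is stored, whether or not the signature has a lineage.
        for h in hashes:
            self.hash_to_ids[h].add(sig_id)
        if lineage is not None:
            self.id_to_lineage[sig_id] = tuple(lineage)

    def signatures_for(self, hashval):
        # Reverse lookup: which signatures contain this hash?
        return set(self.hash_to_ids.get(hashval, ()))

    def lca_for(self, hashval):
        # Lowest common ancestor = longest shared prefix of all known lineages.
        lineages = [
            self.id_to_lineage[i]
            for i in self.hash_to_ids.get(hashval, ())
            if i in self.id_to_lineage
        ]
        if not lineages:
            return ()
        common = []
        for ranks in zip(*lineages):
            if len(set(ranks)) != 1:
                break
            common.append(ranks[0])
        return tuple(common)


# Made-up example data, for illustration only.
idx = ToyRevIndex()
idx.insert("sigA", [1, 2, 3], ("Bacteria", "Proteobacteria", "Escherichia"))
idx.insert("sigB", [2, 3, 4], ("Bacteria", "Proteobacteria", "Salmonella"))
idx.insert("sigC", [3, 5])  # no lineage, but its hashes are still indexed

shared = idx.signatures_for(3)  # all three signatures contain hash 3
lca = idx.lca_for(2)            # sigA and sigB agree down to the second rank
```

Signatures without a lineage still show up in `signatures_for`, mirroring the point above that the database keeps every hash whether or not it has a tax ID; they are simply skipped when computing the common ancestor.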