-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] add some docs on search, gather, and lca methods #393
Conversation
Codecov Report
@@ Coverage Diff @@
## master #393 +/- ##
==========================================
+ Coverage 90.14% 90.15% +0.01%
==========================================
Files 31 31
Lines 4585 4593 +8
Branches 36 36
==========================================
+ Hits 4133 4141 +8
Misses 451 451
Partials 1 1
Continue to review full report at Codecov.
|
Ready for review & merge, @brooksph @taylorreiter @luizirber. While this is incomplete, it is a massive upgrade to the current documentation :). I'd like to suggest that review focus on accuracy, and that suggestions for the next round of improvement get added to a new issue. That way we can merge this sooner rather than later, while also keeping track of ideas for future improvements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is much more thorough. I think it reads well, but found a few sections that had small grammatical issues, or where difficult to understand.
doc/classifying-signatures.md
Outdated
|
||
We have implemented two algorithms in sourmash to do this. | ||
|
||
One, approaches based on lowest common ancestor ("LCA"), uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The language here is a little confusing, but not that critical to change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks for noticing!
|
||
By default, there is no structured taxonomic information available in | ||
sourmash signatures or SBT databases of signatures. Generally what | ||
this means is that you will have to provide your own mapping from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this refer to the output of gather? (Line 72-74)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it refers to all of the non-LCA stuff. This is just going to have to be confusing, given the addition of lca gather :).
`gather`, like `search`, will load all of provided signatures into | ||
memory. You can use `sourmash index` to create a Sequence Bloom Tree | ||
(SBT) that can be quickly searched on disk; this is | ||
[the same format in which we provide GenBank and other databases](databases.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 205-207 is repeated above. I see the utility of the information being in both places, but is it repeated verbatim intentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
doc/command-line.md
Outdated
These commands use LCA databases (created with `index`, below, or | ||
prepared databases such as | ||
[genbank-k31.lca.json.gz, from the LCA tutorial](tutorials-lca.html). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought index
was to create the SBT for gather?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lca index
. Fixed!
doc/command-line.md
Outdated
|
||
### `sourmash lca gather` | ||
|
||
The `sourmash lca gather` command classifies finds all non-overlapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classifies finds -- too many words I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
@taylorreiter fixed in b10b490 - thx! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
yay thanks! ...do you think we should wait for #390 to merge? I'm -0 on waiting. |
(because this includes docs for |
I don't have strong feelings either way, or a strong concept of the number of people we will confuse by merging now as opposed to waiting. |
On Sun, Feb 18, 2018 at 10:20:29AM -0800, Taylor Reiter wrote:
I don't have strong feelings either way, or a strong concept of the number of people we will confuse by merging now as opposed to waiting.
yeah, me neither. Heck, I'll just merge it.
|
Update documentation significantly in preparation for 2.0 release -
Note, includes documentation for
lca gather
, introduced in #390. Incorporates PR #362 as well.make test
Did it pass the tests?make coverage
Is the new code covered?without a major version increment. Changing file formats also requires a
major version number increment.
changes were made?