Interface extensions #17
Looks pretty good; a few comments/questions. Has this code been deployed in the test environment and tested against the latest Qiita public release?
redbiom/commands/admin.py
Outdated
              help="The filepath to the sample metadata to load.")
def load_sample_metadata_search(metadata):
    """Load sample metadata."""
    # TODO: merge with load_sample_metadata
Is this really necessary? I'm not sure why it would be. If so, perhaps open an issue explaining why the merge is desirable.
It doesn't really matter. It just felt a little conceptually off.
Then remove?
redbiom/commands/summarize.py
Outdated
    This command will assess, per observation, the number of samples that
    observation is found in relative to the metadata category specified.
    """
    if threads == 1:
What are the implications of this and are we aware of a fix?
I'm not sure yet. It could be my local system, because I've used joblib with Session a bunch in other contexts, including within redbiom. This is not an issue with concurrent processes (i.e., totally independent executions), but with some shared state between forked processes.
Perhaps retesting in the test environment will shed some light on the implications ...
Not necessary. We need to support thread-local sessions. This does not appear to be an issue on the load into sun-14, likely because of the order of operations (i.e., luck). I'm going to remove this support here for now until thread-local sessions can be added.
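For reference, thread-local sessions as mentioned above could look roughly like the following. This is only a sketch using the standard library: StubSession stands in for requests.Session so the example runs without dependencies, and get_thread_session is a hypothetical helper name, not something that exists in redbiom.

```python
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


class StubSession:
    """Stand-in for requests.Session (assumption, keeps the sketch self-contained)."""

    def close(self):
        pass


def get_thread_session(factory=StubSession):
    """Return the calling thread's session, creating it on first use."""
    session = getattr(_local, "session", None)
    if session is None:
        session = factory()
        _local.session = session
    return session
```

Two calls from the same thread return the same object, while a different thread gets its own session. Note that this covers threads only; forked processes inherit a copy of the parent's state and need separate handling.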
redbiom/commands/search.py
Outdated
              help="The WHERE clause to apply")
def search_metadata(restrict_to, where):
    """Find samples by metadata"""
    # TODO: deprecate
why deprecate?
redbiom/commands/summarize.py
Outdated
@@ -74,17 +80,70 @@ def summarize_metadata(descending):
        click.echo("%s\t%s" % (idx, val))


def _summarize_id(context, category, id):
    """Summarize the ID over the category"""
    from redbiom.summarize import category_from_observations
I think this is the only place where we have "from redbiom.summarize import category_from_observations" instead of "import redbiom.summarize" followed by a call to redbiom.summarize.category_from_observations. Why?
good call, can change
    import pandas as pd
    df = pd.DataFrame(mappings)
    df.set_index('feature', inplace=True)
    df[df.isnull()] = 0
why do we need this?
since this table is returning counts, a zero is more appropriate than a nan
not sure, nan means that it wasn't there, 0 means no occurrences, right?
These are integers, not floats, and zero occurrences are appropriate for similar reasons why zeros are appropriate in an OTU table
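To make the NaN versus zero point concrete, here is a small sketch of what happens when a count table is built from dicts that omit unseen categories. The feature and category names are made up for illustration, and fillna(0) followed by astype(int) is one possible way to get integer counts, not necessarily what the PR does.

```python
import pandas as pd

# Hypothetical per-category counts; each dict omits categories it never saw.
mappings = [
    {"feature": "f1", "soil": 3, "gut": 1},
    {"feature": "f2", "gut": 4},
]

df = pd.DataFrame(mappings).set_index("feature")
# The missing "soil" entry for f2 became NaN, which forces that column to float.

# Treat "not observed" as a count of zero and restore integer dtypes.
counts = df.fillna(0).astype(int)
```

After the fill, every column is an integer column again, which matches how zeros are stored in an OTU table.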
redbiom/metadata.py
Outdated
    from redbiom.util import float_or_nan
    tokens = list(shlex.shlex(criteria))
    if len(tokens) > 1:
        # < 5
not sure if these comments are needed ...
redbiom/metadata.py
Outdated
    import shlex
    from redbiom.util import float_or_nan
    tokens = list(shlex.shlex(criteria))
    if len(tokens) > 1:
Why do we need the special case for == 1? Shouldn't the "if operator is None:" branch cover that case?
this code should be marked for deletion. it is no longer necessary with the improved search queries
Then, why not just delete?
Thanks! This PR was exercised in the test environment a few commits back. A critical piece that does need to be exercised is the loading of the new indices. I don't think I'll be able to issue the loads this weekend unfortunately.
Okay, pushed all the metadata searching onto the new grammars and fixed the thread handling for the table summarization. Tests should pass here. I'm now going to fix the loader from qiita to bring it up to speed with the new metadata search indices.
import requests
import os


pid = os.getpid()
Note: this is the thread/multiprocess workaround. What we do is create a session per process ID. These are automatically closed out atexit. This allows each process to operate without mutating another's session state.
…l with 'redbiom search metadata --categories'
Depends on #15. Adds in support for searching over metadata from the command line to find samples. It's a bit janky as you need to restrict your categories to what is being used in the query in the general case of searching over all samples. This is because, otherwise, you'd need to fetch all the metadata for all samples and that is not a good solution.
As an example, it is now possible to do the following:
What's happening behind the scenes is that the metadata obtained are pushed into an sqlite database in order to handle the specific relational query. The backend for this is a copypasta of QIIME2's Metadata object.
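The general approach described above (load the fetched metadata into sqlite, then apply the WHERE clause) might look something like this minimal sketch. It is not the QIIME2-derived backend: the table name, the sample_id column, and the function name are assumptions, and the WHERE clause is interpolated directly into the SQL, so it is only suitable for trusted input.

```python
import sqlite3


def search_metadata_rows(rows, where):
    """Return sample IDs from `rows` that satisfy the SQL `where` clause.

    rows: non-empty list of dicts with identical keys, including 'sample_id'.
    where: trusted SQL WHERE clause, e.g. "age > 10".
    """
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    try:
        # Untyped columns let sqlite keep each value's own storage class.
        col_defs = ", ".join('"%s"' % c for c in columns)
        conn.execute("CREATE TABLE metadata (%s)" % col_defs)
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            "INSERT INTO metadata VALUES (%s)" % placeholders,
            [tuple(row[c] for c in columns) for row in rows],
        )
        cursor = conn.execute(
            "SELECT sample_id FROM metadata WHERE %s ORDER BY sample_id" % where
        )
        return [record[0] for record in cursor]
    finally:
        conn.close()
```

Restricting the fetched categories to those used in the query keeps the temporary table small, which is the reason the command asks for them up front.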