Interface extensions #17

wasade · 2017-04-10T03:15:37Z

Depends on #15. Adds support for searching over metadata from the command line to find samples. It's a bit janky: in the general case of searching over all samples, you need to restrict your categories to those used in the query. Otherwise, you'd need to fetch all the metadata for all samples, which is not a good solution.

As an example, it is now possible to do the following:

$ redbiom search metadata --restrict-to AGE_YEARS --where "CAST(AGE_YEARS AS FLOAT) > 40" | redbiom fetch samples --context test --output metadata_search_test.biom

What's happening behind the scenes is that the metadata obtained are pushed into an sqlite database in order to handle the specific relational query. The backend for this is a copypasta of QIIME2's Metadata object.
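A minimal sketch of the approach described above, not redbiom's actual code: metadata values arrive as strings, get loaded into an in-memory SQLite table, and the user's WHERE clause selects sample IDs. The table name, column layout, sample IDs, and the `search` helper are assumptions for illustration.

```python
import sqlite3

# Hypothetical metadata: sample ID -> {category: value}; values are strings.
metadata = {
    "sample.1": {"AGE_YEARS": "35"},
    "sample.2": {"AGE_YEARS": "52"},
    "sample.3": {"AGE_YEARS": "41.5"},
}


def search(metadata, restrict_to, where):
    """Return the sorted sample IDs whose metadata satisfy the WHERE clause."""
    conn = sqlite3.connect(":memory:")
    try:
        # Only the restricted categories become columns, so only those
        # categories need to be fetched for the samples being searched.
        columns = ", ".join('"%s" TEXT' % c for c in restrict_to)
        conn.execute(
            "CREATE TABLE metadata (sample_id TEXT PRIMARY KEY, %s)" % columns
        )
        placeholders = ", ".join("?" for _ in range(len(restrict_to) + 1))
        rows = [
            [sid] + [cats.get(c) for c in restrict_to]
            for sid, cats in metadata.items()
        ]
        conn.executemany("INSERT INTO metadata VALUES (%s)" % placeholders, rows)
        # The WHERE clause is interpolated as-is; acceptable for a local
        # scratch database, not for untrusted input against shared data.
        cursor = conn.execute("SELECT sample_id FROM metadata WHERE %s" % where)
        return sorted(r[0] for r in cursor)
    finally:
        conn.close()


matches = search(metadata, ["AGE_YEARS"], "CAST(AGE_YEARS AS FLOAT) > 40")
```

Because every value is stored as text, the CAST in the WHERE clause is what makes the comparison numeric rather than lexical.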

antgonza

looks pretty good, a few comments/questions. Has this code been deployed in the test environment and tested against the latest qiita public release?

antgonza · 2017-04-29T16:57:59Z

redbiom/commands/admin.py

+              help="The filepath to the sample metadata to load.")
+def load_sample_metadata_search(metadata):
+    """Load sample metadata."""
+    # TODO: merge with load_sample_metadata


Is this really necessary? I'm not sure why. If so, perhaps adding an issue to clarify why it is desirable would be good.

it doesn't really matter. Just felt a little conceptually off

Then remove?

antgonza · 2017-04-29T16:59:28Z

redbiom/commands/summarize.py

+    This command will assess, per observation, the number of samples that
+    observation is found in relative to the metadata category specified.
+    """
+    if threads == 1:


What are the implications of this and are we aware of a fix?

I'm not sure yet. It could be my local system because I've used joblib with Session a bunch in other contexts, including within redbiom. This is not an issue with concurrent processes (ie totally independent executions), but with some shared state between forked processes.

perhaps retesting in the test environment will shed some light on the implications ...

Not necessary. We need to support thread-local sessions. This does not appear to be an issue on the load into sun-14 likely from the order of operations (ie luck). I'm going to remove this support here for now until thread-local sessions can be added.

antgonza · 2017-04-29T17:00:07Z

redbiom/commands/search.py

+              help="The WHERE clause to apply")
+def search_metadata(restrict_to, where):
+    """Find samples by metadata"""
+    # TODO: deprecate


why deprecate?

antgonza · 2017-04-29T17:01:47Z

redbiom/commands/summarize.py

@@ -74,17 +80,70 @@ def summarize_metadata(descending):
        click.echo("%s\t%s" % (idx, val))


+def _summarize_id(context, category, id):
+    """Summarize the ID over the category"""
+    from redbiom.summarize import category_from_observations


I think this is the only place where we have: from redbiom.summarize import category_from_observations vs. import redbiom.summarize and then call redbiom.summarize.category_from_observations, why?

good call, can change

antgonza · 2017-04-29T17:02:39Z

redbiom/commands/summarize.py

+    import pandas as pd
+    df = pd.DataFrame(mappings)
+    df.set_index('feature', inplace=True)
+    df[df.isnull()] = 0


why do we need this?

since this table is returning counts, a zero is more appropriate than a nan

not sure, nan means that it wasn't there, 0 means no occurrences, right?

These are integers, not floats, and zero occurrences are appropriate for the same reasons that zeros are appropriate in an OTU table
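To illustrate the point about counts, here is a small sketch with made-up data (the feature and category names are hypothetical). Building a DataFrame from per-feature mappings leaves NaN wherever a feature was never observed in a category, which also forces that column to float; filling with zero and casting restores integer counts.

```python
import pandas as pd

# Hypothetical per-feature counts of samples in each metadata category value.
mappings = [
    {"feature": "TACGGAGG", "gut": 3, "skin": 1},
    {"feature": "TACGTAGG", "gut": 2},  # never observed in skin samples
]

df = pd.DataFrame(mappings)
df = df.set_index("feature")

# The missing entry becomes NaN, so the column is float before filling.
skin_dtype_before = str(df["skin"].dtype)

# Same effect as df[df.isnull()] = 0, plus a cast back to integers.
counts = df.fillna(0).astype(int)
```

This mirrors the OTU-table convention mentioned in the thread: an unobserved feature is recorded as a count of zero rather than as a missing value.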

antgonza · 2017-04-29T17:07:56Z

redbiom/metadata.py

+    from redbiom.util import float_or_nan
+    tokens = list(shlex.shlex(criteria))
+    if len(tokens) > 1:
+        # < 5


not sure if these comments are needed ...

antgonza · 2017-04-29T17:09:33Z

redbiom/metadata.py

+    import shlex
+    from redbiom.util import float_or_nan
+    tokens = list(shlex.shlex(criteria))
+    if len(tokens) > 1:


why do we need the special case for == 1? shouldn't if operator is None: cover that case?

this code should be marked for deletion. it is no longer necessary with the improved search queries

Then, why not just delete?
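For context on the tokenizing being discussed, the sketch below shows how `shlex.shlex` splits a criteria string. The `split_criteria` helper is hypothetical and is not the code under review; it only illustrates why a single-token input (a bare value with no operator) can share a code path with the multi-token case.

```python
import shlex


def split_criteria(criteria):
    """Split a criteria string such as "< 5" into (operator, value).

    Hypothetical helper: returns None for the operator when the string
    holds only a value, so no separate single-token branch is needed.
    """
    tokens = list(shlex.shlex(criteria))
    if not tokens:
        raise ValueError("empty criteria")
    # Everything before the last token is treated as the operator.
    operator = "".join(tokens[:-1]) or None
    return operator, tokens[-1]


with_operator = split_criteria("< 5")
two_char_operator = split_criteria(">= 5")
bare_value = split_criteria("5")
```

Note that the default `shlex.shlex` word characters exclude `.` and `-`, so decimal or negative values are split into several tokens and would need extra handling beyond this sketch.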

wasade · 2017-04-29T17:31:03Z

Thanks! This PR was exercised in the test environment a few commits back. A critical piece that does need to be exercised is the loading of the new indices. I don't think I'll be able to issue the loads this weekend unfortunately.

wasade · 2017-05-01T23:33:09Z

Okay, pushed all the metadata searching onto the new grammars and fixed the thread handling for the table summarization. Tests should pass here. I'm now going to fix the loader from qiita to bring it up to speed with the new metadata search indices

wasade · 2017-05-03T17:07:04Z

redbiom fetch samples is currently handling redbiom IDs but is not handling QIIME-compatible IDs yet. That needs to be fixed; do not merge until that commit is in (this morning).

wasade · 2017-05-03T17:49:04Z

redbiom select samples-from-metadata was not fully disambiguating IDs, leading to a discrepancy with redbiom summarize samples.

wasade · 2017-05-03T22:01:54Z

redbiom/_requests.py

+    import requests
+    import os
+
+    pid = os.getpid()


note: this is the thread/multiprocess workaround. We create one session per process ID, and these are automatically closed out atexit. This allows each process to operate without mutating another's session state.


wasade added 30 commits January 26, 2017 14:47

DOC: for requests.py

0a57c33

ENH: Context support

0cf42ec

Decompose tests

55bb876

BLD: switch to nose

cd17d38

DOC: notes on design decisions

fc5a5d9

DOC: more notes in the readme

34f97ef

TST: additional tests for the requests module

4af7857

TST: more tests for requests

1dcc3a1

STY: flake8

401ef43

Addressing @antgonza's comments

3432bbd

ENH: infer stdin

96b0554

Restore POSIX compliance

190cd73

MAINT: refactor of admin commands

0295a7c

MAINT: refactor of fetch commands

4ffdcaa

DOC: additional help text for search

94d0217

MAINT: refactor of summarize

74e16de

Merge branch 'master' of github.com:wasade/redbiom into decouple

0f141e4

travis kick

6fb3a42

STY: flake8

e882069

left out a module

b5649f1

ENH: load support for tags

d451c06

Addressing @josenavas's comments

1f78c6c

Merge branch 'decouple' into to_tags

35881b7

WIP: Tag support

5be638e

Merge branch 'master' of github.com:wasade/redbiom into to_tags

f487f29

missing file

3dcad2d

STY: flake8

97940f0

Order stability

deeb6c9

what should be py2 compatibility

cf7f918

Missed one file

ae57017

wasade added 3 commits April 28, 2017 16:58

Had the wrong category in the query...

2a8f065

None is valid for AST in py27, not _really_ a danger...

a6dbda0

Test had order dependence

a070111

antgonza reviewed Apr 29, 2017


wasade added 3 commits May 1, 2017 15:21

Address @antgonza's comments, shift old search to new search

25ee32b

Fall back fully on new metadata search

def7fda

Fix thread local session handling

d254394

wasade added 7 commits May 1, 2017 16:50

Vim'd

6ed93b3

Omit negative numbers when tokenizing/stemming

9845a49

Disregard time like metadata values for full text search

c3f4d86

PY2/PY3 way of getting around unicode

19169b6

One session per pid...

ada8faa

only fetch stopwords once per dataframe

8ec9e1a

Fix other stemmer calls

86fdd5b

wasade added 3 commits May 3, 2017 14:53

Ambiguity bug, rids as output for samples-from-metadata

2b36399

Fix ambig bug

2e93e59

Actually put the samples-from-metadata correction

ddff706

wasade commented May 3, 2017


This was referenced May 3, 2017

Selection from summarize may actually be better expressed as select #11

Closed

Centralize and expose nullable set #19

Closed

wasade added 5 commits May 3, 2017 15:18

A lack of context on select doesn't make sense

2499849

flake8d

9840b62

Help text for redbiom search metadata

949ba49

Allow for summarizing a defined set of metadata categories, pairs well with 'redbiom search metadata --categories'

a5f3155

flake8d again

77f870b

antgonza merged commit 37db432 into master May 4, 2017
