[WIP] Add file format prose description #267

ivirshup · 2019-12-06T05:42:06Z

This is a WIP PR for adding new docs to AnnData. Currently it contains a first draft of a document meant to replace this one.

Right now I'm thinking there should be three main documents:

The layout of the AnnData object (somewhat covered by its doc-string)
How each element is represented on disk.
A technical description of the disk representation complete with expected attributes and noting differences between hdf5 and zarr.

DataFrame, DataFrame, dataframe, data-frame, data frame?
Add more technical docs going into specifics of schema
Convert from markdown to rst

LuckyMD · 2019-12-06T08:49:08Z

Yes! This sounds like what I've been annoying @falexwolf about more or less since I joined the group :). Another important part would be some more entry-level stuff, like very small basic examples of slicing AnnData objects. Maybe also how to use groupby etc. We get a lot of questions/issues on that. This would be more than just "how to use pandas" and "how to use numpy", as it's about how AnnData objects react to different slicing (e.g. `adata[slice,].X` versus `adata.X[slice,]`) and what they expect as identifiers. Especially things like dealing with boolean masks are not straightforward for users.

ivirshup · 2019-12-10T04:27:43Z

@LuckyMD I think some of what you're asking about is covered (though briefly) in the new AnnData doc-string. I'm thinking that maybe this should be expanded and moved to somewhere more discoverable in the docs.

For things like groupby, do you think some "cookbook" style documentation could help here? I was thinking some examples like this could be helpful:

Groupby for pairwise t-test

`groupby`

We can perform group-by like operations using the dataframes stored in an anndata object.
For example, here is some code which uses this to perform pairwise differential expression (via t-tests) on clusters in an anndata object.

from itertools import combinations

import numpy as np
import pandas as pd
from anndata import AnnData
from scipy.stats import ttest_ind_from_stats
from sklearn.utils.sparsefuncs import mean_variance_axis


def ttest_pairwise(adata: AnnData, groupby) -> "Dict[Tuple[Any, Any], pd.DataFrame]":
    """Run a per-gene Welch's t-test for every pair of groups in adata.obs[groupby]."""
    means, variances = {}, {}
    tests = {}
    grouped = adata.obs.groupby(groupby)

    # Per-gene mean and variance within each group (axis=0 reduces over cells).
    # mean_variance_axis expects adata.X to be a scipy CSR or CSC matrix.
    for group, group_idx in grouped.groups.items():
        means[group], variances[group] = mean_variance_axis(adata[group_idx].X, axis=0)

    for (g1, g1_idx), (g2, g2_idx) in combinations(grouped.groups.items(), 2):
        t, pval = ttest_ind_from_stats(
            means[g1],
            np.sqrt(variances[g1]),
            len(g1_idx),
            means[g2],
            np.sqrt(variances[g2]),
            len(g2_idx),
            equal_var=False,
        )
        # One row per gene, with the statistic and p-value as columns.
        tests[(g1, g2)] = pd.DataFrame(
            {"t": t, "pval": pval},
            index=adata.var_names,
        )
    return tests

LuckyMD · 2019-12-10T10:22:58Z

I think "cookbook" style documentation is definitely the way forward for groupby. However, I wouldn't use functions like the one you show as examples, but instead use cases that are more common for beginner users. Stuff like finding the mean marker gene expression score for cell type X in all clusters. That would be a two-liner or so. And then add a brief explanation that you apply `.groupby()` to `.obs` or `.var` to get covariates grouped by the categories of a particular covariate. "group-by-like" might not be the best description of `.groupby()` ;).

I need to check out the new anndata documentation it seems.

flying-sheep · 2020-01-21T10:03:01Z

Hi @ivirshup I updated everything to intersphinx links and interpreter blocks (as opposed to regular code blocks). I also corrected the odd typo. LGTM

flying-sheep reviewed Dec 6, 2019 (four reviews, each with one comment on docs/fileformat-prose.rst, all now outdated and resolved)

flying-sheep requested changes Dec 17, 2019 (two comments on docs/fileformat-prose.rst, both now outdated and resolved)

ivirshup mentioned this pull request Jan 13, 2020

categorial array not stored in .uns in v0.7rc1 #292

Closed

flying-sheep added this to the 0.7 final milestone Jan 20, 2020

ivirshup and others added 4 commits January 21, 2020 10:04

Add draft for file format prose description

59dbe51

rSTified

9f00a6f

Update prose description of on-disk representation

defd246

* Added as a section to `docs/index.rst` so it gets rendered
* Fixed some formatting
* Commented out bash examples
* Editing for brevity and clarity

better links

ecdfe2d

flying-sheep force-pushed the 0.7-docs branch from 87707a2 to ecdfe2d Compare January 21, 2020 09:57

flying-sheep self-requested a review January 21, 2020 10:00

flying-sheep approved these changes Jan 21, 2020

flying-sheep changed the title ~~[WIP] Doc updates~~ [WIP] Add file format prose description Jan 21, 2020

flying-sheep merged commit e499f91 into scverse:master Jan 21, 2020
