[WIP] Add file format prose description #267

Merged: 4 commits, Jan 21, 2020

Conversation

ivirshup
Member

@ivirshup ivirshup commented Dec 6, 2019

@falexwolf @flying-sheep

This is a WIP PR for adding new docs to AnnData. Currently it contains a first draft of a document meant to replace this one.

Right now I'm thinking there should be three main documents:

  • The layout of the AnnData object (somewhat covered by its doc-string)
  • How each element is represented on disk.
  • A technical description of the disk representation complete with expected attributes and noting differences between hdf5 and zarr.

Open tasks:

  • DataFrame, DataFrame, dataframe, data-frame, data frame?
  • Add more technical docs going into specifics of schema
  • Convert from markdown to rst

@LuckyMD

LuckyMD commented Dec 6, 2019

Yes! This sounds like what I've been annoying @falexwolf about more or less since I joined the group :). Another important part would be some more entry-level stuff like very small basic examples of slicing AnnData objects. Maybe also how to use groupby etc. We get a lot of questions/issues on that. This would be more than just "how to use pandas" and "how to use numpy" as it's about how AnnData objects react to different slicing (e.g. `adata[slice,].X` versus `adata.X[slice,]`) and what it expects as identifiers. Especially things like dealing with boolean masks are not straightforward for users.

@ivirshup
Member Author

ivirshup commented Dec 10, 2019

@LuckyMD I think some of what you're asking about is covered (though briefly) in the new AnnData doc-string. I'm thinking that maybe this should be expanded and moved to somewhere more discoverable in the docs.

For things like groupby, do you think some "cookbook" style documentation could help here? I was thinking some examples like this could be helpful:

Groupby for pairwise t-test

groupby

We can perform group-by like operations using the dataframes stored in an anndata object.
For example, here is some code which uses this to perform pairwise differential expression (via t-tests) on clusters in an anndata object.

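from itertools import combinations

import numpy as np
import pandas as pd
from anndata import AnnData
from scipy.stats import ttest_ind_from_stats
from sklearn.utils.sparsefuncs import mean_variance_axis


def ttest_pairwise(adata: AnnData, groupby) -> "Dict[Tuple[Any, Any], pd.DataFrame]":
    # Assumes adata.X is a scipy sparse matrix, which mean_variance_axis requires.
    means, variances = {}, {}
    tests = {}
    grouped = adata.obs.groupby(groupby)

    # Per-variable mean and variance within each group (axis=0 reduces over observations).
    for group, group_idx in grouped.groups.items():
        means[group], variances[group] = mean_variance_axis(adata[group_idx].X, axis=0)

    # Welch's t-test (unequal variances) for every pair of groups, from summary statistics.
    for (g1, g1_idx), (g2, g2_idx) in combinations(grouped.groups.items(), 2):
        t, pval = ttest_ind_from_stats(
            means[g1],
            np.sqrt(variances[g1]),
            len(g1_idx),
            means[g2],
            np.sqrt(variances[g2]),
            len(g2_idx),
            equal_var=False,
        )
        # One row per variable, with the statistic and p-value as columns.
        tests[(g1, g2)] = pd.DataFrame(
            {"t": t, "pval": pval},
            index=adata.var_names,
        )
    return tests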

@LuckyMD

LuckyMD commented Dec 10, 2019

I think "cookbook" style documentation is definitely the way forward for groupby. However, I wouldn't use functions like the one you show as examples, but instead use cases that are more common for beginner users. Stuff like finding the mean marker gene expression score for cell type X in all clusters. That would be a 2 liner or so. And then add a brief explanation that you apply .groupby() to .obs or .var to get grouped covariates by categories of a particular covariate. "group-by-like" might not be the best description of .groupby() ;).

I need to check out the new anndata documentation it seems.
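
A minimal sketch of the beginner-level use case suggested above (mean marker score for one cell type in every cluster). A plain DataFrame stands in for `adata.obs` here, and the column names `cluster`, `cell_type` and `marker_score` are hypothetical; with a real AnnData object the same two lines would be applied to `adata.obs`.

```python
import pandas as pd

# Stand-in for adata.obs: one row per cell, with a cluster label, a cell type
# and a precomputed marker gene score. All column names are hypothetical.
obs = pd.DataFrame(
    {
        "cluster": ["0", "0", "1", "1", "1"],
        "cell_type": ["T", "B", "T", "T", "B"],
        "marker_score": [1.0, 0.2, 3.0, 5.0, 0.4],
    },
    index=[f"cell{i}" for i in range(5)],
)

# Keep only cells of type "T", then average their score within each cluster.
is_t = obs["cell_type"] == "T"
mean_score = obs[is_t].groupby("cluster")["marker_score"].mean()

print(mean_score)
```

Calling `.groupby()` on `.obs` (or `.var`) groups the rows by the categories of one covariate, and the selected columns are then aggregated per group.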

ivirshup and others added 4 commits January 21, 2020 10:04
* Added as a section to `docs/index.rst` so it gets rendered
* Fixed some formatting
* Commented out bash examples
* Editting for brevity and clarity
@flying-sheep
Member

flying-sheep commented Jan 21, 2020

Hi @ivirshup I updated everything to intersphinx links and interpreter blocks (as opposed to regular code blocks). I also corrected the odd typo. LGTM :shipit:

@flying-sheep flying-sheep changed the title from "[WIP] Doc updates" to "[WIP] Add file format prose description" Jan 21, 2020
@flying-sheep flying-sheep merged commit e499f91 into scverse:master Jan 21, 2020