Some information on the .h5ad result file:
$ h5ls ./write/pmbc3k.h5ad
X Dataset {2638, 1838}
obs Dataset {2638}
obsm Dataset {2638}
raw.X Group
raw.var Dataset {13714}
uns Group
var Dataset {1838}
varm Dataset {1838}
Note that the sparse raw data is stored in a group:
$ h5ls write/pmbc3k.h5ad/raw.X
data Dataset {2238732/Inf}
indices Dataset {2238732/Inf}
indptr Dataset {2639/Inf}
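The three datasets look like the usual compressed sparse row (CSR) triple: indptr has 2639 = 2638 + 1 entries, one more than the number of observations, which suggests that rows correspond to observations. The following is a minimal sketch of how such a triple decodes into a matrix. The values and the column count are invented for illustration and are not taken from the file.

```python
import numpy as np

# Toy stand-ins for the three datasets in raw.X (invented values).
data = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # non-zero values
indices = np.array([0, 2, 1, 2], dtype=np.int32)         # column of each value
indptr = np.array([0, 2, 2, 4], dtype=np.int32)          # row i owns data[indptr[i]:indptr[i + 1]]

n_rows = len(indptr) - 1  # indptr has one more entry than there are rows
n_cols = 3                # assumed here; the triple alone does not fix the column count

# Scatter each row's values into a dense matrix.
dense = np.zeros((n_rows, n_cols), dtype=data.dtype)
for row in range(n_rows):
    start, stop = indptr[row], indptr[row + 1]
    dense[row, indices[start:stop]] = data[start:stop]

print(dense)
```

In practice one would not densify the data; scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols)) accepts the same triple directly.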
Note that the annotations of observations and variables are stored as structured arrays:
$ h5ls -v ./write/pmbc3k.h5ad/obs
Opened "./write/pmbc3k.h5ad" with sec2 driver.
obs Dataset {2638/2638}
Location: 1:24557556
Links: 1
Chunks: {330} 10890 bytes
Storage: 87054 logical bytes, 42051 allocated bytes, 207.02% utilization
Filter-0: deflate-1 OPT {4}
Type: struct {
"index" +0 16-byte null-padded ASCII string
"n_genes" +16 native long
"percent_mito" +24 native float
"n_counts" +28 native float
"louvain" +32 native signed char
} 33 bytes
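The struct type reported above corresponds closely to a packed NumPy structured dtype. The sketch below rebuilds an equivalent dtype and checks the field offsets and the 33-byte record size. Mapping "native long" to np.int64 is an assumption based on the 8 bytes between offsets 16 and 24, and the record values are hypothetical.

```python
import numpy as np

# Field names and sizes follow the h5ls output; np.int64 for "native long" is an assumption.
obs_dtype = np.dtype([
    ("index", "S16"),            # 16-byte null-padded ASCII string
    ("n_genes", np.int64),
    ("percent_mito", np.float32),
    ("n_counts", np.float32),
    ("louvain", np.int8),        # integer code of the categorical
])

# NumPy packs fields without padding by default, like the layout shown by h5ls.
offsets = {name: obs_dtype.fields[name][1] for name in obs_dtype.names}
print(offsets)
print(obs_dtype.itemsize)

# One hypothetical record, to show field access by name.
obs = np.array([(b"CELL_0001", 781, 0.03, 2421.0, 0)], dtype=obs_dtype)
print(obs["n_genes"])
```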
For categorical annotation, we store the category labels (e.g., louvain_categories) and colors (e.g., louvain_colors) in the unstructured annotation. We also store the parameters of each tool, and any further unstructured annotation it generates, in a group named after the tool (louvain, neighbors, pca, rank_genes_groups):
$ h5ls write/pmbc3k.h5ad/uns
louvain Group
louvain_categories Dataset {8}
louvain_colors Dataset {8}
neighbors Group
pca Group
rank_genes_groups Group
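Since the louvain field in obs is a signed char, it presumably holds integer codes that index into louvain_categories and louvain_colors. The sketch below shows that decoding step under this assumption. All labels, colors, and codes are hypothetical placeholders, not values read from the file.

```python
import numpy as np

# Hypothetical placeholders for the stored arrays.
louvain_codes = np.array([0, 2, 1, 0], dtype=np.int8)         # like the obs field "louvain"
louvain_categories = np.array(["0", "1", "2"])                 # like uns/louvain_categories
louvain_colors = np.array(["#1f77b4", "#ff7f0e", "#2ca02c"])   # like uns/louvain_colors

# Integer-array indexing maps each code to its label and color.
labels = louvain_categories[louvain_codes]
colors = louvain_colors[louvain_codes]

print(labels)
print(colors)
```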
The multi-dimensional annotation is treated in the same way: it is stored as a structured array with one field per embedding.
$ h5ls -v ./write/pmbc3k.h5ad/obsm
Opened "./write/pmbc3k.h5ad" with sec2 driver.
obsm Dataset {2638/2638}
Location: 1:18063662
Links: 1
Chunks: {83} 19256 bytes
Storage: 612016 logical bytes, 575607 allocated bytes, 106.33% utilization
Filter-0: deflate-1 OPT {4}
Type: struct {
"X_pca" +0 [50] native float
"X_tsne" +200 [2] native double
"X_umap" +216 [2] native double
} 232 bytes
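Each field here is a fixed-length vector rather than a scalar. As a rough illustration, the same layout can be written as a NumPy structured dtype with subarray fields; the array below is filled with zeros as a placeholder and only demonstrates shapes and offsets.

```python
import numpy as np

# Subarray fields: each record carries a whole vector per embedding.
obsm_dtype = np.dtype([
    ("X_pca", np.float32, (50,)),
    ("X_tsne", np.float64, (2,)),
    ("X_umap", np.float64, (2,)),
])

obsm = np.zeros(4, dtype=obsm_dtype)  # 4 placeholder observations

# Selecting a field yields an ordinary (n_observations, n_dims) array.
pca_shape = obsm["X_pca"].shape
umap_offset = obsm_dtype.fields["X_umap"][1]

print(pca_shape, umap_offset, obsm_dtype.itemsize)
```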
Here is a recursive listing of the whole file:
$ h5ls -r ./write/pmbc3k.h5ad
/ Group
/X Dataset {2638, 1838}
/obs Dataset {2638}
/obsm Dataset {2638}
/raw.X Group
/raw.X/data Dataset {2238732/Inf}
/raw.X/indices Dataset {2238732/Inf}
/raw.X/indptr Dataset {2639/Inf}
/raw.var Dataset {13714}
/uns Group
/uns/louvain Group
/uns/louvain/params Group
/uns/louvain/params/random_state Dataset {1}
/uns/louvain/params/resolution Dataset {1}
/uns/louvain_categories Dataset {8}
/uns/louvain_colors Dataset {8}
/uns/neighbors Group
/uns/neighbors/connectivities Group
/uns/neighbors/connectivities/data Dataset {42406/Inf}
/uns/neighbors/connectivities/indices Dataset {42406/Inf}
/uns/neighbors/connectivities/indptr Dataset {2639/Inf}
/uns/neighbors/distances Group
/uns/neighbors/distances/data Dataset {23742/Inf}
/uns/neighbors/distances/indices Dataset {23742/Inf}
/uns/neighbors/distances/indptr Dataset {2639/Inf}
/uns/neighbors/params Group
/uns/neighbors/params/method Dataset {1}
/uns/neighbors/params/n_neighbors Dataset {1}
/uns/pca Group
/uns/pca/variance Dataset {50}
/uns/pca/variance_ratio Dataset {50}
/uns/rank_genes_groups Group
/uns/rank_genes_groups/names Dataset {100}
/uns/rank_genes_groups/params Group
/uns/rank_genes_groups/params/groupby Dataset {1}
/uns/rank_genes_groups/params/method Dataset {1}
/uns/rank_genes_groups/params/reference Dataset {1}
/uns/rank_genes_groups/params/use_raw Dataset {1}
/uns/rank_genes_groups/scores Dataset {100}
/var Dataset {1838}
/varm Dataset {1838}
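A similar listing can be produced from Python. The sketch below uses h5py's visititems to walk a file recursively; list_contents is a hypothetical helper, and the tiny file it runs on is a stand-in built on the spot, not the real result file.

```python
import os
import tempfile

import h5py
import numpy as np


def list_contents(path):
    """Return (name, kind, shape) for every object below the root group."""
    entries = []

    def visit(name, obj):
        # visititems passes names relative to the root, without a leading slash.
        if isinstance(obj, h5py.Dataset):
            entries.append(("/" + name, "Dataset", obj.shape))
        else:
            entries.append(("/" + name, "Group", None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries


# Build a tiny stand-in file so the sketch runs without the real data.
tmp_path = os.path.join(tempfile.mkdtemp(), "toy.h5ad")
with h5py.File(tmp_path, "w") as f:
    f.create_dataset("X", data=np.zeros((3, 2), dtype=np.float32))
    f.create_dataset("uns/pca/variance", data=np.ones(2))  # intermediate groups are created

entries = list_contents(tmp_path)
for name, kind, shape in entries:
    print(name, kind, shape)
```

Pointing list_contents at the actual .h5ad file should give the same names as the h5ls -r output, apart from the root group, which visititems does not report.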
You might note that the neighborhood graph is stored in the unstructured annotation. However, anndata will still recognize and slice it. In the long run, we might add another field for n_observations x n_observations sparse matrices.