SemI-SUpervised generative Autoencoder models for single cell data

SISUA

SISUA: semi-supervised single-cell modeling.

Reference:

  • Trung Ngo Trong, Roger Kramer, Juha Mehtonen, Gerardo González, Ville Hautamäki, Merja Heinäniemi. "SISUA: SemI-SUpervised Generative Autoencoder for Single Cell Data", ICML Workshop on Computational Biology, 2019. [pdf]

Installation

SISUA only requires Python 3.6. Install the stable version via pip:

pip install sisua

To install the nightly version from GitHub:

pip install git+https://github.com/trungnt13/sisua@master

For developers who want to contribute to SISUA, we provide a conda environment file, sisua_env.yml:

conda env create -f=sisua_env.yml

Getting started

  1. The basics
  2. Single-cell analysis:
    • Latent space
    • Imputation of gene expression
    • Prediction of protein markers
  3. Advanced technical topics:
    • Probabilistic embedding
    • Hierarchical modeling (coming soon)
    • Causal analysis (coming soon)
    • Cross-dataset analysis (coming soon)
  4. Benchmarks

Roadmap

  1. [x] Multi-OMICs single-cell datasets
  2. [x] Disentanglement VAE for multi-OMICs data
  3. [x] New models: FactorVAE, BetaVAE, MIxture Semi-supervised Autoencoder (MISA)
  4. [ ] Better imputation via hierarchical latents model.
  5. [ ] Release SISUA 2

Toolkits

We provide command-line toolkits for fast and efficient analysis of single-cell datasets:

  • sisua-train: trains single-cell modeling algorithms and supports training multiple systems in parallel.
  • sisua-analyze: evaluates, compares, and interprets trained models.
  • sisua-embed: probabilistic embedding for semi-supervised training.
  • sisua-data: coming soon

Some important arguments:

-model

the name of a model function declared in models:

  • scvi: single-cell Variational Inference model
  • dca: Deep Count Autoencoder
  • vae: single-cell Variational Autoencoder
  • movae: SISUA

-ds

the name of a dataset declared in data.

Descriptions of all predefined datasets are in docs.

Some good datasets for practicing:

  • pbmc8k_ly
  • cortex
  • pbmcecc_ly
  • pbmcscvi
  • pbmcscvae
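To illustrate how -model and -ds select a model function and a dataset by name, here is a minimal sketch using Python's standard argparse. It is not the actual sisua-train parser: the four model functions below are hypothetical stand-ins that only return a description, and the real toolkit accepts many more arguments.

```python
import argparse


# Hypothetical stand-ins for the model functions declared in SISUA's
# `models` module; the real functions and their signatures differ.
def scvi():
    return "single-cell Variational Inference model"


def dca():
    return "Deep Count Autoencoder"


def vae():
    return "single-cell Variational Autoencoder"


def movae():
    return "SISUA"


# Registry: the command-line name is simply the function's name.
MODELS = {f.__name__: f for f in (scvi, dca, vae, movae)}


def build_parser():
    # Only the two arguments described above (sketch, not the real CLI).
    parser = argparse.ArgumentParser(prog="sisua-train-sketch")
    parser.add_argument("-model", choices=sorted(MODELS), required=True,
                        help="name of a model function")
    parser.add_argument("-ds", required=True,
                        help="name of a predefined dataset, e.g. cortex")
    return parser


def resolve(argv):
    """Parse argv and return (model_function, dataset_name)."""
    args = build_parser().parse_args(argv)
    return MODELS[args.model], args.ds


if __name__ == "__main__":
    model_fn, dataset = resolve(["-model", "movae", "-ds", "cortex"])
    print(model_fn(), "on", dataset)
```

Keeping models in a name-to-function dictionary means a newly declared model only needs to be registered once to become selectable from the command line; argparse's choices then rejects unknown names automatically.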

Configuration

By default, data is stored in ~/bio_data in your home folder, and experiment outputs are stored in ~/bio_log.

You can customize these two paths using the following environment variables:

  • For storing downloaded and preprocessed data: SISUA_DATA
  • For the experiments: SISUA_EXP

For example:

import os
os.environ['SISUA_DATA'] = '/tmp/bio_data'
os.environ['SISUA_EXP'] = '/tmp/bio_log'

from sisua.data import EXP_DIR, DATA_DIR

print(DATA_DIR) # /tmp/bio_data
print(EXP_DIR)  # /tmp/bio_log

Alternatively, set the variables in the shell before running:

export SISUA_DATA=/tmp/bio_data
export SISUA_EXP=/tmp/bio_log
python sisua/train.py
# or using the provided toolkit: sisua-train
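The lookup rule described in this section (use the environment variable if it is set, otherwise a default folder in the home directory) can be sketched as follows. resolve_dir and sisua_dirs are hypothetical helpers, not part of SISUA's API, and SISUA's own logic may differ, for example in whether it creates missing folders. If such values are computed when the module is imported, that would explain why the Python example above sets os.environ before importing sisua.data.

```python
import os
from pathlib import Path


def resolve_dir(env_var, default_name):
    """Hypothetical helper: the env var's value if set and non-empty,
    otherwise ~/<default_name>. Does not create the folder."""
    value = os.environ.get(env_var)
    path = Path(value) if value else Path.home() / default_name
    return path.expanduser()


def sisua_dirs():
    # Defaults follow the documented ~/bio_data and ~/bio_log folders.
    return {
        "data": resolve_dir("SISUA_DATA", "bio_data"),
        "exp": resolve_dir("SISUA_EXP", "bio_log"),
    }


if __name__ == "__main__":
    os.environ["SISUA_DATA"] = "/tmp/bio_data"
    for name, path in sisua_dirs().items():
        print(name, path)
```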