forked from Xubin-s-Lab/scMMAE

A cross-attention network based on masked autoencoder called single-cell multimodal masked autoencoder


DM0815/scMMAE

 
 

scMMAE

scMMAE (single-cell multimodal masked autoencoder) is a cross-attention network based on the masked autoencoder.

Prerequisite

  • python 3.11.9
  • timm 1.0.7
  • pytorch 2.3.1
  • cudnn 8.9.2.26
  • scanpy 1.10.2
  • anndata 0.10.8
  • scikit-learn 1.5.1

The packages above are the main ones used in the experiments. Most environments with PyTorch 2.0+ should run the code directly; just in case, a ./requirements.txt file listing all packages is provided.

Getting started

To use your own datasets with scMMAE, set the following six parameters:
  • config.RNA_tokens = config.RNA_component * config.emb_RNA. RNA_tokens is the number of genes used (I used 4000 highly variable genes), and config.emb_RNA must be divisible by the number of attention heads.
  • config.ADT_tokens = config.ADT_component * config.emb_ADT. ADT_tokens is the number of proteins used (I used all proteins), and config.emb_ADT must be divisible by the number of attention heads.
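The two constraints above (tokens = component * emb, with emb divisible by the head count) can be checked before editing the config. The following is a minimal sketch, not part of the scMMAE code: the helper name `valid_splits` and the head count of 8 are assumptions chosen for illustration only.

```python
def valid_splits(n_tokens: int, n_heads: int) -> list[tuple[int, int]]:
    """Return every (component, emb) pair such that
    component * emb == n_tokens and emb is divisible by n_heads.

    Hypothetical helper for illustration; it is not part of scMMAE.
    """
    splits = []
    # Only multiples of n_heads can serve as the embedding size.
    for emb in range(n_heads, n_tokens + 1, n_heads):
        if n_tokens % emb == 0:
            splits.append((n_tokens // emb, emb))
    return splits


# Example: 4000 highly variable genes; 8 attention heads is an assumed value.
rna_splits = valid_splits(4000, 8)
print(rna_splits[0])    # smallest usable embedding size
print(len(rna_splits))  # number of usable (component, emb) pairs
```

If the list comes back empty (for instance, a protein count that has no divisor which is a multiple of the head count), no setting of component and emb satisfies both constraints, so either the number of features or the number of heads would have to change.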

Important

The input consists of two arrays (RNA: cell_numbers*1*gene_numbers; protein: cell_numbers*1*protein_numbers). In addition, the input data should be normalized before running the model.
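As a rough illustration of the expected shapes, the sketch below turns two ordinary (cells x features) matrices into the cell_numbers*1*feature_numbers layout with NumPy. The random arrays are stand-ins for data that has already been normalized, and the sizes (5 cells, 14 proteins) are made up for the example; how the arrays are then loaded by the scMMAE scripts is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_proteins = 5, 4000, 14  # toy sizes, assumptions only

# Stand-ins for already-normalized (cells x features) matrices.
rna = rng.random((n_cells, n_genes), dtype=np.float32)
adt = rng.random((n_cells, n_proteins), dtype=np.float32)

# Insert a singleton middle axis: (cells, features) -> (cells, 1, features).
rna_input = rna[:, np.newaxis, :]
adt_input = adt[:, np.newaxis, :]

print(rna_input.shape)  # (5, 1, 4000)
print(adt_input.shape)  # (5, 1, 14)
```

Adding the axis with `np.newaxis` creates a view, so the values are unchanged and no data is copied.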

Example

Use Anaconda to create a Python virtual environment. Here, we create a Python 3.11 environment named scMMAE:

conda create -n scMMAE python=3.11.9

Activate the environment, then install the packages:

conda activate scMMAE
pip install -r requirements.txt

Because the datasets are large, they are uploaded as rar archives inside the dataset folder. Extract ./scMMAE/dataset/CITE-seq/*.rar and ./scMMAE/dataset/RNA-seq/*.rar into the directories that contain them; you can then run ./scMMAE/code/stage1.py and ./scMMAE/code/stage2.py directly.
After that, you can run ./scMMAE/code/tutorial.ipynb to reproduce the results for the IFNB scRNA-seq dataset, ideally with the training code for these stages commented out.

Model Weights

If you need the pretrained and fine-tuned models for the datasets used in the experiments, please contact [email protected]
