
Import Olsson 2016 dataset for dimred task #352

Merged 6 commits into openproblems-bio:main on Apr 26, 2022

Conversation

@lazappi (Collaborator) commented Apr 11, 2022

Properly import the Olsson 2016 dataset for the dimensionality reduction task (as noted in comments on #316).

Submission type

  • This submission adds a new dataset
  • This submission adds a new method
  • This submission adds a new metric
  • This submission adds a new task
  • This submission adds a new Docker image
  • This submission fixes a bug (link to related issue: )
  • This submission adds a new feature not listed above

Testing

  • This submission was written on a forked copy of SingleCellOpenProblems
  • GitHub Actions "Run Benchmark" tests are passing on the base branch of this pull request (include link to passed test: )
  • If this pull request is not ready for review (including passing the "Run Benchmark" tests), I will open this PR as a draft (click on the down arrow next to the "Create Pull Request" button)

Submission guidelines

  • This submission follows the guidelines in our Contributing document
  • I have checked to ensure there aren't other open Pull Requests for the same update/change

PR review checklist

This PR will be evaluated on the basis of the following checks:

  • The task addresses a valid open problem in single-cell analysis
  • The latest version of master is merged and tested
  • The methods/metrics are imported to __init__.py and were tested in the pipeline
  • Method and metric decorators are annotated with paper title, year, author, code version, and date
  • The README gives an outline of the methods, metrics and datasets in the folder
  • The README provides a satisfactory task explanation (for new tasks)
  • The sample test data is appropriate to test implementation of all methods and metrics (for new tasks)

@codecov-commenter commented Apr 18, 2022

Codecov Report

Merging #352 (e87fd15) into main (2d57868) will increase coverage by 1.35%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #352      +/-   ##
==========================================
+ Coverage   90.69%   92.04%   +1.35%     
==========================================
  Files          83       83              
  Lines        1935     1937       +2     
  Branches      111      111              
==========================================
+ Hits         1755     1783      +28     
+ Misses        139      113      -26     
  Partials       41       41              
Flag Coverage Δ
unittests 92.04% <100.00%> (+1.35%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
openproblems/data/mouse_blood_olssen_labelled.py 100.00% <100.00%> (+100.00%) ⬆️
..._reduction/datasets/mouse_blood_olssen_labelled.py 100.00% <100.00%> (+100.00%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

* upstream/main:
  Fix benchmark commit (openproblems-bio#362)
  Remove scot unbalanced (openproblems-bio#360)
  store results in /tmp (openproblems-bio#361)
  fix gh actions badge link # ci skip (openproblems-bio#359)
  fix coverage badge # ci skip (openproblems-bio#358)
  Import SCOT (openproblems-bio#333)
  fix parsing and committing of results on tag (openproblems-bio#356)
@lazappi (Collaborator, Author) commented Apr 20, 2022

I think this is good to go now, unless there is something I missed.

@LuckyMD (Collaborator) commented Apr 20, 2022

Could you add this dataset to the README? Then I think everything is ready to merge.

@lazappi (Collaborator, Author) commented Apr 21, 2022

Which README should it go in?

@LuckyMD (Collaborator) commented Apr 21, 2022

This one: https://github.com/lazappi/openproblems/blob/dimred-datasets/openproblems/tasks/dimensionality_reduction/README.md

It would be good to have a "datasets" section in there as well, and the API section should mention what you expect a newly added dataset to have.
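
As a sketch of what such an API section might specify, a dimred dataset could be expected to expose an expression matrix and per-cell labels. The field names below (`X`, `obs["labels"]`) are assumptions for illustration, not the repo's actual requirements:

```python
from types import SimpleNamespace

def check_dimred_dataset(adata):
    """Illustrative check of what a newly added dimred dataset might be
    expected to have; field names are assumptions for this sketch."""
    assert adata.X is not None, "expression matrix required"
    assert "labels" in adata.obs, "per-cell labels needed for evaluation"
    assert len(adata.X) == len(adata.obs["labels"]), "one label per cell"
    return True

# Minimal stand-in object for demonstration (a real dataset would be AnnData)
demo = SimpleNamespace(
    X=[[1, 0, 2], [0, 3, 1]],
    obs={"labels": ["HSC", "Monocyte"]},
)
check_dimred_dataset(demo)
```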

@lazappi (Collaborator, Author) commented Apr 21, 2022

Is there an example of how much detail you want? None of the READMEs in main describe the datasets.

@LuckyMD (Collaborator) commented Apr 21, 2022

Check the batch integration PR. Just the API for datasets would be useful, and otherwise one sentence about the data would be nice too. We should add this in the main repo as well... thanks for noticing!

* upstream/main:
  Run `test_benchmark` on a self-hosted runner (openproblems-bio#373)
  Jamboree label_projection task (openproblems-bio#313)
  Only cleanup AWS on success (openproblems-bio#371)
  Jamboree dimensionality reduction methods (openproblems-bio#318)
  Update benchmark results # ci skip (openproblems-bio#368)
  remove citeseq cbmc from DR (openproblems-bio#367)
  Ignore AWS warning and clean up s3 properly (openproblems-bio#366)
  docker images separate PR (openproblems-bio#354)
  Allow codecov to fail on forks
  remove scot altogether (openproblems-bio#363)
@lazappi (Collaborator, Author) commented Apr 26, 2022

I rewrote the README in the methods PR, which is now merged. Is what is there now OK, or should I still add something else?

@LuckyMD (Collaborator) commented Apr 26, 2022

Saw it, yes. That looks good. At some point we'll need to go through all datasets and write a brief description of them in the task READMEs... but that can be for another PR. This looks good to me now! Thanks a lot @lazappi!

@LuckyMD merged commit 0964b11 into openproblems-bio:main on Apr 26, 2022

@LuckyMD (Collaborator) commented Apr 26, 2022

Woop woop, dimensionality reduction looks pretty nice now :). Congrats, @lazappi! 🎉

scottgigante-immunai added a commit that referenced this pull request on May 2, 2022
* Label docker images based on build location (#351)

* label docker images

* fix syntax

* Run benchmark only after unittests (#349)

* run benchmark after unittests

* always run cleanup

* cleanup

* If using GH actions image, test for git diff on dockerfile (#350)

* if using gh actions image, test for git diff on dockerfile

* allow empty tag for now

* decode

* if image doesn't exist, automatically github actions

* fix quotes

* fix parsing and committing of results on tag (#356)

* Import SCOT (#333)

* import SCOT

* pre-commit

* scran requires R

* check that aligned spaces are finite

* exclude unbalanced SCOT for now

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* fix coverage badge # ci skip (#358)

* fix gh actions badge link # ci skip (#359)

* store results in /tmp (#361)

* Remove scot unbalanced (#360)

* Fix benchmark commit (#362)

* store results in /tmp

* add skip_on_empty

* class doesn't have skip on empty

* remove scot altogether (#363)

* Allow codecov to fail on forks

* docker images separate PR (#354)

* docker images separate PR

* all R requirements in r_requirements.txt

* move github r packages to requirements file

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Ignore AWS warning and clean up s3 properly (#366)

* ci cleanup

* ignore aws batch warning

* remove citeseq cbmc from DR (#367)

Co-authored-by: Scott Gigante <[email protected]>

* Update benchmark results # ci skip (#368)

Co-authored-by: SingleCellOpenProblems <[email protected]>

* Jamboree dimensionality reduction methods (#318)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Remove ivis

* pre-commit

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>
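
The preprocessing that recurs throughout this log (`preprocess_logCPM_1kHVG()`, later `log_cpm_hvg()`) amounts to counts-per-million normalization, log1p, and selection of the top highly variable genes. A minimal numpy sketch, ranking genes by plain variance in place of scanpy's `cell_ranger` flavor:

```python
import numpy as np

def log_cpm_hvg(counts, n_genes=1000):
    """Hedged sketch of logCPM + top-HVG preprocessing on a cells x genes
    matrix: normalize each cell to counts per million, log1p, then keep the
    n_genes most variable genes. The repo uses scanpy's
    highly_variable_genes (flavor="cell_ranger"); this numpy version ranks
    genes by simple variance instead."""
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    logcpm = np.log1p(cpm)
    variances = logcpm.var(axis=0)
    hvg = np.argsort(variances)[::-1][: min(n_genes, counts.shape[1])]
    return logcpm[:, hvg]
```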

* Only cleanup AWS on success (#371)

* only cleanup on success

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Jamboree label_projection task (#313)

* Add scvi-tools docker image

* add scanvi

* hvg command use 2000

* update scvi-tools version; use image

* train size

* scanvi mask test labels

* move import

* hvg on train only, fix hvg command

* add scarches scanvi

* use string labels in testing

* enforce batch metadata in dataset

* add batch metadata in pancreas random

* use train adata for scarches

* Add majority vote simple baseline

* test_mode

* use test instead of test mode, update contributing

* update contributing guide

* Added helper function to introduce label noise

* Actually return data with label noise

* Only introduce label noise on training data

* Made a pancreas dataset with label noise

* Reformat docstring

* Added reference to example label noise dataset in datasets __init__.py
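
The label-noise helper described in the commits above can be sketched as follows; the function name matches the commit message, but the signature and behavior here are illustrative assumptions:

```python
import numpy as np

def add_label_noise(labels, noise_rate=0.1, seed=0):
    """Hedged sketch: reassign a random fraction of (training) labels to a
    class drawn uniformly from the observed label set (possibly unchanged).
    Signature and defaults are assumptions, not the repo's exact code."""
    labels = np.asarray(labels).copy()
    classes = np.unique(labels)
    rng = np.random.default_rng(seed)
    flip = rng.random(labels.shape[0]) < noise_rate  # which labels to corrupt
    labels[flip] = rng.choice(classes, size=flip.sum())
    return labels
```

Per the follow-up commits, such noise would only be applied to training data, leaving test labels intact.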

* Add cengen C elegans data loader (#2)

* add CeNGEN C elegans neuron dataset

* add CeNGEN C elegans dataset for global tasks and for label_projection task

* fix lines being too long

* Reformat cengen data loader

* Create tabula_muris_senis.py

Need dataframe containing sample information in './tabula_muris_senis_data_objects/tabula_muris_senis_data_objects.csv' 

load_tabula_muris_senis(method_list, organ_list) takes in methods and organs to extract data from and combines into one anndata object.
If method_list or organ_list = None, do not filter based on that input.
EX: load_tabula_muris_senis(method_list=['facs'], organ_list = None) returns all facs experiments for all organs in one anndata object.

* pre-commit

* Modify anndata in place in add_label_noise rather than copy

* Added CSV file with tabula muris senis data links

* Update tabula_muris_senis.py

* Add random_labels baseline to label_projection task

* Update tabula_muris_senis.py

* Update tabula_muris_senis.py

* pre-commit

* Update tabula_muris_senis.py

* pre-commit

* fix missing labels at prediction time

* Handle test flag through tests and docker, pass to methods

* If test method run, use 1 max_epoch for scvi-tools

* Use only 2 batches for sample dataset for label_projection

* Remove zebrafish random dataset

* Fix decorator dependency to <5.0.0

* Remove functools.wraps from docker decorator for test parameterization

* Fix cengen missing batch info

* Use functools.update_wrapper for docker test

* Add batch to pancreas_random_label_noise

* Make cengen test dataset have more cells per batch

* Set span=0.8 for hvg call for scanvi_hvg methods

* Set span=0.8 for HVG selection only in test mode for scvi

* Revert "Handle test flag through tests and docker, pass to methods"

This reverts commit 3b940c0.

* Add test parameter to label proj baselines

* Fix flake remove unused import

* Revert "Remove zebrafish random dataset"

This reverts commit 3915798.

* Update scVI setup_anndata to new version

* pre-commit

* Reformat and rerun tests

* Add code_url and code_version for baseline label proj methods

* Fallback HVG flavor for label projection task

* pre-commit

* Fix unused import

* Fix using highly_variable_genes

* Pin scvi-tools to 0.15.5

* Unpin scvi-tools, pin jax==0.3.6, see optuna/optuna-examples#99

* Add scikit-misc as requirement for scvi docker

* Pin jaxlib as well

* pin jaxlib along with jax

* Set paper_year to year of implementation

* Set random zebrafish split to 0.8+0.2

* Add tabula_muris_senis_lung_random dataset to label_projection

* pre-commit

* Add tabula muris senis datasets csv

* Fix loading tabula muris csv

* pre-commit

* Test loader for tabula muris senis

Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* Run `test_benchmark` on a self-hosted runner (#373)

* set up cirun

* use ubuntu standard AMI

* run nextflow on the self-hosted machine

* add to CONTRIBUTING

* update ami

* install unzip

* set up docker

* install docker from curl

* use t2.micro not nano

* use custom AMI

* pythonLocation

* add scripts to path

* larger disk size

* new image again

* chown for now

* chmod 755

* fixed permissions

* use tower workspace

* test nextflow

* try again

* nextflow -q

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes and cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)
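
The test-subsetting described in this commit (keep highly expressed genes, then a fixed number of cells) could look roughly like this; the helper name and exact selection logic are assumptions, not the repo's code:

```python
import numpy as np

def subset_for_tests(X, n_genes=700, n_cells=300, seed=0):
    """Hedged sketch of building a small test dataset from a cells x genes
    matrix: keep the most highly expressed genes (by total counts), then
    take a reproducible random sample of cells. Defaults mirror the commit
    message (700 genes, 300 cells); the helper itself is illustrative."""
    gene_totals = X.sum(axis=0)
    top_genes = np.argsort(gene_totals)[::-1][: min(n_genes, X.shape[1])]
    rng = np.random.default_rng(seed)
    cells = rng.choice(X.shape[0], size=min(n_cells, X.shape[0]), replace=False)
    return X[np.ix_(cells, top_genes)]
```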

* Add ivis dimred method (#369)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
scottgigante-immunai added a commit that referenced this pull request May 2, 2022
* Label docker images based on build location (#351)

* label docker images

* fix syntax

* Run benchmark only after unittests (#349)

* run benchmark after unittests

* always run cleanup

* cleanup

* If using GH actions image, test for git diff on dockerfile (#350)

* if using gh actions image, test for git diff on dockerfile

* allow empty tag for now

* decode

* if image doesn't exist, automatically github actions

* fix quotes

* fix parsing and committing of results on tag (#356)

* Import SCOT (#333)

* import SCOT

* pre-commit

* scran requires R

* check that aligned spaces are finite

* exclude unbalanced SCOT for now

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* fix coverage badge # ci skip (#358)

* fix gh actions badge link # ci skip (#359)

* store results in /tmp (#361)

* Remove scot unbalanced (#360)

* Fix benchmark commit (#362)

* store results in /tmp

* add skip_on_empty

* class doesn't have skip on empty

* remove scot altogether (#363)

* Allow codecov to fail on forks

* docker images separate PR (#354)

* docker images separate PR

* all R requirements in r_requirements.txt

* move github r packages to requirements file

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Ignore AWS warning and clean up s3 properly (#366)

* ci cleanup

* ignore aws batch warning

* remove citeseq cbmc from DR (#367)

Co-authored-by: Scott Gigante <[email protected]>

* Update benchmark results # ci skip (#368)

Co-authored-by: SingleCellOpenProblems <[email protected]>

* Jamboree dimensionality reduction methods (#318)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Remove ivis

* pre-commit

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* Only cleanup AWS on success (#371)

* only cleanup on success

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Jamboree label_projection task (#313)

* Add scvi-tools docker image

* add scanvi

* hvg command use 2000

* update scvi-tools version; use image

* train size

* scanvi mask test labels

* move import

* hvg on train only, fix hvg command

* add scarches scanvi

* use string labels in testing

* enforce batch metadata in dataset

* add batch metadata in pancreas random

* use train adata for scarches

* Add majority vote simple baseline

* test_mode

* use test instead of test mode, update contributing

* update contributing guide

* Added helper function to introduce label noise

* Actually return data with label noise

* Only introduce label noise on training data

* Made a pancreas dataset with label nosie

* Reformat docstring

* Added reference to example label noise dataset in datasets __init__.py

* Add cengen C elegans data loader (#2)

* add CeNGEN C elegans neuron dataset

* add CeNGEN C elegans dataset for global tasks and for label_projection task

* fix lines being too long

* Reformat cengen data loader

* Create tabula_muris_senis.py

Need dataframe containing sample information in './tabula_muris_senis_data_objects/tabula_muris_senis_data_objects.csv' 

load_tabula_muris_senis(method_list, organ_list) takes in methods and organs to extract data from and combines into one anndata object.
If method_list or organ_list = None, do not filter based on that input.
EX: load_tabula_muris_senis(method_list=['facs'], organ_list = None) returns all facs experiments for all organs in one anndata object.

* pre-commit

* Modify anndata in place in add_label_noise rather than copy

* Added CSV file with tabula muris senis data links

* Update tabula_muris_senis.py

* Add random_labels baseline to label_projection task

* Update tabula_muris_senis.py

* Update tabula_muris_senis.py

* pre-commit

* Update tabula_muris_senis.py

* pre-commit

* fix missing labels at prediction time

* Handle test flag through tests and docker, pass to methods

* If test method run, use 1 max_epoch for scvi-tools

* Use only 2 batches for sample dataset for label_projection

* Remove zebrafish random dataset

* Fix decorator dependency to <5.0.0

* Remove functools.wraps from docker decorator for test parameterization

* Fix cengen missing batch info

* Use functools.update_wrapper for docker test

* Add batch to pancreas_random_label_noise

* Make cengen test dataset have more cells per batch

* Set span=0.8 for hvg call for scanvi_hvg methods

* Set span=0.8 for HVG selection only in test mode for scvi

* Revert "Handle test flag through tests and docker, pass to methods"

This reverts commit 3b940c0.

* Add test parameter to label proj baselines

* Fix flake remove unused import

* Revert "Remove zebrafish random dataset"

This reverts commit 3915798.

* Update scVI setup_anndata to new version

* pre-commit

* Reformat and rerun tests

* Add code_url and code_version for baseline label proj methods

* Fallback HVG flavor for label projection task

* pre-commit

* Fix unused import

* Fix using highly_variable_genes

* Pin scvi-tools to 0.15.5

* Unpin scvi-tools, pin jax==0.3.6, see optuna/optuna-examples#99

* Add scikit-misc as requirement for scvi docker

* Pin jaxlib as well

* pin jaxlib along with jax

* Set paper_year to year of implementation

* Set random zebrafish split to 0.8+0.2

* Add tabula_muris_senis_lung_random dataset to label_projection

* pre-commit

* Add tabula muris senis datasets csv

* Fix loading tabula muris csv

* pre-commit

* Test loader for tabula muris senis

Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* Run `test_benchmark` on a self-hosted runner (#373)

* set up cirun

* use ubuntu standard AMI

* run nextflow on the self-hosted machine

* add to CONTRIBUTING

* update ami

* install unzip

* set up docker

* install docker from curl

* use t2.micro not nano

* use custom AMI

* pythonLocation

* add scripts to path

* larger disk size

* new image again

* chown for now

* chmod 755

* fixed permissions

* use tower workspace

* test nextflow

* try again

* nextflow -q

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)

* Add ivis dimred method (#369)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"
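
The preprocessing these commits keep referring to is a log-CPM transform followed by selection of 1,000 highly variable genes. As a rough, dependency-free sketch of the normalization step only (the real implementation uses scanpy's `normalize_total`, `log1p`, and `highly_variable_genes(flavor="cell_ranger")`; the helper name below is hypothetical):

```python
import math

def log_cpm(counts):
    """Log-CPM transform: scale each cell to 1e6 total counts, then log1p.

    `counts` is a list of per-cell count vectors (cells x genes).
    Pure-Python sketch; the actual pipeline operates on AnnData via scanpy.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * 1e6) for c in cell])
    return normalized
```
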

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230
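
The n_var check mentioned above guards against requesting more highly variable genes than the object actually contains (the failure mode in scverse/scanpy#2230). A minimal sketch of that guard, with a hypothetical helper name:

```python
def choose_n_top_genes(n_vars, n_top_genes=1000):
    """Cap the HVG request at the number of genes present.

    Prevents asking scanpy's highly_variable_genes for more genes
    than the (possibly subsetted test) dataset has.
    """
    return min(n_top_genes, n_vars)
```
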

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

* Install libgeos-dev

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
scottgigante-immunai added a commit that referenced this pull request May 2, 2022
* Label docker images based on build location (#351)

* label docker images

* fix syntax

* Run benchmark only after unittests (#349)

* run benchmark after unittests

* always run cleanup

* cleanup

* If using GH actions image, test for git diff on dockerfile (#350)

* if using gh actions image, test for git diff on dockerfile

* allow empty tag for now

* decode

* if image doesn't exist, automatically github actions

* fix quotes

* fix parsing and committing of results on tag (#356)

* Import SCOT (#333)

* import SCOT

* pre-commit

* scran requires R

* check that aligned spaces are finite

* exclude unbalanced SCOT for now

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* fix coverage badge # ci skip (#358)

* fix gh actions badge link # ci skip (#359)

* store results in /tmp (#361)

* Remove scot unbalanced (#360)

* Fix benchmark commit (#362)

* store results in /tmp

* add skip_on_empty

* class doesn't have skip on empty

* remove scot altogether (#363)

* Allow codecov to fail on forks

* docker images separate PR (#354)

* docker images separate PR

* all R requirements in r_requirements.txt

* move github r packages to requirements file

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Ignore AWS warning and clean up s3 properly (#366)

* ci cleanup

* ignore aws batch warning

* remove citeseq cbmc from DR (#367)

Co-authored-by: Scott Gigante <[email protected]>

* Update benchmark results # ci skip (#368)

Co-authored-by: SingleCellOpenProblems <[email protected]>

* Jamboree dimensionality reduction methods (#318)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Remove ivis

* pre-commit

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* Only cleanup AWS on success (#371)

* only cleanup on success

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Jamboree label_projection task (#313)

* Add scvi-tools docker image

* add scanvi

* hvg command use 2000

* update scvi-tools version; use image

* train size

* scanvi mask test labels

* move import

* hvg on train only, fix hvg command

* add scarches scanvi

* use string labels in testing

* enforce batch metadata in dataset

* add batch metadata in pancreas random

* use train adata for scarches

* Add majority vote simple baseline

* test_mode

* use test instead of test mode, update contributing

* update contributing guide

* Added helper function to introduce label noise

* Actually return data with label noise

* Only introduce label noise on training data

* Made a pancreas dataset with label noise

* Reformat docstring

* Added reference to example label noise dataset in datasets __init__.py

* Add cengen C elegans data loader (#2)

* add CeNGEN C elegans neuron dataset

* add CeNGEN C elegans dataset for global tasks and for label_projection task

* fix lines being too long

* Reformat cengen data loader

* Create tabula_muris_senis.py

Needs a dataframe containing sample information in './tabula_muris_senis_data_objects/tabula_muris_senis_data_objects.csv'.

load_tabula_muris_senis(method_list, organ_list) takes methods and organs to extract data from and combines the results into one AnnData object.
If method_list or organ_list is None, no filtering is applied for that input.
For example, load_tabula_muris_senis(method_list=['facs'], organ_list=None) returns all FACS experiments for all organs in one AnnData object.
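
The described filtering can be sketched in plain Python; `filter_rows` and the `(method, organ)` tuple format are hypothetical stand-ins for the CSV rows the real loader reads and concatenates:

```python
def filter_rows(rows, method_list=None, organ_list=None):
    """Select dataset rows by method and organ.

    `rows` is a list of (method, organ) tuples standing in for the CSV of
    download links. A None filter means "keep everything" for that field,
    mirroring the behaviour described for load_tabula_muris_senis().
    """
    selected = []
    for method, organ in rows:
        if method_list is not None and method not in method_list:
            continue
        if organ_list is not None and organ not in organ_list:
            continue
        selected.append((method, organ))
    return selected
```

e.g. `filter_rows(rows, method_list=['facs'])` keeps every FACS row regardless of organ, matching the `organ_list=None` behaviour described above.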

* pre-commit

* Modify anndata in place in add_label_noise rather than copy

* Added CSV file with tabula muris senis data links

* Update tabula_muris_senis.py

* Add random_labels baseline to label_projection task

* Update tabula_muris_senis.py

* Update tabula_muris_senis.py

* pre-commit

* Update tabula_muris_senis.py

* pre-commit

* fix missing labels at prediction time

* Handle test flag through tests and docker, pass to methods

* If test method run, use 1 max_epoch for scvi-tools

* Use only 2 batches for sample dataset for label_projection

* Remove zebrafish random dataset

* Fix decorator dependency to <5.0.0

* Remove functools.wraps from docker decorator for test parameterization

* Fix cengen missing batch info

* Use functools.update_wrapper for docker test

* Add batch to pancreas_random_label_noise

* Make cengen test dataset have more cells per batch

* Set span=0.8 for hvg call for scanvi_hvg methods

* Set span=0.8 for HVG selection only in test mode for scvi

* Revert "Handle test flag through tests and docker, pass to methods"

This reverts commit 3b940c0.

* Add test parameter to label proj baselines

* Fix flake remove unused import

* Revert "Remove zebrafish random dataset"

This reverts commit 3915798.

* Update scVI setup_anndata to new version

* pre-commit

* Reformat and rerun tests

* Add code_url and code_version for baseline label proj methods

* Fallback HVG flavor for label projection task

* pre-commit

* Fix unused import

* Fix using highly_variable_genes

* Pin scvi-tools to 0.15.5

* Unpin scvi-tools, pin jax==0.3.6, see optuna/optuna-examples#99

* Add scikit-misc as requirement for scvi docker

* Pin jaxlib as well

* pin jaxlib along with jax

* Set paper_year to year of implementation

* Set random zebrafish split to 0.8+0.2

* Add tabula_muris_senis_lung_random dataset to label_projection

* pre-commit

* Add tabula muris senis datasets csv

* Fix loading tabula muris csv

* pre-commit

* Test loader for tabula muris senis

Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* Run `test_benchmark` on a self-hosted runner (#373)

* set up cirun

* use ubuntu standard AMI

* run nextflow on the self-hosted machine

* add to CONTRIBUTING

* update ami

* install unzip

* set up docker

* install docker from curl

* use t2.micro not nano

* use custom AMI

* pythonLocation

* add scripts to path

* larger disk size

* new image again

* chown for now

* chmod 755

* fixed permissions

* use tower workspace

* test nextflow

* try again

* nextflow -q

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes and cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)

* Add ivis dimred method (#369)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
scottgigante-immunai added a commit that referenced this pull request May 2, 2022
* label docker images

* fix syntax

* Delete run_benchmark.yml

* Update from main (#378)


* Install libgeos-dev (#377)

Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"
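Independent of the scanpy implementation this commit fixes, the log-CPM transform at the core of that preprocessing is simple to sketch in plain Python (hypothetical helper name; the real code uses scanpy with the "cell_ranger" HVG flavor and returns a new AnnData rather than mutating in place):

```python
import math

def log_cpm(counts):
    """Normalize each cell to counts-per-million, then apply log1p.

    `counts` is a list of per-cell count vectors (rows = cells).
    Returns a new matrix; the input is not modified in place,
    mirroring the fix where the helper returns a new object.
    """
    out = []
    for cell in counts:
        total = sum(cell)
        scale = 1e6 / total if total > 0 else 0.0
        out.append([math.log1p(v * scale) for v in cell])
    return out
```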

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Remove ivis

* pre-commit

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* Only cleanup AWS on success (#371)

* only cleanup on success

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Jamboree label_projection task (#313)

* Add scvi-tools docker image

* add scanvi

* hvg command use 2000

* update scvi-tools version; use image

* train size

* scanvi mask test labels

* move import

* hvg on train only, fix hvg command

* add scarches scanvi

* use string labels in testing

* enforce batch metadata in dataset

* add batch metadata in pancreas random

* use train adata for scarches

* Add majority vote simple baseline
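A majority-vote baseline like the one added here amounts to predicting the single most frequent training label for every test cell (a sketch under that assumption, not the repository's actual code):

```python
from collections import Counter

def majority_vote(train_labels, n_test):
    """Predict the most common training label for every test cell."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test
```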

* test_mode

* use test instead of test mode, update contributing

* update contributing guide

* Added helper function to introduce label noise

* Actually return data with label noise

* Only introduce label noise on training data

* Made a pancreas dataset with label noise

* Reformat docstring

* Added reference to example label noise dataset in datasets __init__.py
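The label-noise helper described in the commits above can be sketched as follows (hypothetical signature; the key behavior is that only training labels are perturbed, test labels pass through untouched):

```python
import random

def add_label_noise(labels, is_train, noise_prob, seed=0):
    """Randomly resample a fraction of *training* labels.

    For each training cell, with probability `noise_prob`, replace its
    label with one drawn uniformly from the observed label set.
    Test cells are returned unchanged.
    """
    rng = random.Random(seed)
    label_pool = sorted(set(labels))
    noisy = []
    for lab, train in zip(labels, is_train):
        if train and rng.random() < noise_prob:
            noisy.append(rng.choice(label_pool))
        else:
            noisy.append(lab)
    return noisy
```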

* Add cengen C elegans data loader (#2)

* add CeNGEN C elegans neuron dataset

* add CeNGEN C elegans dataset for global tasks and for label_projection task

* fix lines being too long

* Reformat cengen data loader

* Create tabula_muris_senis.py

Requires a dataframe containing sample information at './tabula_muris_senis_data_objects/tabula_muris_senis_data_objects.csv'.

load_tabula_muris_senis(method_list, organ_list) takes the methods and organs to extract data from and combines the results into one AnnData object.
If method_list or organ_list is None, no filtering is done on that input.
Example: load_tabula_muris_senis(method_list=['facs'], organ_list=None) returns all FACS experiments for all organs in one AnnData object.
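The filtering semantics described for the loader (a None argument means "do not filter on that field") can be sketched independently of the AnnData machinery; the `samples` record structure below is illustrative only:

```python
def filter_samples(samples, method_list=None, organ_list=None):
    """Select sample records matching the requested methods/organs.

    A filter argument of None means "do not filter on that field".
    Each sample is a dict with 'method' and 'organ' keys
    (illustrative structure, not the loader's real CSV schema).
    """
    selected = []
    for s in samples:
        if method_list is not None and s["method"] not in method_list:
            continue
        if organ_list is not None and s["organ"] not in organ_list:
            continue
        selected.append(s)
    return selected
```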

* pre-commit

* Modify anndata in place in add_label_noise rather than copy

* Added CSV file with tabula muris senis data links

* Update tabula_muris_senis.py

* Add random_labels baseline to label_projection task

* Update tabula_muris_senis.py

* Update tabula_muris_senis.py

* pre-commit

* Update tabula_muris_senis.py

* pre-commit

* fix missing labels at prediction time

* Handle test flag through tests and docker, pass to methods

* If test method run, use 1 max_epoch for scvi-tools

* Use only 2 batches for sample dataset for label_projection

* Remove zebrafish random dataset

* Fix decorator dependency to <5.0.0

* Remove functools.wraps from docker decorator for test parameterization

* Fix cengen missing batch info

* Use functools.update_wrapper for docker test

* Add batch to pancreas_random_label_noise

* Make cengen test dataset have more cells per batch

* Set span=0.8 for hvg call for scanvi_hvg methods

* Set span=0.8 for HVG selection only in test mode for scvi

* Revert "Handle test flag through tests and docker, pass to methods"

This reverts commit 3b940c0.

* Add test parameter to label proj baselines

* Fix flake remove unused import

* Revert "Remove zebrafish random dataset"

This reverts commit 3915798.

* Update scVI setup_anndata to new version

* pre-commit

* Reformat and rerun tests

* Add code_url and code_version for baseline label proj methods

* Fallback HVG flavor for label projection task

* pre-commit

* Fix unused import

* Fix using highly_variable_genes

* Pin scvi-tools to 0.15.5

* Unpin scvi-tools, pin jax==0.3.6, see optuna/optuna-examples#99

* Add scikit-misc as requirement for scvi docker

* Pin jaxlib as well

* pin jaxlib along with jax

* Set paper_year to year of implementation

* Set random zebrafish split to 0.8+0.2

* Add tabula_muris_senis_lung_random dataset to label_projection

* pre-commit

* Add tabula muris senis datasets csv

* Fix loading tabula muris csv

* pre-commit

* Test loader for tabula muris senis

Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* Run `test_benchmark` on a self-hosted runner (#373)

* set up cirun

* use ubuntu standard AMI

* run nextflow on the self-hosted machine

* add to CONTRIBUTING

* update ami

* install unzip

* set up docker

* install docker from curl

* use t2.micro not nano

* use custom AMI

* pythonLocation

* add scripts to path

* larger disk size

* new image again

* chown for now

* chmod 755

* fixed permissions

* use tower workspace

* test nextflow

* try again

* nextflow -q

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes and cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)
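Building the small test subset described above — keeping the most highly expressed genes, then a fixed number of cells — can be sketched like this (illustrative function name; the real loader operates on an AnnData object):

```python
def subset_highly_expressed(counts, n_genes, n_cells):
    """Keep the top `n_genes` genes by total count across all cells,
    then the first `n_cells` cells (rows = cells, columns = genes).

    Gene order is preserved after selection. Sketch of constructing a
    small test subset like the 700-gene x 300-cell Olsson test data.
    """
    n_total_genes = len(counts[0])
    gene_totals = [sum(cell[g] for cell in counts) for g in range(n_total_genes)]
    top = sorted(range(n_total_genes), key=lambda g: gene_totals[g], reverse=True)[:n_genes]
    top = sorted(top)  # restore original gene order
    return [[cell[g] for g in top] for cell in counts[:n_cells]]
```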

* Add ivis dimred method (#369)

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

* Install libgeos-dev

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>

* Install libgeos-dev

* Update test_docker (#379)


* clean up dockerfile

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
scottgigante-immunai added a commit that referenced this pull request May 10, 2022
* Fix rgeos install (#380)

* label docker images

* fix syntax

* Delete run_benchmark.yml

* Update from main (#378)


* Made a pancreas dataset with label noise


* Reformat docstring

* Added reference to example label noise dataset in datasets __init__.py

* Add cengen C elegans data loader (#2)

* add CeNGEN C elegans neuron dataset

* add CeNGEN C elegans dataset for global tasks and for label_projection task

* fix lines being too long

* Reformat cengen data loader

* Create tabula_muris_senis.py

Requires a dataframe containing sample information in './tabula_muris_senis_data_objects/tabula_muris_senis_data_objects.csv'.

load_tabula_muris_senis(method_list, organ_list) takes the methods and organs to extract data for and combines them into one AnnData object.
If method_list or organ_list is None, no filtering is applied for that input.
EX: load_tabula_muris_senis(method_list=['facs'], organ_list=None) returns all FACS experiments for all organs in one AnnData object.
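The "None means don't filter" behaviour described above can be sketched with a minimal stand-in. The sample records, field names, and the omission of the actual download/AnnData concatenation are all assumptions for illustration — this is not the loader's real code.

```python
def select_samples(samples, method_list=None, organ_list=None):
    """Return samples matching the requested methods/organs; None disables that filter."""
    selected = []
    for sample in samples:
        if method_list is not None and sample["method"] not in method_list:
            continue
        if organ_list is not None and sample["organ"] not in organ_list:
            continue
        selected.append(sample)
    return selected

# Hypothetical sample metadata standing in for the CSV rows
samples = [
    {"method": "facs", "organ": "lung"},
    {"method": "facs", "organ": "liver"},
    {"method": "droplet", "organ": "lung"},
]

# Mirrors the EX above: all FACS experiments, any organ
facs_only = select_samples(samples, method_list=["facs"], organ_list=None)
```

With both arguments left as None, every sample passes through unfiltered.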

* pre-commit

* Modify anndata in place in add_label_noise rather than copy

* Added CSV file with tabula muris senis data links

* Update tabula_muris_senis.py

* Add random_labels baseline to label_projection task

* Update tabula_muris_senis.py

* Update tabula_muris_senis.py

* pre-commit

* Update tabula_muris_senis.py

* pre-commit

* fix missing labels at prediction time

* Handle test flag through tests and docker, pass to methods

* If test method run, use 1 max_epoch for scvi-tools

* Use only 2 batches for sample dataset for label_projection

* Remove zebrafish random dataset

* Fix decorator dependency to <5.0.0

* Remove functools.wraps from docker decorator for test parameterization

* Fix cengen missing batch info

* Use functools.update_wrapper for docker test

* Add batch to pancreas_random_label_noise

* Make cengen test dataset have more cells per batch

* Set span=0.8 for hvg call for scanvi_hvg methods

* Set span=0.8 for HVG selection only in test mode for scvi

* Revert "Handle test flag through tests and docker, pass to methods"

This reverts commit 3b940c0.

* Add test parameter to label proj baselines

* Fix flake remove unused import

* Revert "Remove zebrafish random dataset"

This reverts commit 3915798.

* Update scVI setup_anndata to new version

* pre-commit

* Reformat and rerun tests

* Add code_url and code_version for baseline label proj methods

* Fallback HVG flavor for label projection task

* pre-commit

* Fix unused import

* Fix using highly_variable_genes

* Pin scvi-tools to 0.15.5

* Unpin scvi-tools, pin jax==0.3.6, see optuna/optuna-examples#99

* Add scikit-misc as requirement for scvi docker

* Pin jaxlib as well

* pin jaxlib along with jax

* Set paper_year to year of implementation

* Set random zebrafish split to 0.8+0.2

* Add tabula_muris_senis_lung_random dataset to label_projection

* pre-commit

* Add tabula muris senis datasets csv

* Fix loading tabula muris csv

* pre-commit

* Test loader for tabula muris senis

Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* Run `test_benchmark` on a self-hosted runner (#373)

* set up cirun

* use ubuntu standard AMI

* run nextflow on the self-hosted machine

* add to CONTRIBUTING

* update ami

* install unzip

* set up docker

* install docker from curl

* use t2.micro not nano

* use custom AMI

* pythonLocation

* add scripts to path

* larger disk size

* new image again

* chown for now

* chmod 755

* fixed permissions

* use tower workspace

* test nextflow

* try again

* nextflow -q

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes/cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)
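Subsetting a test dataset to its most highly expressed genes, as this commit does, can be sketched in NumPy. The starting dimensions, random counts, and the order of the two subsetting steps are assumptions for the example; only the 700-gene by 300-cell target comes from the commit message.

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(1000, 2000)).astype(float)  # cells x genes

n_genes, n_cells = 700, 300
# Keep the n_genes genes with the highest total expression...
top_genes = np.argsort(counts.sum(axis=0))[::-1][:n_genes]
# ...then take the first n_cells cells
subset = counts[:n_cells, :][:, top_genes]
```

Selecting by total expression (rather than a random gene subset) keeps the test data dense enough for downstream filtering not to empty it out.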

* Add ivis dimred method (#369)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>

* Install libgeos-dev (#377)

* Label docker images based on build location (#351)

* label docker images

* fix syntax

* Run benchmark only after unittests (#349)

* run benchmark after unittests

* always run cleanup

* cleanup

* If using GH actions image, test for git diff on dockerfile (#350)

* if using gh actions image, test for git diff on dockerfile

* allow empty tag for now

* decode

* if image doesn't exist, automatically github actions

* fix quotes

* fix parsing and committing of results on tag (#356)

* Import SCOT (#333)

* import SCOT

* pre-commit

* scran requires R

* check that aligned spaces are finite

* exclude unbalanced SCOT for now

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>

* fix coverage badge # ci skip (#358)

* fix gh actions badge link # ci skip (#359)

* store results in /tmp (#361)

* Remove scot unbalanced (#360)

* Fix benchmark commit (#362)

* store results in /tmp

* add skip_on_empty

* class doesn't have skip on empty

* remove scot altogether (#363)

* Allow codecov to fail on forks

* docker images separate PR (#354)

* docker images separate PR

* all R requirements in r_requirements.txt

* move github r packages to requirements file

* pre-commit

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Ignore AWS warning and clean up s3 properly (#366)

* ci cleanup

* ignore aws batch warning

* remove citeseq cbmc from DR (#367)

Co-authored-by: Scott Gigante <[email protected]>

* Update test_docker (#379)

* redirect stderr

* increase memory

* cleanup

* sudo install

* name

* try setting pythonpath

* fix branch env

* another fix

* fix run name

* typo

* fix pythonpath:

* don't use pushd

* pass pythonpath

* set nousersite

* empty

* sudo install

* run attempt

* revert temporary changes

* cleanup

* fix contributing

* add instructions for tower

* fix repo name

* move ami setup into script

* Import Olsson 2016 dataset for dimred task (#352)

* Import Olsson 2016 dataset for dimred task

* Fix path to Olsson dataset loader

* Filter genes and cells before subsetting Olsson data in test

* Use highly expressed genes for test Olsson dataset

Test dataset is now 700 genes by 300 cells (was 500 x 500)
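
Selecting highly expressed genes for a small test subset can be sketched as follows (numpy-only; the function name and the random cell subset are assumptions about the approach, not the repository's code):

```python
import numpy as np

def subset_test_data(counts, n_genes=700, n_cells=300, seed=0):
    """Keep the most highly expressed genes and a random subset of cells.

    counts: cells x genes raw count matrix (dense ndarray here).
    """
    rng = np.random.default_rng(seed)
    # rank genes by total expression, keep the top n_genes
    gene_order = np.argsort(counts.sum(axis=0))[::-1][:n_genes]
    # sample n_cells cells without replacement
    cell_idx = rng.choice(counts.shape[0], size=n_cells, replace=False)
    return counts[np.ix_(cell_idx, gene_order)]
```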

* Add ivis dimred method (#369)

* add densMAP package to python-extras

* pre-commit

* Add Ivis method

* Explicitly mention it's CPU implementation

* Add forgotten import in __init__

* Remove redundant filtering

* Move ivis inside the function

* Make var names unique, add ivis[cpu] to README

* Pin tensorflow version

* Add NeuralEE skeleton

* Implement method

* added densmap and densne

* Fix typo pytoch -> torch

* pre-commit

* remove densne

* Add forgotten detach/cpu/numpy

* formatting

* pre-commit

* formatting

* formatting

* pre-commit

* formatting

* formatting

* formatting

* pre-commit

* formatting

* umap-learn implementation

* pre-commit

* Add docker image

* Add skeleton method

* formatting

* Implement method

* Fix some small bugs

* Add preprocessing

* Change batch size to 1k cells for aff. matrix

* Add new preprocessing

* Add new preprocessing

* Fix preprocessing

* Fix preprocessing

* pre-commit

* updated template for PR with PR evaluation checks (#314)

* Update alra.py (#304)

* Update alra.py

Fix pre-processing and transformation back into the original space

* pre-commit

* Update alra.py


* make sure necessary methods are imported


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burkhardt <[email protected]>

* Add scanpy preprocessing to densmap dimred method

* Rename preprocess_scanpy() to preprocess_logCPM_1kHVG()

* Add preprocessing suffix to dimred methods

* Subset object in preprocess_logCPM_1kHVG()

* Use standard names for input

* Add neuralee_logCPM_1kHVG method

* Add densmap_pca method

* Fix preprocess_logCPM_1kHVG()

Now returns an AnnData rather than acting in place
- Subsetting wasn't working in place

Also set HVG flavor to "cell_ranger"
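
The log-CPM step of preprocess_logCPM_1kHVG() can be sketched numpy-only (the real function uses scanpy, returns a new AnnData, and also subsets to 1,000 highly variable genes with the "cell_ranger" flavor):

```python
import numpy as np

def log_cpm(counts):
    """Library-size normalise to counts-per-million, then log1p.

    counts: cells x genes raw count matrix (dense ndarray here);
    assumes every cell has a nonzero total count.
    """
    libsize = counts.sum(axis=1, keepdims=True)
    cpm = counts / libsize * 1e6
    return np.log1p(cpm)
```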

* Add test argument to dimred methods

* Move preprocess_logCPM_1kHVG() to tools.normalize

* Change name in python-method-scvis Docker README

* Rename openproblems-python-method-scvis container

Now called open-problems-python36

* Fix AnnData ref in merge

* Copy object when subsetting in preprocess_logCPM_1kHVG()

* Move PCA to dimred methods

* Use preprocess_logCPM_1kHVG() in nn_ranking metrics

* Fix path in python36 dockerfile

* Add test kwarg to neuralee_default method

* Add check for n_var to preprocess_logCPM_1kHVG()

Should fix tests that were failing due to scverse/scanpy#2230

* Store raw counts in NeuralEE default method

* Update dimred README

* Replace X_input with PCA in ivis dimred method

* Refactor preprocess_logCPM_1kHVG() to log_cpm_hvg()

* Re-add ivis

Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Scott Gigante <[email protected]>

* hotfix timeout-minutes (#374)

* use branch of scprep to provide R traceback (#376)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>

* clean up dockerfile

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>

* only skip CI if command is in commit headline (#381)

* only skip if ci skip is in commit headline

* try using endsWith instead # ci skip

* Fix CI skip (#382)

* only skip if ci skip is in commit headline

* try using endsWith instead # ci skip

* make actions run

* upgrade AMI (#384)

* upgrade AMI

* uncomment docker

* uncomment tests

* Revert "Run test_benchmark on a self-hosted runner (#373)" (#386)

* revert 2d57868

* bash -x

* /bin/bash

* Bugfix CI (#387)

* upgrade AMI

* uncomment docker

* uncomment tests

* clean up testing

* tighter diff for testing

* more memory

* Revert "Bugfix CI (#387)" (#388)

This reverts commit b50a909.

* pass test arg to methods through CLI (#390)

* make scvi run faster on test mode (#385)

* make scvi run faster on test mode

* pass test argument through cli

* dirty hack to fix docker_build (#391)

* remove ivis temporarily (#392)

* neuralee fix (#383)

* build images before testing

* try something different

* needs

* fewer linebreaks

* try as string

* move the if

* remove one condition

* fix

* cancel more quickly

* run benchmark

* don't build on main in run_benchmark

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Scott Gigante <[email protected]>
Co-authored-by: Daniel Strobl <[email protected]>
Co-authored-by: SingleCellOpenProblems <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
Co-authored-by: Ben DeMeo <[email protected]>
Co-authored-by: Michal Klein <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: MalteDLuecken <[email protected]>
Co-authored-by: Wesley Lewis <[email protected]>
Co-authored-by: Daniel Burkhardt <[email protected]>
Co-authored-by: Nikolay Markov <[email protected]>
Co-authored-by: adamgayoso <[email protected]>
Co-authored-by: Valentine Svensson <[email protected]>
Co-authored-by: Eduardo Beltrame <[email protected]>
Co-authored-by: atchen <[email protected]>
rcannood pushed a commit that referenced this pull request Sep 4, 2024
Bumps [nf-core/setup-nextflow](https://github.com/nf-core/setup-nextflow) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/nf-core/setup-nextflow/releases)
- [Changelog](https://github.com/nf-core/setup-nextflow/blob/master/CHANGELOG.md)
- [Commits](nf-core/setup-nextflow@v1.5.0...v1.5.1)

---
updated-dependencies:
- dependency-name: nf-core/setup-nextflow
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>