From cca4689465e857c42beb61bed6a567968621280c Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Wed, 30 Oct 2024 20:51:26 +0100 Subject: [PATCH] enh: add instructions to run fmriprep on *Curnagl* --- code/fmriprep/fmriprep-anatonly.sbatch | 50 ++++++++++++ docs/data-management/preliminary.md | 2 +- docs/processing/environment.md | 104 +++++++++++++++---------- docs/processing/preprocessing.md | 44 +++++++++-- include/abbreviations.md | 1 + 5 files changed, 152 insertions(+), 49 deletions(-) create mode 100644 code/fmriprep/fmriprep-anatonly.sbatch diff --git a/code/fmriprep/fmriprep-anatonly.sbatch b/code/fmriprep/fmriprep-anatonly.sbatch new file mode 100644 index 00000000..5eacf9cf --- /dev/null +++ b/code/fmriprep/fmriprep-anatonly.sbatch @@ -0,0 +1,50 @@ +# Copyright 2024 The Axon Lab +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+########################################################################### +# +# General SLURM settings +#SBATCH --account {{ secrets.curnagl.account | default('')}} +#SBATCH --mail-type ALL +#SBATCH --mail-user @unil.ch +# +# Job settings +#SBATCH --job-name fmriprep +#SBATCH --partition cpu +#SBATCH --cpus-per-task 10 +#SBATCH --mem 10G +#SBATCH --time 24:00:00 +#SBATCH --export NONE +#SBATCH --chdir /scratch/oesteban +# +# Logging +#SBATCH --output /users/%u/logs/%x-%A-%a.out +#SBATCH --error /users/%u/logs/%x-%A-%a.err + +ml singularityce gcc + +mkdir -p {{ secrets.data.curnagl_workdir | default('')}}/data/derivatives +mkdir -p /scratch/oesteban/fmriprep + +export SINGULARITYENV_FS_LICENSE=$HOME/.freesurfer.txt +singularity exec --cleanenv \ + -B {{ secrets.data.curnagl_workdir | default('')}}/data/hcph-dataset:/data/datasets/hcph/ \ + -B {{ secrets.data.curnagl_workdir | default('')}}/data/derivatives/:/out \ + -B {{ secrets.data.curnagl_workdir | default('')}}/data/hcph-fmriprep/:/derivatives \ + -B /scratch/oesteban/fmriprep:/tmp \ + docker://nipreps/fmriprep:24.1.1 \ + fmriprep /data/datasets/hcph/ /out/fmriprep-24.1.1 participant \ + --participant-label 001 \ + --bids-database-dir /data/datasets/hcph/.bids-index/ \ + --nprocs 4 --omp-nthreads ${SLURM_CPUS_PER_TASK} \ + -w /tmp/ -vv --skip-bids-validation --anat-only diff --git a/docs/data-management/preliminary.md b/docs/data-management/preliminary.md index 0df0706a..17c74022 100644 --- a/docs/data-management/preliminary.md +++ b/docs/data-management/preliminary.md @@ -55,7 +55,7 @@ When employing high-performance computing (HPC), we provide [some specific guide cd hcph-dataset datalad create-sibling-ria -s ria-storage --alias hcph-dataset \ --new-store-ok --storage-sibling=only \ - "ria+ssh://{{ secrets.login.curnagl_ria | default('') }}@curnagl.dcsr.unil.ch:{{ secrets.data.curnagl_ria_store | default('') }}" + "ria+ssh://{{ secrets.login.curnagl_ria | default('') }}@curnagl.dcsr.unil.ch:{{ 
secrets.data.curnagl_ria_store_data | default('') }}" ``` ??? bug "Getting `[ERROR ] 'SSHRemoteIO' ...`" diff --git a/docs/processing/environment.md b/docs/processing/environment.md index 51718ff3..1bed2c8c 100644 --- a/docs/processing/environment.md +++ b/docs/processing/environment.md @@ -76,8 +76,8 @@ When HPC is planned for processing, *DataLad* will be required on that system(s) In such a scenario, create a *Conda* environment with a lower version of Python, and re-install datalad ``` shell - conda create -n "datalad" python=3.10 - conda activate datalad + conda create -n "datamgt" python=3.10 + conda activate datamgt conda install -c conda-forge datalad datalad-container ``` @@ -115,13 +115,13 @@ When HPC is planned for processing, *DataLad* will be required on that system(s) ``` - [ ] Log out and back in -- [ ] Create a new environment called `datalad` with *Git annex* in it: +- [ ] Create a new environment called `datamgt` with *Git annex* in it: ```Bash - micromamba create -n datalad python=3.12 git-annex=*=alldep* + micromamba create -n datamgt python=3.12 git-annex=*=alldep* ``` - [ ] Activate the environment ```Bash - micromamba activate datalad + micromamba activate datamgt ``` - [ ] Install *DataLad* and *DataLad-next*: ```Bash @@ -135,7 +135,9 @@ When HPC is planned for processing, *DataLad* will be required on that system(s) git config --global --add user.email doe@example.com ``` -## Installing the *DataLad* dataset +## Getting data + +### Installing the original HCPh dataset with *DataLad* Wherever you want to process the data, you'll need to `datalad install` it before you can pull down (`datalad get`) the data. 
 To access the metadata (e.g., sidecar JSON files of the BIDS structure), you'll need to have access to the git repository that corresponds to the data (https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git)
@@ -149,56 +151,78 @@ To fetch the dataset from the RIA store, you will need your SSH key be added
 
 - [ ] Send the SSH **public** key you just generated (e.g., `~/.ssh/id_ed25519.pub`) over email to Oscar at {{ secrets.email.oscar | default('*****@******') }}.
 
-- [ ] Install and get the dataset normally:
+- [ ] Install the dataset:
 
-    === "Installing the dataset without fetching data from annex"
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_repo | default('/') }}.git
+    ```
 
-        ``` shell
-        datalad install https://github.com/{{ secrets.data.gh_repo | default('/') }}.git
-        ```
+- [ ] Reconfigure the RIA store:
 
-    === "Installing the dataset and fetch all data from annex, with 8 parallel threads"
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_data | default('') }}"
+    ```
 
-        ``` shell
-        datalad install -g -J 8 https://github.com/{{ secrets.data.gh_repo | default('/') }}.git
-        ```
+    !!! danger "REQUIRED step"
 
-!!! warning "Reconfiguring the RIA store on *Curnagl*"
+        When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
+        into a local `ria-store` because you cannot ssh from *Curnagl* into itself.
 
-    When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
-    on a local `ria-store` because you cannot ssh from *Curnagl* into itself:
+- [ ] Get the dataset:
 
-    ```Bash
-    git annex initremote --private --sameas=ria-storage curnagl-storage type=external externaltype=ora encryption=none url="ria+file://{{ secrets.data.curnagl_ria_store | default('') }}"
-    ```
+    !!! danger "Data MUST be fetched from a development node."
 
-In addition to reconfiguring the RIA store, we should execute `datalad get` within a compute node:
+        The NAS is not accessible from compute nodes in *Curnagl*.
 
-- [ ] Create a *sbatch* job prescription script called `datalad-get.sbatch`:
-    ```Bash
-    #!/bin/bash -l
+    - [ ] Execute `datalad get` within a development node:
 
-    #SBATCH --account {{ secrets.data.curnagl_account | default('_') }}
+        ``` Bash
+        salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+        ```
 
-    #SBATCH --chdir {{ secrets.data.curnagl_workdir | default('') }}/data/hcph-dataset
-    #SBATCH --job-name datalad_get
-    #SBATCH --partition cpu
-    #SBATCH --cpus-per-task 12
-    #SBATCH --mem 10G
-    #SBATCH --time 05:00:00
-    #SBATCH --export NONE
+        A successful allocation prints output like:
 
-    #SBATCH --mail-type ALL
-    #SBATCH --mail-user
-    #SBATCH --output /users/%u/logs/%x-%A-%a.out
-    #SBATCH --error /users/%u/logs/%x-%A-%a.err
+        ```Text
+        salloc: Granted job allocation 47734642
+        salloc: Nodes dna064 are ready for job
+        Switching to the 20240303 software stack
+        ```
+    - [ ] Fetch the data:
 
-    micromamba run -n fmriprep datalad get -J${SLURM_CPUS_PER_TASK} .
+        ```Bash
+        cd $WORK/data
+        micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
+        ```
+
+### Installing derivatives
+
+Derivatives are installed in a similar way:
+
+- [ ] Install the dataset:
+
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_deriv_fmriprep | default('/') }}.git
     ```
-- [ ] Submit the job:
+
+- [ ] Reconfigure the RIA store:
+
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_fmriprep | default('') }}"
+    ```
+
+- [ ] Fetch the data:
+
+    ```Bash
-    sbatch datalad-get.sbatch
+    salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+    cd $WORK/data
+    micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
     ```
 
 ## Registering containers
diff --git a/docs/processing/preprocessing.md b/docs/processing/preprocessing.md
index 020f7fee..36fcda2b 100644
--- a/docs/processing/preprocessing.md
+++ b/docs/processing/preprocessing.md
@@ -1,14 +1,42 @@
-## Executing *fMRIPrep*
+## Executing *fMRIPrep* (on *Curnagl*)
 
-Because *fMRIPrep* creates a single anatomical reference for all sessions, we generate such reference first by setting the `--anat-only` flag.
-If that *fMRIPrep* execution finishes successfully, the anatomical processing outcomes will be stored in the output folder.
-We will then run one *fMRIPrep* process for each dataset's session, which is the recommended way for datasets with a large number of sessions (e.g., more than six sessions).
-We avert that session-wise *fMRIPrep*'s processes run into race conditions by pre-computing the anatomical reference.
+### Preparations
+
+- [ ] Prepare a *FreeSurfer* license file, for example at `$HOME/.freesurfer.txt`:
+
+    ``` text
+{% filter indent(width=4) %}
+{{ secrets.licenses.freesurfer | default('') }}
+{% endfilter %}
+    ```
+
+- [ ] Ensure the dataset is up-to-date:
+    ``` bash
+    cd $WORK/data/hcph-dataset
+    micromamba run -n datamgt datalad update --how ff-only
+    ```
+- [ ] Check out the tag corresponding to the intended processing:
+    ``` bash
+    micromamba run -n datamgt git checkout fmriprep-reliability-1.1
+    ```
+
+### Executing the anatomical workflow first with `--anat-only`
+
+!!! warning "Compute nodes DO NOT have access to the NAS"
+
+    Therefore, make sure the data have been installed and fetched into the `{{ secrets.data.curnagl_workdir | default('')}}/data/hcph-dataset/` directory.
+
+- [ ] Create a SLURM *sbatch* file, for example at `$HOME/fmriprep-anatonly.sbatch`:
+
+    ``` bash
+{% filter indent(width=4) %}
+{% include 'code/fmriprep/fmriprep-anatonly.sbatch' %}
+{% endfilter %}
+    ```
 
 - [ ] Submit the anatomical workflow:
 
-    ``` bash title="Launch each session through fMRIPrep in parallel"
-    cd code/fmriprep
-    bash ss-fmriprep-anatonly.sh
+    ``` bash
+    sbatch fmriprep-anatonly.sbatch
     ```
 
 ??? abstract "The sbatch file to run *fMRIPrep* with `--anat-only`"
diff --git a/include/abbreviations.md b/include/abbreviations.md
index 64856858..d1483610 100644
--- a/include/abbreviations.md
+++ b/include/abbreviations.md
@@ -40,6 +40,7 @@
 *[LR]: left-to-right
 *[MR]: Magnetic Resonance
 *[MRI]: Magnetic Resonance Imaging
+*[NAS]: Network-attached storage
 *[NIfTI]: Neuroimaging Informatics Technology Initiative
 *[OSF]: Open Science Framework (Center for Open Science)
 *[PA]: posterior-to-anterior
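---

After `sbatch fmriprep-anatonly.sbatch`, SLURM acknowledges submission with a line of the form `Submitted batch job <id>`. A minimal sketch of capturing that id so the job can be monitored afterwards; the job id below is illustrative, not a real job, and in practice `submit_msg` would come from running `sbatch` itself:

```shell
# SLURM's sbatch acknowledges a submission with "Submitted batch job <id>".
# Capture the id so the job can be monitored with squeue/sacct.
# Illustrative message; in practice: submit_msg=$(sbatch fmriprep-anatonly.sbatch)
submit_msg="Submitted batch job 47734642"

# The job id is the last whitespace-separated field of the message.
job_id="${submit_msg##* }"
echo "job=${job_id}"

# Monitor the job (not run here):
#   squeue -j "$job_id"
#   sacct -j "$job_id" --format=JobID,State,Elapsed
```

The `#SBATCH --output`/`--error` directives in the sbatch file place the logs under `/users/$USER/logs/`, so they can be followed with `tail -f` once the job starts.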