enh: add instructions to run fmriprep on *Curnagl*
oesteban committed Oct 30, 2024
1 parent f291293 commit cca4689
Showing 5 changed files with 152 additions and 49 deletions.
50 changes: 50 additions & 0 deletions code/fmriprep/fmriprep-anatonly.sbatch
@@ -0,0 +1,50 @@
# Copyright 2024 The Axon Lab <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
###########################################################################
#
# General SLURM settings
#SBATCH --account {{ secrets.curnagl.account | default('<someaccount>')}}
#SBATCH --mail-type ALL
#SBATCH --mail-user <email>@unil.ch
#
# Job settings
#SBATCH --job-name fmriprep
#SBATCH --partition cpu
#SBATCH --cpus-per-task 10
#SBATCH --mem 10G
#SBATCH --time 24:00:00
#SBATCH --export NONE
#SBATCH --chdir /scratch/oesteban
#
# Logging
#SBATCH --output /users/%u/logs/%x-%A-%a.out
#SBATCH --error /users/%u/logs/%x-%A-%a.err

ml singularityce gcc

mkdir -p {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/derivatives
mkdir -p /scratch/oesteban/fmriprep

export SINGULARITYENV_FS_LICENSE=$HOME/.freesurfer.txt
singularity exec --cleanenv \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-dataset:/data/datasets/hcph/ \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/derivatives/:/out \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-fmriprep/:/derivatives \
-B /scratch/oesteban/fmriprep:/tmp \
docker://nipreps/fmriprep:24.1.1 \
fmriprep /data/datasets/hcph/ /out/fmriprep-24.1.1 participant \
--participant-label 001 \
--bids-database-dir /data/datasets/hcph/.bids-index/ \
--nprocs 4 --omp-nthreads ${SLURM_CPUS_PER_TASK} \
-w /tmp/ -vv --skip-bids-validation --anat-only
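Before submitting the script above, it can help to verify the locations it assumes: the `#SBATCH --output`/`--error` lines log into `/users/$USER/logs/`, and the *FreeSurfer* license is read from `$HOME/.freesurfer.txt`. A minimal pre-flight sketch (the checks themselves are not part of the repository, and assume your home directory is the `/users/$USER` referenced by the log paths):

```shell
# Hypothetical pre-flight checks before `sbatch fmriprep-anatonly.sbatch`.
mkdir -p "$HOME/logs"                     # where --output/--error will write
if [ -f "$HOME/.freesurfer.txt" ]; then   # exported as SINGULARITYENV_FS_LICENSE
    echo "FreeSurfer license: found"
else
    echo "FreeSurfer license: missing" >&2
fi
```

If the license is missing, *fMRIPrep* will abort early, so this is worth checking before burning queue time.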
2 changes: 1 addition & 1 deletion docs/data-management/preliminary.md
@@ -55,7 +55,7 @@ When employing high-performance computing (HPC), we provide [some specific guide
cd hcph-dataset
datalad create-sibling-ria -s ria-storage --alias hcph-dataset \
--new-store-ok --storage-sibling=only \
-    "ria+ssh://{{ secrets.login.curnagl_ria | default('<username>') }}@curnagl.dcsr.unil.ch:{{ secrets.data.curnagl_ria_store | default('<absolute-path-of-store>') }}"
+    "ria+ssh://{{ secrets.login.curnagl_ria | default('<username>') }}@curnagl.dcsr.unil.ch:{{ secrets.data.curnagl_ria_store_data | default('<absolute-path-of-store>') }}"
```
??? bug "Getting `[ERROR ] 'SSHRemoteIO' ...`"
104 changes: 64 additions & 40 deletions docs/processing/environment.md
@@ -76,8 +76,8 @@ When HPC is planned for processing, *DataLad* will be required on those systems

In such a scenario, create a *Conda* environment with a lower version of Python, and re-install datalad
``` shell
-conda create -n "datalad" python=3.10
-conda activate datalad
+conda create -n "datamgt" python=3.10
+conda activate datamgt
conda install -c conda-forge datalad datalad-container
```

@@ -115,13 +115,13 @@ When HPC is planned for processing, *DataLad* will be required on those systems
```
- [ ] Log out and back in
-- [ ] Create a new environment called `datalad` with *Git annex* in it:
+- [ ] Create a new environment called `datamgt` with *Git annex* in it:
```Bash
-micromamba create -n datalad python=3.12 git-annex=*=alldep*
+micromamba create -n datamgt python=3.12 git-annex=*=alldep*
```
- [ ] Activate the environment
```Bash
-micromamba activate datalad
+micromamba activate datamgt
```
- [ ] Install *DataLad* and *DataLad-next*:
```Bash
@@ -135,7 +135,9 @@ When HPC is planned for processing, *DataLad* will be required on those systems
git config --global --add user.email [email protected]
```
-## Installing the *DataLad* dataset
+## Getting data
+
+### Installing the original HCPh dataset with *DataLad*
Wherever you want to process the data, you'll need to `datalad install` it before you can pull down (`datalad get`) the data.
To access the metadata (e.g., sidecar JSON files of the BIDS structure), you'll need to have access to the git repository that corresponds to the data (https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git)
@@ -149,56 +151,78 @@

To fetch the dataset from the RIA store, you will need your SSH key to be added to
- [ ] Send the SSH **public** key you just generated (e.g., `~/.ssh/id_ed25519.pub`) over email to Oscar at {{ secrets.email.oscar | default('*****@******') }}.


-- [ ] Install and get the dataset normally:
-
-    === "Installing the dataset without fetching data from annex"
-
-        ``` shell
-        datalad install https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
-        ```
-
-    === "Installing the dataset and fetch all data from annex, with 8 parallel threads"
-
-        ``` shell
-        datalad install -g -J 8 https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
-        ```
-
-!!! warning "Reconfiguring the RIA store on *Curnagl*"
-
-    When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
-    on a local `ria-store` because you cannot ssh from *Curnagl* into itself.
-
-    - [ ] Get the dataset:
-        ```Bash
-        git annex initremote --private --sameas=ria-storage curnagl-storage type=external externaltype=ora encryption=none url="ria+file://{{ secrets.data.curnagl_ria_store | default('<path>') }}"
-        ```
-
-In addition to reconfiguring the RIA store, we should execute `datalad get` within a compute node:
-
-- [ ] Create a *sbatch* job prescription script called `datalad-get.sbatch`:
-    ```Bash
-    #!/bin/bash -l
-
-    #SBATCH --account {{ secrets.data.curnagl_account | default('<PI>_<project_id>') }}
-    #SBATCH --chdir {{ secrets.data.curnagl_workdir | default('<workdir>') }}/data/hcph-dataset
-    #SBATCH --job-name datalad_get
-    #SBATCH --partition cpu
-    #SBATCH --cpus-per-task 12
-    #SBATCH --mem 10G
-    #SBATCH --time 05:00:00
-    #SBATCH --export NONE
-    #SBATCH --mail-type ALL
-    #SBATCH --mail-user <your-email-address>
-    #SBATCH --output /users/%u/logs/%x-%A-%a.out
-    #SBATCH --error /users/%u/logs/%x-%A-%a.err
-
-    micromamba run -n fmriprep datalad get -J${SLURM_CPUS_PER_TASK} .
-    ```
-- [ ] Submit the job:
-    ```Bash
-    sbatch datalad-get.sbatch
-    ```
+- [ ] Install the dataset:
+
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
+    ```
+
+- [ ] Reconfigure the RIA store:
+
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_data | default('<path>') }}"
+    ```
+
+    !!! danger "REQUIRED step"
+
+        When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
+        on a local `ria-store` because you cannot ssh from *Curnagl* into itself:
+
+!!! danger "Data MUST be fetched from a development node."
+
+    The NAS is not accessible from compute nodes in *Curnagl*.
+
+- [ ] Execute `datalad get` within a development node:
+
+    ```Bash
+    salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+    ```
+
+    Success is demonstrated by an output like:
+
+    ```Text
+    salloc: Granted job allocation 47734642
+    salloc: Nodes dna064 are ready for job
+    Switching to the 20240303 software stack
+    ```
+
+- [ ] Fetch the data:
+
+    ```Bash
+    cd $WORK/data
+    micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
+    ```
+
+### Installing derivatives
+
+Derivatives are installed in a similar way:
+
+- [ ] Install the dataset:
+
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_deriv_fmriprep | default('<organization>/<repo_name>') }}.git
+    ```
+
+- [ ] Reconfigure the RIA store:
+
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_fmriprep | default('<path>') }}"
+    ```
+
+- [ ] Fetch the data:
+
+    ```Bash
+    salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+    cd $WORK/data
+    micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
+    ```
## Registering containers
44 changes: 36 additions & 8 deletions docs/processing/preprocessing.md
@@ -1,14 +1,42 @@
-## Executing *fMRIPrep*
+## Executing *fMRIPrep* (on *Curnagl*)

Because *fMRIPrep* creates a single anatomical reference for all sessions, we generate that reference first by setting the `--anat-only` flag.
If that *fMRIPrep* execution finishes successfully, the anatomical processing outcomes will be stored in the output folder.
We then run one *fMRIPrep* process per session, which is the recommended approach for datasets with a large number of sessions (e.g., more than six).
Pre-computing the anatomical reference averts race conditions between the session-wise *fMRIPrep* processes.
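The session-wise strategy above can be sketched as a small loop that emits one job submission per session. The session labels and the `fmriprep-session.sbatch` script name here are hypothetical placeholders, not files from the repository:

```shell
# Hypothetical sketch: generate one sbatch submission per session and
# record the commands in a file for review before actually submitting.
for ses in 001 002 003; do
    echo "sbatch --export=ALL,SESSION=${ses} fmriprep-session.sbatch"
done > submit-all.txt
cat submit-all.txt
```

Reviewing `submit-all.txt` (and only then piping it to `sh`) makes it easy to double-check the session list before queueing dozens of jobs.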
### Preparations

- [ ] Prepare a *FreeSurfer* license file, for example at `$HOME/.freesurfer.txt`:

``` text
{% filter indent(width=4) %}
{{ secrets.licenses.freesurfer | default('<REDACTED:: Visit https://surfer.nmr.mgh.harvard.edu/fswiki/License for more information>') }}
{% endfilter %}
```
- [ ] Ensure the dataset is up-to-date:
``` bash
cd $WORK/data/hcph-dataset
micromamba run -n datamgt datalad update --how ff-only
```
- [ ] Checkout the correct tag corresponding to the intended processing:
``` bash
micromamba run -n datamgt git checkout fmriprep-reliability-1.1
```
### Executing anatomical workflow first with `--anat-only`
!!! warning "Compute nodes DO NOT have access to the NAS"

    Therefore, make sure data have been installed and fetched onto the `{{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-dataset/` directory.
- [ ] Create a SLURM *sbatch* file, for example at `$HOME/fmriprep-anatonly.sbatch`:
``` bash
{% filter indent(width=4) %}
{% include 'code/fmriprep/fmriprep-anatonly.sbatch' %}
{% endfilter %}
```
- [ ] Submit the anatomical workflow:
-    ``` bash title="Launch each session through fMRIPrep in parallel"
-    cd code/fmriprep
-    bash ss-fmriprep-anatonly.sh
+    ``` bash
+    sbatch fmriprep-anatonly.sbatch
```
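Once submitted, progress can be followed with standard SLURM tooling; these commands only make sense on the cluster, and `<jobid>` is a placeholder for the ID that `sbatch` prints:

```shell
squeue -u "$USER" --name fmriprep                # queue state of the job
tail -f /users/$USER/logs/fmriprep-<jobid>.out   # live fMRIPrep log
```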
??? abstract "The sbatch file to run *fMRIPrep* with `--anat-only`"
1 change: 1 addition & 0 deletions include/abbreviations.md
@@ -40,6 +40,7 @@
*[LR]: left-to-right
*[MR]: Magnetic Resonance
*[MRI]: Magnetic Resonance Imaging
*[NAS]: Network-attached storage
*[NIfTI]: Neuroimaging Informatics Technology Initiative
*[OSF]: Open Science Framework (Center for Open Science)
*[PA]: posterior-to-anterior
