enh: add instructions to run fmriprep on *Curnagl*
oesteban committed Oct 30, 2024
1 parent f291293 commit cca4689
Showing 5 changed files with 152 additions and 49 deletions.
50 changes: 50 additions & 0 deletions code/fmriprep/fmriprep-anatonly.sbatch
@@ -0,0 +1,50 @@
# Copyright 2024 The Axon Lab <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
###########################################################################
#
# General SLURM settings
#SBATCH --account {{ secrets.curnagl.account | default('<someaccount>')}}
#SBATCH --mail-type ALL
#SBATCH --mail-user <email>@unil.ch
#
# Job settings
#SBATCH --job-name fmriprep
#SBATCH --partition cpu
#SBATCH --cpus-per-task 10
#SBATCH --mem 10G
#SBATCH --time 24:00:00
#SBATCH --export NONE
#SBATCH --chdir /scratch/oesteban
#
# Logging
#SBATCH --output /users/%u/logs/%x-%A-%a.out
#SBATCH --error /users/%u/logs/%x-%A-%a.err

ml singularityce gcc

mkdir -p {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/derivatives
mkdir -p /scratch/oesteban/fmriprep

export SINGULARITYENV_FS_LICENSE=$HOME/.freesurfer.txt
singularity exec --cleanenv \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-dataset:/data/datasets/hcph/ \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/derivatives/:/out \
-B {{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-fmriprep/:/derivatives \
-B /scratch/oesteban/fmriprep:/tmp \
docker://nipreps/fmriprep:24.1.1 \
fmriprep /data/datasets/hcph/ /out/fmriprep-24.1.1 participant \
--participant-label 001 \
--bids-database-dir /data/datasets/hcph/.bids-index/ \
--nprocs 4 --omp-nthreads ${SLURM_CPUS_PER_TASK} \
-w /tmp/ -vv --skip-bids-validation --anat-only
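Before submitting the script above, it can help to verify the locations it assumes: the `#SBATCH --output`/`--error` lines log into `/users/$USER/logs/`, and the *FreeSurfer* license is read from `$HOME/.freesurfer.txt`. A minimal pre-flight sketch (the checks themselves are not part of the repository, and assume your home directory is the `/users/$USER` referenced by the log paths):

```shell
# Hypothetical pre-flight checks before `sbatch fmriprep-anatonly.sbatch`.
mkdir -p "$HOME/logs"                     # where --output/--error will write
if [ -f "$HOME/.freesurfer.txt" ]; then   # exported as SINGULARITYENV_FS_LICENSE
    echo "FreeSurfer license: found"
else
    echo "FreeSurfer license: missing" >&2
fi
```

If the license is missing, *fMRIPrep* will abort early, so this is worth checking before burning queue time.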
2 changes: 1 addition & 1 deletion docs/data-management/preliminary.md
@@ -55,7 +55,7 @@ When employing high-performance computing (HPC), we provide [some specific guide
cd hcph-dataset
datalad create-sibling-ria -s ria-storage --alias hcph-dataset \
--new-store-ok --storage-sibling=only \
-    "ria+ssh://{{ secrets.login.curnagl_ria | default('<username>') }}@curnagl.dcsr.unil.ch:{{ secrets.data.curnagl_ria_store | default('<absolute-path-of-store>') }}"
+    "ria+ssh://{{ secrets.login.curnagl_ria | default('<username>') }}@curnagl.dcsr.unil.ch:{{ secrets.data.curnagl_ria_store_data | default('<absolute-path-of-store>') }}"
```
??? bug "Getting `[ERROR ] 'SSHRemoteIO' ...`"
104 changes: 64 additions & 40 deletions docs/processing/environment.md
@@ -76,8 +76,8 @@ When HPC is planned for processing, *DataLad* will be required on those systems

In such a scenario, create a *Conda* environment with a lower version of Python, and re-install datalad
``` shell
-conda create -n "datalad" python=3.10
-conda activate datalad
+conda create -n "datamgt" python=3.10
+conda activate datamgt
conda install -c conda-forge datalad datalad-container
```

@@ -115,13 +115,13 @@ When HPC is planned for processing, *DataLad* will be required on those systems
```
- [ ] Log out and back in
-- [ ] Create a new environment called `datalad` with *Git annex* in it:
+- [ ] Create a new environment called `datamgt` with *Git annex* in it:
```Bash
-micromamba create -n datalad python=3.12 git-annex=*=alldep*
+micromamba create -n datamgt python=3.12 git-annex=*=alldep*
```
- [ ] Activate the environment
```Bash
-micromamba activate datalad
+micromamba activate datamgt
```
- [ ] Install *DataLad* and *DataLad-next*:
```Bash
@@ -135,7 +135,9 @@ When HPC is planned for processing, *DataLad* will be required on those systems
git config --global --add user.email [email protected]
```
-## Installing the *DataLad* dataset
+## Getting data
+
+### Installing the original HCPh dataset with *DataLad*
Wherever you want to process the data, you'll need to `datalad install` it before you can pull down (`datalad get`) the data.
To access the metadata (e.g., sidecar JSON files of the BIDS structure), you'll need to have access to the git repository that corresponds to the data (https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git)
@@ -149,56 +151,78 @@

To fetch the dataset from the RIA store, you will need your SSH key to be added to
- [ ] Send the SSH **public** key you just generated (e.g., `~/.ssh/id_ed25519.pub`) over email to Oscar at {{ secrets.email.oscar | default('*****@******') }}.


-- [ ] Install and get the dataset normally:
-
-    === "Installing the dataset without fetching data from annex"
-
-        ``` shell
-        datalad install https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
-        ```
-
-    === "Installing the dataset and fetch all data from annex, with 8 parallel threads"
-
-        ``` shell
-        datalad install -g -J 8 https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
-        ```
-
-!!! warning "Reconfiguring the RIA store on *Curnagl*"
-
-    When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
-    on a local `ria-store` because you cannot ssh from *Curnagl* into itself.
-
-    - [ ] Get the dataset:
-        ```Bash
-        git annex initremote --private --sameas=ria-storage curnagl-storage type=external externaltype=ora encryption=none url="ria+file://{{ secrets.data.curnagl_ria_store | default('<path>') }}"
-        ```
-
-In addition to reconfiguring the RIA store, we should execute `datalad get` within a compute node:
-
-- [ ] Create a *sbatch* job prescription script called `datalad-get.sbatch`:
-    ```Bash
-    #!/bin/bash -l
-
-    #SBATCH --account {{ secrets.data.curnagl_account | default('<PI>_<project_id>') }}
-    #SBATCH --chdir {{ secrets.data.curnagl_workdir | default('<workdir>') }}/data/hcph-dataset
-    #SBATCH --job-name datalad_get
-    #SBATCH --partition cpu
-    #SBATCH --cpus-per-task 12
-    #SBATCH --mem 10G
-    #SBATCH --time 05:00:00
-    #SBATCH --export NONE
-    #SBATCH --mail-type ALL
-    #SBATCH --mail-user <your-email-address>
-    #SBATCH --output /users/%u/logs/%x-%A-%a.out
-    #SBATCH --error /users/%u/logs/%x-%A-%a.err
-
-    micromamba run -n fmriprep datalad get -J${SLURM_CPUS_PER_TASK} .
-    ```
-- [ ] Submit the job:
-    ```Bash
-    sbatch datalad-get.sbatch
-    ```
+- [ ] Install the dataset:
+
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_repo | default('<organization>/<repo_name>') }}.git
+    ```
+
+- [ ] Reconfigure the RIA store:
+
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_data | default('<path>') }}"
+    ```
+
+    !!! danger "REQUIRED step"
+
+        When on *Curnagl*, you'll need to *convert* the `ria-storage` remote
+        on a local `ria-store` because you cannot ssh from *Curnagl* into itself:
+
+!!! danger "Data MUST be fetched from a development node."
+
+    The NAS is not accessible from compute nodes in *Curnagl*.
+
+- [ ] Execute `datalad get` within a development node:
+
+    ```Bash
+    salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+    ```
+
+    Success is demonstrated by an output like:
+
+    ```Text
+    salloc: Granted job allocation 47734642
+    salloc: Nodes dna064 are ready for job
+    Switching to the 20240303 software stack
+    ```
+
+- [ ] Fetch the data:
+
+    ```Bash
+    cd $WORK/data
+    micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
+    ```
+
+### Installing derivatives
+
+Derivatives are installed in a similar way:
+
+- [ ] Install the dataset:
+
+    ``` shell
+    micromamba run -n datamgt datalad install https://github.com/{{ secrets.data.gh_deriv_fmriprep | default('<organization>/<repo_name>') }}.git
+    ```
+
+- [ ] Reconfigure the RIA store:
+
+    ``` shell
+    micromamba run -n datamgt \
+        git annex initremote --private --sameas=ria-storage \
+        curnagl-storage type=external externaltype=ora encryption=none \
+        url="ria+file://{{ secrets.data.curnagl_ria_store_fmriprep | default('<path>') }}"
+    ```
+
+- [ ] Fetch the data:
+
+    ```Bash
+    salloc --partition=interactive --time=02:00:00 --cpus-per-task 12
+    cd $WORK/data
+    micromamba run -n datamgt datalad get -J${SLURM_CPUS_PER_TASK} .
+    ```
## Registering containers
44 changes: 36 additions & 8 deletions docs/processing/preprocessing.md
@@ -1,14 +1,42 @@
-## Executing *fMRIPrep*
+## Executing *fMRIPrep* (on *Curnagl*)

Because *fMRIPrep* creates a single anatomical reference for all sessions, we generate that reference first by setting the `--anat-only` flag.
If that *fMRIPrep* execution finishes successfully, the anatomical processing outcomes will be stored in the output folder.
We then run one *fMRIPrep* process per session, which is the recommended approach for datasets with a large number of sessions (e.g., more than six).
Pre-computing the anatomical reference averts race conditions between the session-wise *fMRIPrep* processes.
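The session-wise strategy above can be sketched as a small loop that emits one job submission per session. The session labels and the `fmriprep-session.sbatch` script name here are hypothetical placeholders, not files from the repository:

```shell
# Hypothetical sketch: generate one sbatch submission per session and
# record the commands in a file for review before actually submitting.
for ses in 001 002 003; do
    echo "sbatch --export=ALL,SESSION=${ses} fmriprep-session.sbatch"
done > submit-all.txt
cat submit-all.txt
```

Reviewing `submit-all.txt` (and only then piping it to `sh`) makes it easy to double-check the session list before queueing dozens of jobs.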
### Preparations

- [ ] Prepare a *FreeSurfer* license file, for example at `$HOME/.freesurfer.txt`:

``` text
{% filter indent(width=4) %}
{{ secrets.licenses.freesurfer | default('<REDACTED:: Visit https://surfer.nmr.mgh.harvard.edu/fswiki/License for more information>') }}
{% endfilter %}
```
- [ ] Ensure the dataset is up-to-date:
``` bash
cd $WORK/data/hcph-dataset
micromamba run -n datamgt datalad update --how ff-only
```
- [ ] Checkout the correct tag corresponding to the intended processing:
``` bash
micromamba run -n datamgt git checkout fmriprep-reliability-1.1
```
### Executing anatomical workflow first with `--anat-only`
!!! warning "Compute nodes DO NOT have access to the NAS"

    Therefore, make sure data have been installed and fetched onto the `{{ secrets.data.curnagl_workdir | default('<workdir>')}}/data/hcph-dataset/` directory.
- [ ] Create a SLURM *sbatch* file, for example at `$HOME/fmriprep-anatonly.sbatch`:
``` bash
{% filter indent(width=4) %}
{% include 'code/fmriprep/fmriprep-anatonly.sbatch' %}
{% endfilter %}
```
- [ ] Submit the anatomical workflow:
-    ``` bash title="Launch each session through fMRIPrep in parallel"
-    cd code/fmriprep
-    bash ss-fmriprep-anatonly.sh
+    ``` bash
+    sbatch fmriprep-anatonly.sbatch
```
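Once submitted, progress can be followed with standard SLURM tooling; these commands only make sense on the cluster, and `<jobid>` is a placeholder for the ID that `sbatch` prints:

```shell
squeue -u "$USER" --name fmriprep                # queue state of the job
tail -f /users/$USER/logs/fmriprep-<jobid>.out   # live fMRIPrep log
```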
??? abstract "The sbatch file to run *fMRIPrep* with `--anat-only`"
1 change: 1 addition & 0 deletions include/abbreviations.md
@@ -40,6 +40,7 @@
*[LR]: left-to-right
*[MR]: Magnetic Resonance
*[MRI]: Magnetic Resonance Imaging
*[NAS]: Network-attached storage
*[NIfTI]: Neuroimaging Informatics Technology Initiative
*[OSF]: Open Science Framework (Center for Open Science)
*[PA]: posterior-to-anterior
