[ENH] start bidsifying output (#80)
* bidsify output

* rm tmp files

* start fixing test

* fix tests

* post merge fix

* update doc output

* update tests

* Update giga_connectome/postprocess.py

* rm tmp

* remove old data files it exists

* Update giga_connectome/utils.py

* make output to bids the default

* make bids output the default

* save correlation matrix to tsv

* revert group level

* one output file per atlas

* isort

* timeseries to tsv

* fix test and output doc

* Update .pre-commit-config.yaml

* lint

* fixes

* move to contrib
Remi-Gau authored Mar 25, 2024
1 parent 59d8380 commit e303c9f
Showing 11 changed files with 305 additions and 93 deletions.
1 change: 1 addition & 0 deletions docs/source/changes.md
@@ -15,6 +15,7 @@ Released MONTH YEAR

### Changes

- [ENH] Make output more BIDS compliant. (@Remi-Gau)
- [MAINT] Pin dependencies for docker build for better reproducibility. (@Remi-Gau)

## 0.4.0
24 changes: 24 additions & 0 deletions docs/source/contributing.md
@@ -115,6 +115,30 @@ This tells the development team that your pull request is a "work-in-progress",

Once your PR is ready, a member of the development team will review your changes to confirm that they can be merged into the main codebase.

### Running the demo

You can run a demo of the BIDS app by downloading some test data.

Run the following from the root of the repository.

```bash
pip install tox
tox -e test_data
```

```bash
giga_connectome \
--atlas Schaefer20187Networks \
--denoise-strategy simple \
--standardize zscore \
--bids-filter giga_connectome/data/test_data/bids_filter.json \
--reindex-bids \
--calculate-intranetwork-average-correlation \
giga_connectome/data/test_data/ds000017-fmriprep22.0.1-downsampled-nosurface \
giga_connectome/data/test_data/output \
participant
```

## Making a release

Currently this project is not pushed to PyPI.
94 changes: 90 additions & 4 deletions docs/source/outputs.md
@@ -1,7 +1,93 @@
# Outputs

When performing `participant` level analysis, the output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `sub-<participant_id>_atlas-<atlas_name>_desc-<denoising_strategy>.h5`
The output of this app aims to follow the guideline
of the [BIDS extension proposal 17 - Generic BIDS connectivity data schema](https://bids.neuroimaging.io/bep017).

When performing `group` level analysis, the file will contain time series and connectomes of each subject, as well as group average connectomes. The output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `atlas-<atlas_name>_desc-<denoising_strategy>.h5`
The content of the metadata files is described in this BIDS extension proposal.

## Participant level

For each participant that was passed to `--participant_label`
(or all participants under `bids_dir` if no `--participant_label` is passed),
the output will be saved in `sub-<participant_id>/[ses-<ses_id>]/func`.

### Data files

For each input image (that is, preprocessed BOLD time series)
and each atlas, the following data files will be generated:

- a `[matches]_atlas-{atlas}_meas-PearsonCorrelation_desc-{atlas_description}{denoise_strategy}_relmat.tsv`
file that contains the correlation matrix between all the regions of the atlas
- a `[matches]_atlas-{atlas}_desc-{atlas_description}{denoise_strategy}_timeseries.tsv`
file that contains the extracted timeseries for each region of the atlas

- `{atlas}` refers to the name of the atlas used (for example, `Schaefer20187Networks`)
- `{atlas_description}` refers to the sub type of atlas used (for example, `100Parcels7Networks`)
- `{denoise_strategy}` refers to the denoise strategy passed to the command line
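As an illustration, the naming pattern above can be assembled from its entities. This is only a sketch: `build_output_name` is a hypothetical helper, not the app's actual `utils.output_filename`.

```python
from __future__ import annotations


def build_output_name(
    source_stem: str,
    atlas: str,
    suffix: str,
    extension: str,
    desc: str | None = None,
    meas: str | None = None,
) -> str:
    """Append connectivity entities to a preprocessed BOLD file stem.

    Entities follow the key-value ordering used in the output examples:
    atlas, then meas (if any), then desc (if any), then the suffix.
    """
    parts = [source_stem, f"atlas-{atlas}"]
    if meas:
        parts.append(f"meas-{meas}")
    if desc:
        parts.append(f"desc-{desc}")
    return "_".join(parts) + f"_{suffix}.{extension}"


# Example: the relmat file for one run, one atlas resolution.
name = build_output_name(
    "sub-1_ses-timepoint1_task-rest_space-MNI152NLin2009cAsym_res-2",
    atlas="Schaefer20187Networks",
    suffix="relmat",
    extension="tsv",
    desc="100Parcels7NetworksSimple",
    meas="PearsonCorrelation",
)
```

Note that the `desc` entity concatenates the atlas description and the denoise strategy (here `100Parcels7Networks` + `Simple`).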

### Metadata

A JSON file is generated in the root of the output dataset (`meas-PearsonCorrelation_relmat.json`)
that contains metadata applicable to all `relmat.tsv` files.

For each input image (that is, preprocessed BOLD time series),
a `[matches]_atlas-{atlas}_timeseries.json` file is generated,
shared by all the timeseries files extracted from that image.

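This sidecar carries the sampling frequency of the extracted timeseries, computed as the inverse of the BOLD repetition time, matching what `postprocess.py` writes in the diff below. A minimal sketch (the repetition time and output path are illustrative):

```python
import json
from pathlib import Path

repetition_time = 2.0  # seconds; in practice read from the BOLD image entities
sidecar = {"SamplingFrequency": 1 / repetition_time}  # in Hz

# Write the sidecar next to the timeseries files it describes.
out = Path("sub-1_atlas-Schaefer20187Networks_timeseries.json")
out.write_text(json.dumps(sidecar, indent=4))
```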
### Example

```
├── dataset_description.json
├── logs
│   └── CITATION.md
├── meas-PearsonCorrelation_relmat.json
├── sub-1
│   ├── ses-timepoint1
│   │   └── func
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │       └── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   └── ses-timepoint2
│       └── func
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│           └── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
└── sub-2
    ├── ses-timepoint1
    │   └── func
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
    │       └── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
    └── ses-timepoint2
        └── func
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
            └── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
```
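Both `relmat.tsv` and `timeseries.tsv` are plain tab-separated files written with pandas (`sep="\t"`, no index column), so they round-trip with standard tools. A sketch with a toy correlation matrix, assuming numpy and pandas are installed:

```python
import numpy as np
import pandas as pd

# Build a toy 3-region correlation matrix from random timeseries.
rng = np.random.default_rng(0)
ts = rng.standard_normal((50, 3))        # 50 timepoints, 3 regions
relmat = np.corrcoef(ts, rowvar=False)   # 3x3 Pearson correlation matrix

# Write it the way giga_connectome does: tab-separated, no index column.
pd.DataFrame(relmat).to_csv("toy_relmat.tsv", sep="\t", index=False)

# Loading it back recovers the matrix; column names are stringified indices.
loaded = pd.read_csv("toy_relmat.tsv", sep="\t").to_numpy()
```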
2 changes: 1 addition & 1 deletion giga_connectome/connectome.py
@@ -148,7 +148,7 @@ def generate_timeseries_connectomes(
if calculate_average_correlation:
(
correlation_matrix,
avg_intranetwork_correlation,
_,
) = calculate_intranetwork_correlation(
correlation_matrix,
masker_labels,
4 changes: 1 addition & 3 deletions giga_connectome/mask.py
@@ -19,11 +19,9 @@
from nilearn.masking import compute_multi_epi_mask
from scipy.ndimage import binary_closing

from giga_connectome.atlas import resample_atlas_collection
from giga_connectome.atlas import ATLAS_SETTING_TYPE, resample_atlas_collection
from giga_connectome.logger import gc_logger

from giga_connectome.atlas import ATLAS_SETTING_TYPE

gc_log = gc_logger()


1 change: 0 additions & 1 deletion giga_connectome/methods.py
@@ -3,7 +3,6 @@
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

from nilearn import __version__ as nilearn_version
from templateflow import __version__ as templateflow_version

123 changes: 66 additions & 57 deletions giga_connectome/postprocess.py
@@ -1,15 +1,17 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Sequence

import h5py
import numpy as np
import pandas as pd
from bids.layout import BIDSImageFile
from nilearn.connectome import ConnectivityMeasure
from nilearn.maskers import NiftiLabelsMasker, NiftiMapsMasker

from giga_connectome import utils
from giga_connectome.atlas import ATLAS_SETTING_TYPE
from giga_connectome.connectome import generate_timeseries_connectomes
from giga_connectome.denoise import STRATEGY_TYPE, denoise_nifti_voxel
from giga_connectome.logger import gc_logger
@@ -20,13 +22,13 @@

def run_postprocessing_dataset(
strategy: STRATEGY_TYPE,
atlas: ATLAS_SETTING_TYPE,
resampled_atlases: Sequence[str | Path],
images: Sequence[BIDSImageFile],
group_mask: str | Path,
standardize: str | bool,
smoothing_fwhm: float,
output_path: Path,
analysis_level: str = "participant",
calculate_average_correlation: bool = False,
) -> None:
"""
@@ -63,6 +65,9 @@ def run_postprocessing_dataset(
strategy : dict
Parameters for `load_confounds_strategy` or `load_confounds`.
atlas : dict
Atlas settings.
resampled_atlases : list of str or pathlib.Path
Atlas niftis resampled to the common space of the dataset.
@@ -80,21 +85,14 @@
Smoothing kernel size, passed to nilearn masker.
output_path:
Full path to output file, named in the following format:
output_dir / atlas-<atlas>_desc-<strategy_name>.h5
Full path to output directory.
analysis_level : str
Level of analysis, only "participant" is available.
calculate_average_correlation : bool
Whether to calculate average correlation within each parcel.
"""
if analysis_level != "participant":
raise Warning(
"Only participant level analysis is supported. Other levels "
"would be ignored, and will run the participant level."
)
atlas = output_path.name.split("atlas-")[-1].split("_")[0]
atlas_maskers: dict[str, (NiftiLabelsMasker | NiftiMapsMasker)] = {}
connectomes: dict[str, list[np.ndarray[Any, Any]]] = {}
for atlas_path in resampled_atlases:
@@ -109,6 +107,8 @@
)

# transform data
gc_log.info("Processing subject")

with progress_bar(text="Processing subject") as progress:
task = progress.add_task(
description="processing subject", total=len(images)
@@ -124,15 +124,39 @@
)
# parse file name
subject, session, specifier = utils.parse_bids_name(img.path)
for desc, masker in atlas_maskers.items():
attribute_name = (
f"{subject}_{specifier}_atlas-{atlas}_desc-{desc}"

# folder for this subject output
connectome_path = output_path / subject
if session:
connectome_path = connectome_path / session
connectome_path = connectome_path / "func"

# All timeseries derivatives have the same metadata
# so one json file for them all.
# see https://bids.neuroimaging.io/bep012
json_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="json",
)
utils.check_path(json_filename)
with open(json_filename, "w") as f:
json.dump(
{"SamplingFrequency": 1 / img.entities["RepetitionTime"]},
f,
indent=4,
)

for desc, masker in atlas_maskers.items():

if not denoised_img:
time_series_atlas, correlation_matrix = None, None

attribute_name = (
f"{subject}_{specifier}"
f"_atlas-{atlas['name']}_desc-{desc}"
)
gc_log.info(f"{attribute_name}: no volume after scrubbing")

progress.update(task, advance=1)
continue

@@ -147,51 +171,36 @@
correlation_measure,
calculate_average_correlation,
)
connectomes[desc].append(correlation_matrix)

# dump to h5
flag = _set_file_flag(output_path)
with h5py.File(output_path, flag) as f:
group = _fetch_h5_group(f, subject, session)
timeseries_dset = group.create_dataset(
f"{attribute_name}_timeseries", data=time_series_atlas
)
timeseries_dset.attrs["RepetitionTime"] = img.entities[
"RepetitionTime"
]
group.create_dataset(
f"{attribute_name}_connectome", data=correlation_matrix
)

progress.update(task, advance=1)

gc_log.info(f"Saved to:\n{output_path}")

# dump correlation_matrix to tsv
relmat_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="relmat",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(relmat_filename)
df = pd.DataFrame(correlation_matrix)
df.to_csv(relmat_filename, sep="\t", index=False)

# dump timeseries to tsv file
timeseries_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(timeseries_filename)
df = pd.DataFrame(time_series_atlas)
df.to_csv(timeseries_filename, sep="\t", index=False)

def _set_file_flag(output_path: Path) -> str:
"""Find out if new file needs to be created."""
flag = "a" if output_path.exists() else "w"
return flag

progress.update(task, advance=1)

def _fetch_h5_group(
file: h5py.File, subject: str, session: str | None
) -> h5py.File | h5py.Group:
"""Determine the level of grouping based on BIDS standard."""
if subject not in file:
return (
file.create_group(f"{subject}/{session}")
if session
else file.create_group(subject)
)
elif session:
return (
file[subject].create_group(session)
if session not in file[subject]
else file[f"{subject}/{session}"]
)
else:
return file[subject]
gc_log.info(f"Saved to:\n{connectome_path}")


def _get_masker(atlas_path: Path) -> NiftiLabelsMasker | NiftiMapsMasker: