[ENH] start bidsifying output (#80)
* bidsify output

* rm tmp files

* start fixing test

* fix tests

* post merge fix

* update doc output

* update tests

* Update giga_connectome/postprocess.py

* rm tmp

* remove old data files it exists

* Update giga_connectome/utils.py

* make output to bids the default

* make bids output the default

* save correlation matrix to tsv

* revert group level

* one output file per atlas

* isort

* timeseries to tsv

* fix test and output doc

* Update .pre-commit-config.yaml

* lint

* fixes

* move to contrib
Remi-Gau authored Mar 25, 2024
1 parent 59d8380 commit e303c9f
Showing 11 changed files with 305 additions and 93 deletions.
1 change: 1 addition & 0 deletions docs/source/changes.md
@@ -15,6 +15,7 @@ Released MONTH YEAR

### Changes

- [ENH] Make output more BIDS compliant. (@Remi-Gau)
- [MAINT] Pin dependencies for docker build for better reproducibility. (@Remi-Gau)

## 0.4.0
24 changes: 24 additions & 0 deletions docs/source/contributing.md
@@ -115,6 +115,30 @@ This tells the development team that your pull request is a "work-in-progress",

Once your PR is ready, a member of the development team will review your changes to confirm that they can be merged into the main codebase.

### Running the demo

You can run a demo of the BIDS app by downloading some test data.

Run the following from the root of the repository.

```bash
pip install tox
tox -e test_data
```

```bash
giga_connectome \
--atlas Schaefer20187Networks \
--denoise-strategy simple \
--standardize zscore \
--bids-filter giga_connectome/data/test_data/bids_filter.json \
--reindex-bids \
--calculate-intranetwork-average-correlation \
giga_connectome/data/test_data/ds000017-fmriprep22.0.1-downsampled-nosurface \
giga_connectome/data/test_data/output \
participant
```

## Making a release

Currently this project is not pushed to PyPI.
94 changes: 90 additions & 4 deletions docs/source/outputs.md
@@ -1,7 +1,93 @@
# Outputs

When performing `participant` level analysis, the output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `sub-<participant_id>_atlas-<atlas_name>_desc-<denoising_strategy>.h5`
The output of this app aims to follow the guideline
of the [BIDS extension proposal 17 - Generic BIDS connectivity data schema](https://bids.neuroimaging.io/bep017).

When performing `group` level analysis, the file will contain time series and connectomes of each subject, as well as group average connectomes. The output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `atlas-<atlas_name>_desc-<denoising_strategy>.h5`
The content of the metadata files is described in this BIDS extension proposal.

## Participant level

For each participant that was passed to `--participant_label`
(or all participants under `bids_dir` if no `--participant_label` is passed),
the output will be saved in `sub-<participant_id>/[ses-<ses_id>]/func`.

### Data files

For each input image (that is, preprocessed BOLD time series)
and each atlas, the following data files will be generated:

- a `[matches]_atlas-{atlas}_meas-PearsonCorrelation_desc-{atlas_description}{denoise_strategy}_relmat.tsv`
file that contains the correlation matrix between all the regions of the atlas
- a `[matches]_atlas-{atlas}_desc-{atlas_description}{denoise_strategy}_timeseries.tsv`
file that contains the extracted timeseries for each region of the atlas

- `{atlas}` refers to the name of the atlas used (for example, `Schaefer20187Networks`)
- `{atlas_description}` refers to the sub type of atlas used (for example, `100Parcels7Networks`)
- `{denoise_strategy}` refers to the denoise strategy passed to the command line
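As an illustration, the naming pattern above can be assembled from its entities. This is only a sketch: `build_output_name` is a hypothetical helper, not the app's actual `utils.output_filename`.

```python
from __future__ import annotations


def build_output_name(
    source_stem: str,
    atlas: str,
    suffix: str,
    extension: str,
    desc: str | None = None,
    meas: str | None = None,
) -> str:
    """Append connectivity entities to a preprocessed BOLD file stem.

    Entities follow the key-value ordering used in the output examples:
    atlas, then meas (if any), then desc (if any), then the suffix.
    """
    parts = [source_stem, f"atlas-{atlas}"]
    if meas:
        parts.append(f"meas-{meas}")
    if desc:
        parts.append(f"desc-{desc}")
    return "_".join(parts) + f"_{suffix}.{extension}"


# Example: the relmat file for one run, one atlas resolution.
name = build_output_name(
    "sub-1_ses-timepoint1_task-rest_space-MNI152NLin2009cAsym_res-2",
    atlas="Schaefer20187Networks",
    suffix="relmat",
    extension="tsv",
    desc="100Parcels7NetworksSimple",
    meas="PearsonCorrelation",
)
```

Note that the `desc` entity concatenates the atlas description and the denoise strategy (here `100Parcels7Networks` + `Simple`).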

### Metadata

A JSON file is generated in the root of the output dataset (`meas-PearsonCorrelation_relmat.json`)
that contains metadata applicable to all `relmat.tsv` files.

For each input image (that is, preprocessed BOLD time series),
a `[matches]_atlas-{atlas}_timeseries.json` file is generated,
shared by all the timeseries files extracted from that image.

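This sidecar carries the sampling frequency of the extracted timeseries, computed as the inverse of the BOLD repetition time, matching what `postprocess.py` writes in the diff below. A minimal sketch (the repetition time and output path are illustrative):

```python
import json
from pathlib import Path

repetition_time = 2.0  # seconds; in practice read from the BOLD image entities
sidecar = {"SamplingFrequency": 1 / repetition_time}  # in Hz

# Write the sidecar next to the timeseries files it describes.
out = Path("sub-1_atlas-Schaefer20187Networks_timeseries.json")
out.write_text(json.dumps(sidecar, indent=4))
```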
### Example

```
├── dataset_description.json
├── logs
│   └── CITATION.md
├── meas-PearsonCorrelation_relmat.json
├── sub-1
│   ├── ses-timepoint1
│   │   └── func
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │       ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │       └── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   └── ses-timepoint2
│       └── func
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│           ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│           └── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
└── sub-2
    ├── ses-timepoint1
    │   └── func
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
    │       ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
    │       └── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
    └── ses-timepoint2
        └── func
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
            ├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
            └── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
```
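Both `relmat.tsv` and `timeseries.tsv` are plain tab-separated files written with pandas (`sep="\t"`, no index column), so they round-trip with standard tools. A sketch with a toy correlation matrix, assuming numpy and pandas are installed:

```python
import numpy as np
import pandas as pd

# Build a toy 3-region correlation matrix from random timeseries.
rng = np.random.default_rng(0)
ts = rng.standard_normal((50, 3))        # 50 timepoints, 3 regions
relmat = np.corrcoef(ts, rowvar=False)   # 3x3 Pearson correlation matrix

# Write it the way giga_connectome does: tab-separated, no index column.
pd.DataFrame(relmat).to_csv("toy_relmat.tsv", sep="\t", index=False)

# Loading it back recovers the matrix; column names are stringified indices.
loaded = pd.read_csv("toy_relmat.tsv", sep="\t").to_numpy()
```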
2 changes: 1 addition & 1 deletion giga_connectome/connectome.py
@@ -148,7 +148,7 @@ def generate_timeseries_connectomes(
if calculate_average_correlation:
(
correlation_matrix,
avg_intranetwork_correlation,
_,
) = calculate_intranetwork_correlation(
correlation_matrix,
masker_labels,
4 changes: 1 addition & 3 deletions giga_connectome/mask.py
@@ -19,11 +19,9 @@
from nilearn.masking import compute_multi_epi_mask
from scipy.ndimage import binary_closing

from giga_connectome.atlas import resample_atlas_collection
from giga_connectome.atlas import ATLAS_SETTING_TYPE, resample_atlas_collection
from giga_connectome.logger import gc_logger

from giga_connectome.atlas import ATLAS_SETTING_TYPE

gc_log = gc_logger()


1 change: 0 additions & 1 deletion giga_connectome/methods.py
@@ -3,7 +3,6 @@
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

from nilearn import __version__ as nilearn_version
from templateflow import __version__ as templateflow_version

123 changes: 66 additions & 57 deletions giga_connectome/postprocess.py
@@ -1,15 +1,17 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Sequence

import h5py
import numpy as np
import pandas as pd
from bids.layout import BIDSImageFile
from nilearn.connectome import ConnectivityMeasure
from nilearn.maskers import NiftiLabelsMasker, NiftiMapsMasker

from giga_connectome import utils
from giga_connectome.atlas import ATLAS_SETTING_TYPE
from giga_connectome.connectome import generate_timeseries_connectomes
from giga_connectome.denoise import STRATEGY_TYPE, denoise_nifti_voxel
from giga_connectome.logger import gc_logger
@@ -20,13 +22,13 @@

def run_postprocessing_dataset(
strategy: STRATEGY_TYPE,
atlas: ATLAS_SETTING_TYPE,
resampled_atlases: Sequence[str | Path],
images: Sequence[BIDSImageFile],
group_mask: str | Path,
standardize: str | bool,
smoothing_fwhm: float,
output_path: Path,
analysis_level: str = "participant",
calculate_average_correlation: bool = False,
) -> None:
"""
@@ -63,6 +65,9 @@ def run_postprocessing_dataset(
strategy : dict
Parameters for `load_confounds_strategy` or `load_confounds`.
atlas : dict
Atlas settings.
resampled_atlases : list of str or pathlib.Path
Atlas niftis resampled to the common space of the dataset.
@@ -80,21 +85,14 @@
Smoothing kernel size, passed to nilearn masker.
output_path:
Full path to output file, named in the following format:
output_dir / atlas-<atlas>_desc-<strategy_name>.h5
Full path to output directory.
analysis_level : str
Level of analysis, only "participant" is available.
calculate_average_correlation : bool
Whether to calculate average correlation within each parcel.
"""
if analysis_level != "participant":
raise Warning(
"Only participant level analysis is supported. Other levels "
"would be ignored, and will run the participant level."
)
atlas = output_path.name.split("atlas-")[-1].split("_")[0]
atlas_maskers: dict[str, (NiftiLabelsMasker | NiftiMapsMasker)] = {}
connectomes: dict[str, list[np.ndarray[Any, Any]]] = {}
for atlas_path in resampled_atlases:
@@ -109,6 +107,8 @@
)

# transform data
gc_log.info("Processing subject")

with progress_bar(text="Processing subject") as progress:
task = progress.add_task(
description="processing subject", total=len(images)
@@ -124,15 +124,39 @@
)
# parse file name
subject, session, specifier = utils.parse_bids_name(img.path)
for desc, masker in atlas_maskers.items():
attribute_name = (
f"{subject}_{specifier}_atlas-{atlas}_desc-{desc}"

# folder for this subject output
connectome_path = output_path / subject
if session:
connectome_path = connectome_path / session
connectome_path = connectome_path / "func"

# All timeseries derivatives have the same metadata
# so one json file for them all.
# see https://bids.neuroimaging.io/bep012
json_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="json",
)
utils.check_path(json_filename)
with open(json_filename, "w") as f:
json.dump(
{"SamplingFrequency": 1 / img.entities["RepetitionTime"]},
f,
indent=4,
)

for desc, masker in atlas_maskers.items():

if not denoised_img:
time_series_atlas, correlation_matrix = None, None

attribute_name = (
f"{subject}_{specifier}"
f"_atlas-{atlas['name']}_desc-{desc}"
)
gc_log.info(f"{attribute_name}: no volume after scrubbing")

progress.update(task, advance=1)
continue

@@ -147,51 +171,36 @@
correlation_measure,
calculate_average_correlation,
)
connectomes[desc].append(correlation_matrix)

# dump to h5
flag = _set_file_flag(output_path)
with h5py.File(output_path, flag) as f:
group = _fetch_h5_group(f, subject, session)
timeseries_dset = group.create_dataset(
f"{attribute_name}_timeseries", data=time_series_atlas
)
timeseries_dset.attrs["RepetitionTime"] = img.entities[
"RepetitionTime"
]
group.create_dataset(
f"{attribute_name}_connectome", data=correlation_matrix
)

progress.update(task, advance=1)

gc_log.info(f"Saved to:\n{output_path}")

# dump correlation_matrix to tsv
relmat_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="relmat",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(relmat_filename)
df = pd.DataFrame(correlation_matrix)
df.to_csv(relmat_filename, sep="\t", index=False)

# dump timeseries to tsv file
timeseries_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(timeseries_filename)
df = pd.DataFrame(time_series_atlas)
df.to_csv(timeseries_filename, sep="\t", index=False)

def _set_file_flag(output_path: Path) -> str:
"""Find out if new file needs to be created."""
flag = "a" if output_path.exists() else "w"
return flag

progress.update(task, advance=1)

def _fetch_h5_group(
file: h5py.File, subject: str, session: str | None
) -> h5py.File | h5py.Group:
"""Determine the level of grouping based on BIDS standard."""
if subject not in file:
return (
file.create_group(f"{subject}/{session}")
if session
else file.create_group(subject)
)
elif session:
return (
file[subject].create_group(session)
if session not in file[subject]
else file[f"{subject}/{session}"]
)
else:
return file[subject]
gc_log.info(f"Saved to:\n{connectome_path}")


def _get_masker(atlas_path: Path) -> NiftiLabelsMasker | NiftiMapsMasker: