
[ENH] start bidsifying output #80

Merged — 28 commits, Mar 25, 2024

Changes shown from 23 commits
547d888
bidsify output
Remi-Gau Dec 14, 2023
4649ea1
rm tmp files
Remi-Gau Dec 14, 2023
ef48177
start fixing test
Remi-Gau Dec 14, 2023
614b8a8
fix tests
Remi-Gau Dec 14, 2023
e3a36ca
Merge remote-tracking branch 'upstream/main' into folder
Remi-Gau Dec 14, 2023
14415ef
post merge fix
Remi-Gau Dec 14, 2023
6eb1126
update doc output
Remi-Gau Dec 14, 2023
e91e91b
update tests
Remi-Gau Dec 14, 2023
15441d3
Update giga_connectome/postprocess.py
Remi-Gau Dec 14, 2023
ec0a33d
rm tmp
Remi-Gau Dec 14, 2023
3c0e72e
Merge remote-tracking branch 'upstream/main' into folder
Remi-Gau Dec 15, 2023
3b12bf5
remove old data files it exists
Remi-Gau Dec 15, 2023
2538c43
Update giga_connectome/utils.py
Remi-Gau Jan 8, 2024
3eb3061
make output to bids the default
Remi-Gau Mar 25, 2024
3e8570c
make bids output the default
Remi-Gau Mar 25, 2024
21fa3db
save correlation matrix to tsv
Remi-Gau Mar 25, 2024
0d5db0b
revert group level
Remi-Gau Mar 25, 2024
5224759
one output file per atlas
Remi-Gau Mar 25, 2024
49cbf3a
Merge remote-tracking branch 'upstream/main' into folder
Remi-Gau Mar 25, 2024
2100612
Merge remote-tracking branch 'upstream/main' into folder
Remi-Gau Mar 25, 2024
c82f3d2
isort
Remi-Gau Mar 25, 2024
7eb15f6
timeseries to tsv
Remi-Gau Mar 25, 2024
405c6ac
fix test and output doc
Remi-Gau Mar 25, 2024
e4a88c9
Merge branch 'main' into folder
Remi-Gau Mar 25, 2024
c463fdd
Update .pre-commit-config.yaml
Remi-Gau Mar 25, 2024
784d9b5
lint
Remi-Gau Mar 25, 2024
808dd65
fixes
Remi-Gau Mar 25, 2024
29efb3e
move to contrib
Remi-Gau Mar 25, 2024
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -8,6 +8,7 @@ repos:
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
args: ['--maxkb=1000']
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.7.1
hooks:
1 change: 1 addition & 0 deletions docs/source/changes.md
@@ -15,6 +15,7 @@ Released MONTH YEAR

### Changes

- [ENH] Make output more BIDS compliant. (@Remi-Gau)
- [MAINT] Pin dependencies for docker build for better reproducibility. (@Remi-Gau)

## 0.4.0
94 changes: 90 additions & 4 deletions docs/source/outputs.md
@@ -1,7 +1,93 @@
# Outputs

When performing `participant` level analysis, the output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `sub-<participant_id>_atlas-<atlas_name>_desc-<denoising_strategy>.h5`
The output of this app aims to follow the guideline
of the [BIDS extension proposal 17 - Generic BIDS connectivity data schema](https://bids.neuroimaging.io/bep017).

When performing `group` level analysis, the file will contain time series and connectomes of each subject, as well as group average connectomes. The output is a HDF5 file per participant that was passed to `--participant_label` or all subjects under `bids_dir`.
The output file name is: `atlas-<atlas_name>_desc-<denoising_strategy>.h5`
Metadata files content is described in this BIDS extension proposal.

## Participant level

For each participant that was passed to `--participant_label`
(or all participants under `bids_dir` if no `--participant_label` is passed),
the output will be saved in `sub-<participant_id>/[ses-<ses_id>]/func`.

### Data files

For each input image (that is, preprocessed BOLD time series)
and each atlas, the following data files will be generated:

- a `[matches]_atlas-{atlas}_meas-PearsonCorrelation_desc-{atlas description}{strategy}_relmat.tsv`
file that contains the correlation matrix between all the regions of the atlas
- a `[matches]_atlas-{atlas}_meas-PearsonCorrelation_desc-{atlas description}{strategy}_timeseries.tsv`
file that contains the extracted timeseries for each region of the atlas

- `{atlas}` refers to the name of the atlas used (for example, `Schaefer20187Networks`)
- `{atlas description}` refers to the sub type of atlas used (for example, `100Parcels7Networks`)
- `{strategy}` refers to the denoising strategy passed to the command line

### Metadata

A JSON file is generated in the root of the output dataset (`meas-PearsonCorrelation_relmat.json`)
that contains metadata applicable to all `relmat.tsv` files.

For each input image (that is, preprocessed BOLD time series),
a `[matches]_atlas-{atlas}_timeseries.json` file is generated;
all time series derived from that image share the same metadata,
so a single JSON sidecar covers them.
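For context, the per-image sidecar pairs a `SamplingFrequency` (in Hz) with the BOLD repetition time, as done in `postprocess.py` in this PR. A minimal sketch — the `repetition_time` value and output file name are illustrative, not actual app values:

```python
import json

# Illustrative repetition time in seconds; in the app this comes from
# the BIDS entities of the input image.
repetition_time = 2.0

# One sidecar is shared by all timeseries derivatives of the image
# (see BIDS BEP012), so only shared metadata goes in it.
sidecar = {"SamplingFrequency": 1 / repetition_time}

with open("example_timeseries.json", "w") as f:
    json.dump(sidecar, f, indent=4)
```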

### Example

```
├── dataset_description.json
├── logs
│   └── CITATION.md
├── meas-PearsonCorrelation_relmat.json
├── sub-1
│   ├── ses-timepoint1
│   │   └── func
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   │   ├── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   │   └── sub-1_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   └── ses-timepoint2
│   └── func
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   ├── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   └── sub-1_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
└── sub-2
├── ses-timepoint1
│   └── func
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
│   ├── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
│   └── sub-2_ses-timepoint1_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
└── ses-timepoint2
└── func
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-01_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-100Parcels7NetworksSimple_timeseries.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_desc-200Parcels7NetworksSimple_timeseries.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-100Parcels7NetworksSimple_relmat.tsv
├── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_meas-PearsonCorrelation_desc-200Parcels7NetworksSimple_relmat.tsv
└── sub-2_ses-timepoint2_task-probabilisticclassification_run-02_space-MNI152NLin2009cAsym_res-2_atlas-Schaefer20187Networks_timeseries.json
```
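The `relmat.tsv` files are plain tab-separated tables, so downstream analyses can load them with standard tools. A sketch under toy data — the file name and time series below are made up, not actual app output:

```python
import numpy as np
import pandas as pd

# Toy time series: 50 timepoints x 3 regions, standing in for what the
# app extracts with one atlas.
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((50, 3))

# Pearson correlation between regions, the measure stored in *_relmat.tsv.
relmat = pd.DataFrame(np.corrcoef(timeseries.T))
relmat.to_csv("example_relmat.tsv", sep="\t", index=False)

# Load it back the way a downstream analysis would.
loaded = pd.read_csv("example_relmat.tsv", sep="\t").to_numpy()
```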
25 changes: 25 additions & 0 deletions docs/source/usage.md
Comment (Collaborator Author):

A bit of extra scope for this PR, but a minimal description of how to run a demo felt necessary.

@@ -67,3 +67,28 @@ In a `json` file, define the customised atlas. We will use the atlas above as an
```

See examples in [`giga_connectome/data`](https://github.com/SIMEXP/giga_connectome/tree/main/giga_connectome/data).


## Example

You can run a demo of the bids app by downloading some test data.

Run the following from the root of the repository.

```bash
pip install tox
tox -e test_data
```

```bash
giga_connectome \
--atlas Schaefer20187Networks \
--denoise-strategy simple \
--standardize zscore \
--bids-filter giga_connectome/data/test_data/bids_filter.json \
--reindex-bids \
--calculate-intranetwork-average-correlation \
giga_connectome/data/test_data/ds000017-fmriprep22.0.1-downsampled-nosurface \
giga_connectome/data/test_data/output \
participant
```
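The `--standardize zscore` option in the demo command asks for each region's signal to be z-scored before the connectome is computed. Conceptually (a sketch of the operation, not the app's actual code path):

```python
import numpy as np

# Four timepoints x two regions of fake BOLD signal.
timeseries = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Z-scoring: subtract each column's mean, divide by its standard deviation,
# so every region ends up with mean 0 and unit variance.
zscored = (timeseries - timeseries.mean(axis=0)) / timeseries.std(axis=0)
```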
2 changes: 1 addition & 1 deletion giga_connectome/connectome.py
@@ -148,7 +148,7 @@ def generate_timeseries_connectomes(
if calculate_average_correlation:
(
correlation_matrix,
avg_intranetwork_correlation,
_,
) = calculate_intranetwork_correlation(
correlation_matrix,
masker_labels,
4 changes: 1 addition & 3 deletions giga_connectome/mask.py
@@ -19,11 +19,9 @@
from nilearn.masking import compute_multi_epi_mask
from scipy.ndimage import binary_closing

from giga_connectome.atlas import resample_atlas_collection
from giga_connectome.atlas import ATLAS_SETTING_TYPE, resample_atlas_collection
from giga_connectome.logger import gc_logger

from giga_connectome.atlas import ATLAS_SETTING_TYPE

gc_log = gc_logger()


1 change: 0 additions & 1 deletion giga_connectome/methods.py
@@ -3,7 +3,6 @@
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

from nilearn import __version__ as nilearn_version
from templateflow import __version__ as templateflow_version

123 changes: 67 additions & 56 deletions giga_connectome/postprocess.py
@@ -1,15 +1,17 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Sequence

import h5py
import numpy as np
import pandas as pd
from bids.layout import BIDSImageFile
from nilearn.connectome import ConnectivityMeasure
from nilearn.maskers import NiftiLabelsMasker, NiftiMapsMasker

from giga_connectome import utils
from giga_connectome.atlas import ATLAS_SETTING_TYPE
from giga_connectome.connectome import generate_timeseries_connectomes
from giga_connectome.denoise import STRATEGY_TYPE, denoise_nifti_voxel
from giga_connectome.logger import gc_logger
@@ -20,13 +22,13 @@

def run_postprocessing_dataset(
strategy: STRATEGY_TYPE,
atlas: ATLAS_SETTING_TYPE,
resampled_atlases: Sequence[str | Path],
images: Sequence[BIDSImageFile],
group_mask: str | Path,
standardize: str | bool,
smoothing_fwhm: float,
output_path: Path,
analysis_level: str = "participant",
calculate_average_correlation: bool = False,
) -> None:
"""
@@ -63,6 +65,9 @@ def run_postprocessing_dataset(
strategy : dict
Parameters for `load_confounds_strategy` or `load_confounds`.

atlas : dict
Atlas settings.

resampled_atlases : list of str or pathlib.Path
Atlas niftis resampled to the common space of the dataset.

@@ -80,21 +85,14 @@
Smoothing kernel size, passed to nilearn masker.

output_path:
Full path to output file, named in the following format:
output_dir / atlas-<atlas>_desc-<strategy_name>.h5
Full path to output directory.

analysis_level : str
Level of analysis, only "participant" is available.

calculate_average_correlation : bool
Whether to calculate average correlation within each parcel.
"""
if analysis_level != "participant":
raise Warning(
"Only participant level analysis is supported. Other levels "
"would be ignored, and will run the participant level."
)
Comment on lines -92 to -96

Collaborator Author: Realizing I just completely removed this from the code, but given that group level analysis is not allowed by the CLI, this warning would never be hit in practice.

Collaborator: Yeah, that makes sense.

atlas = output_path.name.split("atlas-")[-1].split("_")[0]
atlas_maskers: dict[str, (NiftiLabelsMasker | NiftiMapsMasker)] = {}
connectomes: dict[str, list[np.ndarray[Any, Any]]] = {}
for atlas_path in resampled_atlases:
@@ -109,6 +107,8 @@ def run_postprocessing_dataset(
)

# transform data
gc_log.info("Processing subject")

with progress_bar(text="Processing subject") as progress:
task = progress.add_task(
description="processing subject", total=len(images)
@@ -124,9 +124,33 @@
)
# parse file name
subject, session, specifier = utils.parse_bids_name(img.path)

# folder for this subject output
connectome_path = output_path / subject
if session:
connectome_path = connectome_path / session
connectome_path = connectome_path / "func"

# All timeseries derivatives have the same metadata
# so one json file for them all.
# see https://bids.neuroimaging.io/bep012
json_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="json",
)
utils.check_path(json_filename)
with open(json_filename, "w") as f:
json.dump(
{"SamplingFrequency": 1 / img.entities["RepetitionTime"]},
f,
indent=4,
)

for desc, masker in atlas_maskers.items():
attribute_name = (
f"{subject}_{specifier}_atlas-{atlas}_desc-{desc}"
f"{subject}_{specifier}_atlas-{atlas['name']}_desc-{desc}"
)
if not denoised_img:
time_series_atlas, correlation_matrix = None, None
@@ -147,51 +171,38 @@
correlation_measure,
calculate_average_correlation,
)
connectomes[desc].append(correlation_matrix)

# dump to h5
flag = _set_file_flag(output_path)
with h5py.File(output_path, flag) as f:
group = _fetch_h5_group(f, subject, session)
timeseries_dset = group.create_dataset(
f"{attribute_name}_timeseries", data=time_series_atlas
)
timeseries_dset.attrs["RepetitionTime"] = img.entities[
"RepetitionTime"
]
group.create_dataset(
f"{attribute_name}_connectome", data=correlation_matrix
)

progress.update(task, advance=1)

gc_log.info(f"Saved to:\n{output_path}")


def _set_file_flag(output_path: Path) -> str:
"""Find out if new file needs to be created."""
flag = "a" if output_path.exists() else "w"
return flag


def _fetch_h5_group(
file: h5py.File, subject: str, session: str | None
) -> h5py.File | h5py.Group:
"""Determine the level of grouping based on BIDS standard."""
if subject not in file:
return (
file.create_group(f"{subject}/{session}")
if session
else file.create_group(subject)
)
elif session:
return (
file[subject].create_group(session)
if session not in file[subject]
else file[f"{subject}/{session}"]
)
else:
return file[subject]

# one output file per input file per atlas

# dump correlation_matrix to tsv
relmat_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="relmat",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(relmat_filename)
df = pd.DataFrame(correlation_matrix)
df.to_csv(relmat_filename, sep="\t", index=False)

# dump timeseries to tsv
timeseries_filename = connectome_path / utils.output_filename(
source_file=Path(img.filename).stem,
atlas=atlas["name"],
suffix="timeseries",
extension="tsv",
strategy=strategy["name"],
desc=desc,
)
utils.check_path(timeseries_filename)
df = pd.DataFrame(time_series_atlas)
df.to_csv(timeseries_filename, sep="\t", index=False)

progress.update(task, advance=1)

gc_log.info(f"Saved to:\n{connectome_path}")


def _get_masker(atlas_path: Path) -> NiftiLabelsMasker | NiftiMapsMasker: