[MISC] Move definitions, compulsory, and raw/derivatives sections to principles #40

Merged: 2 commits, Oct 7, 2018
186 changes: 0 additions & 186 deletions src/01-introduction.md
@@ -42,192 +42,6 @@ based on simple file formats and folder structures to reflect current lab
practices and make it accessible to a wide range of scientists coming from
different backgrounds.

## Definitions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [[RFC2119](https://www.ietf.org/rfc/rfc2119.txt)].

Throughout this protocol we use a list of terms. To avoid misunderstanding we
clarify them here.

1. Dataset - a set of neuroimaging and behavioural data acquired for the purpose
of a particular study. A dataset consists of data acquired from one or more
subjects, possibly from multiple sessions.

1. Subject - a person or animal participating in the study.

1. Session - a logical grouping of neuroimaging and behavioural data consistent
across subjects. A session can (but doesn't have to) be synonymous with a visit
in a longitudinal study. In general, subjects will stay in the scanner
during one session. However, for example, if a subject has to leave the
scanner room and then be re-positioned on the scanner bed, the set of MRI
acquisitions will still be considered as a session and match sessions
acquired in other subjects. Similarly, in situations where different data
types are obtained over several visits (for example fMRI on one day followed
by DWI the day after) those can be grouped in one session. Defining multiple
sessions is appropriate when several identical or similar data acquisitions
are planned and performed on all -or most- subjects, often in the case of
some intervention between sessions (e.g., training).

1. Data acquisition - a continuous uninterrupted block of time during which a
brain scanning instrument was acquiring data according to particular
scanning sequence/protocol.

1. Data type - a functional group of different types of data. In BIDS we define
five data types: func (task based and resting state functional MRI), dwi
(diffusion weighted imaging), fmap (field inhomogeneity mapping data such as
field maps), anat (structural imaging such as T1, T2, etc.), meg
(magnetoencephalography).

1. Task - a set of structured activities performed by the participant. Tasks
are usually accompanied by stimuli and responses, and can greatly vary in
complexity. For the purpose of this protocol we consider the so-called
“resting state” a task. In the context of brain scanning, a task is always
tied to one data acquisition. Therefore, even if during one acquisition the
subject performed multiple conceptually different behaviours (with different
sets of instructions) they will be considered one (combined) task.

1. Event - a stimulus or subject response recorded during a task. Each event
has an onset time and duration. Note that not all tasks will have recorded
events (e.g., resting state).

1. Run - an uninterrupted repetition of data acquisition that has the same
acquisition parameters and task (however events can change from run to run
due to different subject response or randomized nature of the stimuli). Run
is a synonym of a data acquisition.

## Compulsory, optional, and additional data and metadata

The following standard describes a way of arranging data and writing down
metadata for a subset of neuroimaging experiments. Some aspects of the standard
are compulsory. For example a particular file name format is required when
storing structural scans. Some aspects are regulated but optional. For example a
T2 volume does not need to be included, but when it is available it should be
saved under a particular file name specified in the standard. This standard
aspires to describe a majority of datasets, but acknowledges that there will be
cases that do not fit. In such cases one can include additional files and
subfolders to the existing folder structure following common sense. For example
one may want to include eye tracking data in a vendor specific format that is
not covered by this standard. The most sensible place to put it is next to the
continuous recording file with the same naming scheme but different extensions.
The solutions will change from case to case and publicly available datasets will
be reviewed to include common data types in the future releases of the BIDS
spec.

## Source vs. raw vs. derived data

BIDS in its current form is designed to harmonize and describe raw (unprocessed
or minimally processed due to file format conversion) data. During analysis such
data will be transformed and partial as well as final results will be saved.
Derivatives of the raw data (other than products of DICOM to NIfTI conversion)
MUST be kept separate from the raw data. This way one can protect the raw data
from accidental changes by file permissions. In addition it is easy to
distinguish partial results from the raw data and share the latter. Similar
rules apply to source data which is defined as data before harmonization and/or
file format conversion (for example E-Prime event logs or DICOM files).

This specification currently does not go into details of recommending a
particular naming scheme for including different types of source data (raw event
logs, parameter files, etc. before conversion to BIDS) and data derivatives
(correlation maps, brain masks, contrasts maps, etc.). However, in the case that
these data are to be included:

1. These data MUST be kept in separate `sourcedata` and `derivatives` folders
each with a similar folder structure as presented below for the BIDS-managed
data. For example:
`derivatives/fmriprep/sub-01/ses-pre/sub-01_ses-pre_mask.nii.gz` or
`sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
`sourcedata/sub-01/ses-pre/func/MyEvent.sce`.

1. A README file SHOULD be found at the root of the `sourcedata` or the
`derivatives` folder (or both). This file should describe the nature of the
raw data or the derived data. In the case of the existence of a
`derivatives` folder, we RECOMMEND including details about the software
stack and settings used to generate the results. Inclusion of non-imaging
objects that improve reproducibility is encouraged (scripts, settings
files, etc.).

1. We RECOMMEND including the PDF print-out with the actual sequence parameters
generated by the scanner in the `sourcedata` folder.

## The Inheritance Principle

Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any
directory level, but no more than one applicable file may be defined at a given
level (Example 1). The values from the top level are inherited by all lower
levels unless they are overridden by a file at the lower level. For example,
`sub-*_task-rest_bold.json` may be specified at the participant level, setting
TR to a specific value. If one of the runs has a different TR than the one
specified in that file, another `sub-*_task-rest_bold.json` file can be placed
within that specific series directory specifying the TR for that specific run.
There is no notion of "unsetting" a key/value pair. For example if there is a
JSON file corresponding to particular participant/run defining a key/value and
there is a JSON file on the root level of the dataset that does not define this
key/value it will not be "unset" for all subjects/runs. Files for a particular
participant can exist only at the participant-level directory, i.e.,
`/dataset/sub-*[/ses-*]/sub-*_T1w.json`. Similarly, any file that is not
specific to a participant is to be declared only at the top level of the dataset, e.g.,
`task-sist_bold.json` must be placed under `/dataset/task-sist_bold.json`.
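
The override behaviour described above can be sketched as a simple top-down merge. This is a minimal illustration, not part of the specification: the function name `merge_inherited` and the metadata values are hypothetical, and the sketch assumes the applicable JSON files have already been located and loaded as dictionaries, ordered from the dataset root down to the data file.

```python
def merge_inherited(sidecars):
    """Merge metadata dicts ordered from the dataset root down to the data file.

    Deeper dicts override shallower ones key by key. A key that is absent
    from a deeper file is simply inherited, so nothing is ever "unset".
    """
    merged = {}
    for metadata in sidecars:
        merged.update(metadata)
    return merged


# Hypothetical values: a higher-level file sets TR, one run overrides it.
higher_level = {"RepetitionTime": 2.0, "TaskName": "rest"}
run_level = {"RepetitionTime": 1.5}

print(merge_inherited([higher_level, run_level]))
```

Because the merge only ever adds or replaces keys, a lower-level file that omits `TaskName` still ends up with the inherited value.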

Example 1: Two JSON files at the same level that are applicable to the same NIfTI file.

```Text
sub-01/
    ses-test/
        sub-test_task-overtverbgeneration_bold.json
        sub-test_task-overtverbgeneration_run-2_bold.json
        anat/
            sub-01_ses-test_T1w.nii.gz
        func/
            sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz
            sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz
```

In the above example, two JSON files are listed under `sub-01/ses-test/`, which
are each applicable to
`sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz`, violating the
constraint that no more than one file may be defined at a given level of the
directory structure. Instead `task-overtverbgeneration_run-2_bold.json` should
have been under `sub-01/ses-test/func/`.
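
A check for this kind of violation might look like the sketch below. The matching rule is an assumption made for illustration (a metadata file applies when its suffix matches and every `key-value` pair in its name also appears in the data file name); it is not taken from the specification or from any validator. The helper names are hypothetical, and the file names are adapted from Example 1 with a `sub-01_ses-test_` prefix so that they match the data files under this rule.

```python
def parse_name(filename):
    """Split a BIDS-style file name into (entities, suffix).

    Every underscore-separated part except the last is expected to be a
    'key-value' pair; a part without a hyphen raises ValueError.
    """
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix


def applies_to(sidecar, data_file):
    """Assumed rule: same suffix, and all sidecar entities present in the data file."""
    side_entities, side_suffix = parse_name(sidecar)
    data_entities, data_suffix = parse_name(data_file)
    return side_suffix == data_suffix and all(
        data_entities.get(key) == value for key, value in side_entities.items()
    )


def violates_single_file_rule(sidecars_at_level, data_file):
    """True if more than one metadata file at one directory level applies."""
    matches = [name for name in sidecars_at_level if applies_to(name, data_file)]
    return len(matches) > 1


# Hypothetical file names adapted from Example 1.
level = [
    "sub-01_ses-test_task-overtverbgeneration_bold.json",
    "sub-01_ses-test_task-overtverbgeneration_run-2_bold.json",
]
run2 = "sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz"
print(violates_single_file_rule(level, run2))
```

Under this rule only the run-2 NIfTI file triggers the violation, since the run-specific JSON file does not apply to run 1.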

Example 2: Multiple runs and reconstructions (rec) with the same acquisition (acq) parameters, `acq-test1`

```Text
sub-01/
    anat/
    func/
        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
        sub-01_task-xyz_acq-test1_run-2_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon2_bold.nii.gz
        sub-01_task-xyz_acq-test1_bold.json
```

For the above example, all NIfTI files are acquired with the same scanning
parameters (`acq-test1`). Hence a JSON file describing the acq parameters will
apply to the different run and rec files. Also, if the JSON file
(`task-xyz_acq-test1_bold.json`) is defined at the dataset's top-level directory, it
will be applicable to all task runs with the `test1` acquisition parameter.

Example 3: Multiple JSON files at different levels for the same task and acquisition
parameters

```Text
sub-01/
    sub-01_task-xyz_acq-test1_bold.json
    anat/
    func/
        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon2_bold.nii.gz
```

In the above example, the fields from the `task-xyz_acq-test1_bold.json` file will
apply to all bold runs. However, if a key has a different value in
`sub-01/func/sub-01_task-xyz_acq-test1_run-1_bold.json`, the new value will
apply to the NIfTI file(s) of that particular run/task.

## Extensions

The BIDS specification can be extended in a backwards compatible way and will
109 changes: 109 additions & 0 deletions src/02-common-principles.md
@@ -1,5 +1,114 @@
# Common principles

## Definitions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [[RFC2119](https://www.ietf.org/rfc/rfc2119.txt)].

Throughout this protocol we use a list of terms. To avoid misunderstanding we
clarify them here.

1. Dataset - a set of neuroimaging and behavioural data acquired for the purpose
of a particular study. A dataset consists of data acquired from one or more
subjects, possibly from multiple sessions.

1. Subject - a person or animal participating in the study.

1. Session - a logical grouping of neuroimaging and behavioural data consistent
across subjects. A session can (but doesn't have to) be synonymous with a visit
in a longitudinal study. In general, subjects will stay in the scanner
during one session. However, for example, if a subject has to leave the
scanner room and then be re-positioned on the scanner bed, the set of MRI
acquisitions will still be considered as a session and match sessions
acquired in other subjects. Similarly, in situations where different data
types are obtained over several visits (for example fMRI on one day followed
by DWI the day after) those can be grouped in one session. Defining multiple
sessions is appropriate when several identical or similar data acquisitions
are planned and performed on all -or most- subjects, often in the case of
some intervention between sessions (e.g., training).

1. Data acquisition - a continuous uninterrupted block of time during which a
brain scanning instrument was acquiring data according to particular
scanning sequence/protocol.

1. Data type - a functional group of different types of data. In BIDS we define
five data types: func (task based and resting state functional MRI), dwi
(diffusion weighted imaging), fmap (field inhomogeneity mapping data such as
field maps), anat (structural imaging such as T1, T2, etc.), meg
(magnetoencephalography).

1. Task - a set of structured activities performed by the participant. Tasks
are usually accompanied by stimuli and responses, and can greatly vary in
complexity. For the purpose of this protocol we consider the so-called
“resting state” a task. In the context of brain scanning, a task is always
tied to one data acquisition. Therefore, even if during one acquisition the
subject performed multiple conceptually different behaviours (with different
sets of instructions) they will be considered one (combined) task.

1. Event - a stimulus or subject response recorded during a task. Each event
has an onset time and duration. Note that not all tasks will have recorded
events (e.g., resting state).

1. Run - an uninterrupted repetition of data acquisition that has the same
acquisition parameters and task (however events can change from run to run
due to different subject response or randomized nature of the stimuli). Run
is a synonym of a data acquisition.

## Compulsory, optional, and additional data and metadata

The following standard describes a way of arranging data and writing down
metadata for a subset of neuroimaging experiments. Some aspects of the standard
are compulsory. For example a particular file name format is required when
storing structural scans. Some aspects are regulated but optional. For example a
T2 volume does not need to be included, but when it is available it should be
saved under a particular file name specified in the standard. This standard
aspires to describe a majority of datasets, but acknowledges that there will be
cases that do not fit. In such cases one can include additional files and
subfolders to the existing folder structure following common sense. For example
one may want to include eye tracking data in a vendor specific format that is
not covered by this standard. The most sensible place to put it is next to the
continuous recording file with the same naming scheme but different extensions.
The solutions will change from case to case and publicly available datasets will
be reviewed to include common data types in the future releases of the BIDS
spec.

## Source vs. raw vs. derived data

BIDS in its current form is designed to harmonize and describe raw (unprocessed
or minimally processed due to file format conversion) data. During analysis such
data will be transformed and partial as well as final results will be saved.
Derivatives of the raw data (other than products of DICOM to NIfTI conversion)
MUST be kept separate from the raw data. This way one can protect the raw data
from accidental changes by file permissions. In addition it is easy to
distinguish partial results from the raw data and share the latter. Similar
rules apply to source data which is defined as data before harmonization and/or
file format conversion (for example E-Prime event logs or DICOM files).

This specification currently does not go into details of recommending a
particular naming scheme for including different types of source data (raw event
logs, parameter files, etc. before conversion to BIDS) and data derivatives
(correlation maps, brain masks, contrasts maps, etc.). However, in the case that
these data are to be included:

1. These data MUST be kept in separate `sourcedata` and `derivatives` folders
each with a similar folder structure as presented below for the BIDS-managed
data. For example:
`derivatives/fmriprep/sub-01/ses-pre/sub-01_ses-pre_mask.nii.gz` or
`sourcedata/sub-01/ses-pre/func/sub-01_ses-pre_task-rest_bold.dicom.tgz` or
`sourcedata/sub-01/ses-pre/func/MyEvent.sce`.

1. A README file SHOULD be found at the root of the `sourcedata` or the
`derivatives` folder (or both). This file should describe the nature of the
raw data or the derived data. In the case of the existence of a
`derivatives` folder, we RECOMMEND including details about the software
stack and settings used to generate the results. Inclusion of non-imaging
objects that improve reproducibility is encouraged (scripts, settings
files, etc.).

1. We RECOMMEND including the PDF print-out with the actual sequence parameters
generated by the scanner in the `sourcedata` folder.
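
The folder separation in the list above can be illustrated with a small helper that classifies a dataset-relative path by its top-level folder. This is only a sketch: the function name `data_category` and its return labels are hypothetical, and it assumes that anything outside `sourcedata` and `derivatives` belongs to the raw, BIDS-managed data.

```python
from pathlib import PurePosixPath


def data_category(relative_path):
    """Classify a dataset-relative path as 'source', 'derivative', or 'raw'."""
    parts = PurePosixPath(relative_path).parts
    top = parts[0] if parts else ""
    if top == "sourcedata":
        return "source"
    if top == "derivatives":
        return "derivative"
    # Assumption: everything else is raw, BIDS-managed data.
    return "raw"


print(data_category("derivatives/fmriprep/sub-01/ses-pre/sub-01_ses-pre_mask.nii.gz"))
print(data_category("sourcedata/sub-01/ses-pre/func/MyEvent.sce"))
```

A check like this makes it straightforward to, for instance, apply read-only permissions to raw data while leaving `derivatives` writable.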

## The Inheritance Principle

Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any