Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add res and den keywords to indicate resolution of resampled data #301

Merged
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
dda7aca
fix: force-push a rewrite addressing minimal changes
oesteban Aug 13, 2019
e10f451
sty: make linter happy
oesteban Aug 13, 2019
3a85c0a
[STY] Add missing table fence
oesteban Aug 15, 2019
5ec2bf6
fix: force-push a rewrite addressing minimal changes
oesteban Aug 13, 2019
a428a91
sty: make linter happy
oesteban Aug 13, 2019
542ad99
Update src/05-derivatives/02-common-data-types.md
oesteban Aug 15, 2019
9e4e384
Address @effigies' and @edickie's comments
oesteban Aug 15, 2019
c3213e6
sty: fix missaligned table fence
oesteban Aug 15, 2019
9cdeea8
fix: better description for example following @edickie's comment
oesteban Aug 15, 2019
b071ebb
More generalizable concepts (addressing @robertoostenveld's comments)
oesteban Aug 18, 2019
ea3183a
Update src/05-derivatives/02-common-data-types.md
oesteban Aug 19, 2019
6f21533
Address @robertoostenveld's comment
oesteban Aug 19, 2019
68c9981
Merge branch 'enh/resolution-density' of github.com:oesteban/bids-spe…
oesteban Aug 30, 2019
13b1d76
Merge remote-tracking branch 'upstream/common-derivatives' into enh/r…
oesteban Aug 30, 2019
46d577c
fix: ``res-<index>`` -> ``res-<label>`` + admonitions
oesteban Aug 30, 2019
c8dd017
fix: cover one @effigies' suggestion
oesteban Aug 30, 2019
7df0899
remove unnecessary edit
oesteban Aug 30, 2019
0bbef9d
fix: example with bad formatting
oesteban Aug 30, 2019
501c889
Merge remote-tracking branch 'upstream/common-derivatives' into enh/r…
effigies Nov 21, 2019
c56fad6
RF: Move res/den keywords to imaging derivatives
effigies Dec 4, 2019
785c8cd
REVERT Removal of example from Common Data Types
effigies Dec 9, 2019
d484416
Drop tip that does not match context
effigies Dec 9, 2019
8f834c7
STY: Satisfy linter
effigies Dec 9, 2019
591f0a5
Merge remote-tracking branch 'upstream/common-derivatives' into enh/r…
oesteban Jan 28, 2020
4002ab4
enh: include @robertoostenveld's suggestion
oesteban Jan 28, 2020
0b99b7c
ENH: Add SpatialReference dictionary option and example
effigies May 21, 2020
bb94c0b
Merge remote-tracking branch 'upstream/common-derivatives' into enh/r…
effigies May 21, 2020
f60de90
STY: Drop parens
effigies May 21, 2020
e7f6341
Allow resolution/density to be a string or dictionary
effigies May 21, 2020
b874e08
Eliminate dtseries.json example, add space-fsLR to avoid ambiguity
effigies May 21, 2020
d518933
Update src/05-derivatives/01-introduction.md
effigies May 21, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/05-derivatives/01-introduction.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ denotes a key in a subdictionary):
| PipelineDescription.Name | REQUIRED. Name of the pipeline that generated the outputs. In case the derived dataset is stored as a subfolder of the raw dataset this field MUST be a substring of the derived dataset folder name (a.k.a. `<pipeline_name>` - see above). |
| PipelineDescription.Version | OPTIONAL. Version of the pipeline. |
| PipelineDescription.CodeURL | OPTIONAL. URL where the code for the analysis can be found. |
| PipelineDescription.Container | OPTIONAL. Object specifying the location and relevant attributes of software container image used to produce the derivative. Valid fields in this object include `Type`, `Tag` and `URI`.
| PipelineDescription.Container | OPTIONAL. Object specifying the location and relevant attributes of software container image used to produce the derivative. Valid fields in this object include `Type`, `Tag` and `URI`. |
| SourceDatasets | OPTIONAL. A list of objects specifying the locations and relevant attributes of all source datasets. Valid fields in each object include `URL`, `DOI`, and `Version`. |

Example:
Expand Down
62 changes: 56 additions & 6 deletions src/05-derivatives/02-common-data-types.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Template:
<pipeline_name>/
sub-<participant_label>/
anat|func|dwi/
<source_keywords>[_space-<space>][_desc-<label>]_<suffix>.<ext>
<source_keywords>[_space-<space>][_res-<index>][_den-<label>][_desc-<label>]_<suffix>.<ext>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices thing feels weird, as opposed to a (potentially) descriptive label, given that the indices don't mean anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned below, indices allow for the developer to impose some explicit order they want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the index made explicit, i.e. where is the lookup-table that allows me to see what index value "1" signifies? Is the lookup table a TSV or a JSON?

Is the label in den-label free to choose by the researcher, or is it limited to a controlled vocabulary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the index look-up is in the sidecar, specifically here.
On my reading, it seems that den is free text (rather than a controlled vocabulary). I'm not sure what the controlled vocabulary would be, here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @emdupre, that clarifies.

I think it would be better if both res and den would allow strings rather than digits. This allows them to carry explicit meaning (like "1mm", "2mm"). Digits to indicate indices suffer from some ambiguities, i.e. should they start Python/C style 0-offset or MATLAB/Fortran style 1-offset? And is it allowed to use them in non-successive order like 1, 3, 5, 7? The developer can (and probably should) always start by reading the JSON, and subsequently loop over the keys of the dictionary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking a bit more about this (and similar ideas from @robertoostenveld above) - would you agree in relaxing the value of res to be a <label> (and hence, consistent with den-, which is nice), but including the recommendation to use one-based, consecutive indexes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm still missing what the 1-based consecutive index recommendation really helps with, given that labels can be arbitrary. and how would you differentiate across datasets that follow the recommendation and those that don't?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm still missing what the 1-based consecutive index recommendation really helps with, given that labels can be arbitrary.

  1. Gives you an idea of how many different resolutions there are with a quick look. Other options don't.
  2. Allows the developers to impose an order they might be interested in (e.g., higher resolution first).
  3. Preempts inconsistency between the label and the data (e.g., having res-333mm for a NIfTI with 2x2x2 or that has an affine with shearing)

how would you differentiate across datasets that follow the recommendation and those that don't?

The recommendation does not allow you to remove the metadata in the JSON. Other than that, why would you need to know whether the dataset sticks with the recommendation or not?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm - I'm a little confused - I also believe that it should be a res-<label> and not res-<index>.

I simple alphabetical sort of labels is probably all the ordering we would need?

I really hate the idea of res "2x2x2mm" being associated with index 3...that seems way too confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the PR and now it proposes res-<label>. However, I left a recommendation indicating the use of indexes. The question is whether that should be removed or not. I could also expand the recommendation box in the docs to include the three reasons I gave above.

oesteban marked this conversation as resolved.
Show resolved Hide resolved
```

Processing in this context means transformations of data that does not change
Expand All @@ -30,13 +30,22 @@ Note that even though `space` and `desc` are
optional at least one of them needs to be defined to avoid name conflict with
the raw file.

**Sampling**. When two or more instances of a given derivative are provided with resolution
(or surface sampling density) being the only difference between them, then the `res`
(for *resolution* of regularly sampled N-D data) and/or `den` (for *density* of non-parametric
surfaces) SHOULD be used to avoid name conflicts.
Note that only files combining both regularly sampled (e.g., gridded) and surface sampled data
(and their downstream derivatives) are allowed to present both `res` and `den` keywords
simultaneously.

Examples:

```Text
pipeline1/
sub-001/
func/
sub-001_task-rest_run-1_space-MNI305_bold.nii.gz
sub-001_task-rest_run-1_space-MNI305_res-1_bold.nii.gz
sub-001_task-rest_run-1_space-MNI305_res-2_bold.nii.gz
sub-001_task-rest_run-1_space-MNI305_bold.json
```

effigies marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -62,9 +71,50 @@ makes them invalid (e.g., if a source 4D image is averaged to create a single
static volume, a SamplingFrequency property would no longer be relevant). In
addition, all processed files include the following metadata JSON fields:

| **Key name** | **Description** |
| ------------- | ----------------------------------------------------------------------------------------------- |
| SkullStripped | REQUIRED. Boolean. Whether the volume was skull stripped (non-brain voxels set to zero) or not. |
| **Key name** | **Description** |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| SkullStripped | REQUIRED. Boolean. Whether the volume was skull stripped (non-brain voxels set to zero) or not. |
| Resolution | REQUIRED if `res` keyword present. Dictionary of Indexes to Strings. Specifies the interpretation of the resolution keyword. |
| Density | REQUIRED if `den` keyword present. Dictionary of Strings to Strings. Specifies the interpretation of the resolution keyword. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our usual way to disambiguate entities is something like:

res-1_bold.json:

{
  "Resolution": "abc"
}

res-2_bold.json:

{
  "Resolution": "xyz"
}

The switch to a dictionary indexed by strings doesn't appear to be justified anywhere in the proposed text. And for resolution, if it's an index, should it not be an integer, rather than a string of digits?

Copy link
Collaborator Author

@oesteban oesteban Aug 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy to elaborate more on this:

  • Why an index: to allow developers to impose an explicit ordering. However, I would agree with alternatives such as res-high, res-low, res-mid... but that vocabulary would be very limited. I wouldn't allow res-2x2x2 for various reasons. First is that very often we will need decimals and res-07x07x07 seems like a recipe for disaster. Second, file formats generally address this issue, so it gives plenty of room for the discrepancy between this value and what you find in the file's header. To me, the index solution seemed easy.

    However, to allow human-readable descriptions in the JSON file, they need to be converted to string indexes for the dictionary (JSON does not allow integer keys, does it?).

    I can add your metadata proposal (i.e., not using a dict because the relationship between JSON and object is univocal), but I would not eliminate the dictionary option because that allows you to place the metadata at a higher level of the hierarchy (ie., describe res-1 and res-2 in the same, unique file).

  • I like the idea of free-form descriptions of each index. The rationale is that - software should read the actual resolution (and units, which seem to be of concern) from the file format. Humans, however, might check quickly this description field without the need of opening the file (e.g., nib-ls for niftis with nibabel). I would be very happy to change my mind if a better idea is proposed, or improving this PR with a better description of the current proposal - which seems necessary at this point.

oesteban marked this conversation as resolved.
Show resolved Hide resolved

Example JSON file corresponding to `pipeline1/sub-001/func/sub-001_task-rest_run-1_space-MNI305_bold.json` above:

```JSON
{
"SkullStripped": true,
"Resolution": {
"1": "Matched with original BOLD resolution (2x2x3 mm^3)",
"2": "Matched with high-resolution T1w (0.7mm, isotropic)"
}
}
```

Example of CIFTI files (i.e., a file that combines regularly sampled data
and non-parametric surfaces) having both `res` and `den` keywords:

```Text
pipeline1/
sub-001/
func/
sub-001_task-rest_run-1_res-1_den-10k_bold.dtseries.nii
sub-001_task-rest_run-1_res-1_den-41k_bold.dtseries.nii
sub-001_task-rest_run-1_bold.dtseries.json
```

And the corresponding `sub-001_task-rest_run-1_bold.dtseries.json` file:

```JSON
{
"SkullStripped": true,
"Resolution": {
"1": "Matched with MNI152NLin6Asym 1.6mm isotropic"
},
"Density": {
"10k": "10242 vertices per hemisphere (5th order icosahedron)",
"41k": "40962 vertices per hemisphere (6th order icosahedron)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just like these say 10k and 41k someone could have used 1p6 and 2p0 for Resolution as well, right? or conversely i could have used 1 and 2 for Density.

btw, this may be the first use of a dictionary as a value in bids. i know this was discussed a long time back in the context of pybids, and in our derivatives meeting to stay away from, so may be worth a little ping to bep leads and others.

Copy link
Collaborator Author

@oesteban oesteban Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just like these say 10k and 41k someone could have used 1p6 and 2p0 for Resolution as well, right? or conversely i could have used 1 and 2 for Density.

Sure

btw, this may be the first use of a dictionary as a value in bids. i know this was discussed a long time back in the context of pybids, and in our derivatives meeting to stay away from, so may be worth a little ping to bep leads and others.

I just added a suggestion from @effigies that would provide an alternative to dicts. However, it seems useful to have all resolutions in just one place.

}
}
```

## Masks

Expand All @@ -84,7 +134,7 @@ be set to `orig`.
JSON metadata fields:

| **Key name** | **Description** |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| ------------ |----------------------------------------------------------------------------------------------------------------------------------------------- |
| RawSources | Same as defined in [Introduction](01-introduction.md), but elevated from OPTIONAL to REQUIRED |
| Type | RECOMMENDED. Short identifier of the mask. Reserved values: `Brain` - brain mask, `Lesion` - lesion mask, `Face` - face mask, `ROI` - ROI mask |

Expand Down