Decisions on BIDS derivatives structure #50
Comments
This is great. Do I understand correctly that poll 2 assumes that poll 1 was already resolved and that option 2 was chosen in that poll? Do we worry that might bias poll 2 somehow? (option 2 is probably my least favorite option in poll 1, fwiw). |
Added some comments to both decisions 1 and 2, to clarify that in both instances the generation of examples necessitates assuming that one option has been selected from the other decision, but that the two decisions are independent. |
Option 5: Zipped/Tarball with complete folder
Contents of file sub-01_model-abc_model.tar: sub-01_model-abc_param-x_model.nii.gz
Advantages: similar to nii.gz
Disadvantages: Apps would need to have capability to work with tarballs
To clarify my thoughts: I wonder whether here we will feel the same. In other words, what is the cost of reading the .json inside the tarball? |
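On the cost question, a rough sketch of what reading an embedded JSON could look like with Python's standard `tarfile` module. The member name `sub-01_model-abc_model.json` and the helper names are assumptions made for illustration; nothing here is prescribed by BIDS. The tar format has no central index, so finding a member means walking the member headers in order, but nothing needs to be unpacked to disk.

```python
import io
import json
import tarfile


def read_json_member(tar_path, member_name):
    """Return the parsed JSON stored under member_name inside the tarball."""
    with tarfile.open(tar_path, "r") as tar:
        # extractfile() returns a file-like object; nothing is written to disk.
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file in {tar_path}")
        return json.load(member)


def write_demo_tarball(tar_path, member_name, metadata):
    """Create a tarball holding a single JSON member (demo helper only)."""
    payload = json.dumps(metadata).encode("utf-8")
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    with tarfile.open(tar_path, "w") as tar:
        tar.addfile(info, io.BytesIO(payload))
```

So the read itself is a few lines; the real cost is that every tool touching the dataset has to know to look inside the archive.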
I think Option 4 (or possibly Option 5, with some pros and cons) is the most convenient, and it might provide some speed-ups by allowing the info to be searched in the top folder name only. |
Just to see that I understand: the difference between option 4 and option 5 is that the tarball is not nested under DWI? How would we know that it's related to DWI, and discriminate it from models for FMRI or other modalities? Through the model name? |
Difference between 4 and 5 is whether the JSON corresponding to model ABC as a whole is or is not embedded within the tarball. Both reside within the |
Just clicked for me (and added to the dot points in the first post): This still really requires the more complex inheritance principle. If you read just one image (regardless of whether it's an intrinsic model output parameter or a model-derived parameter), both the contents of the paired sidecar JSON and the whole-model JSON are applicable. |
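To make the "both are applicable" point concrete, here is a minimal sketch of how a reader might combine the two JSON sources for a single image. The field names are hypothetical, and the rule that the per-image sidecar wins on a key clash is an assumption for this example, not something decided in this thread.

```python
def effective_metadata(model_metadata, parameter_metadata):
    """Merge whole-model fields with the fields of one image's paired sidecar.

    Assumption: on a key clash, the more specific (per-image) value wins.
    """
    merged = dict(model_metadata)      # start from the whole-model JSON
    merged.update(parameter_metadata)  # per-image sidecar overrides shared keys
    return merged


# Hypothetical contents of the whole-model JSON and of parameter X's sidecar.
whole_model = {"ModelName": "ABC", "Description": "applies to every image"}
param_x = {"Description": "specific to parameter X", "Units": "arbitrary"}

print(effective_metadata(whole_model, param_x))
```

Whatever directory structure is chosen, a reader has to be able to locate both inputs to this merge starting from the image path alone.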
@bids-standard/maintainers: Would very much appreciate any feedback on this thread. I can't keep up with everything happening in BIDS space, so it's possible that similar issues have been encountered elsewhere; also, any decisions made here may set a precedent for many other derivatives BEPs. After feedback from maintainers, if there's no clear consensus I'd like to open up discussion to the wider community. |
@bids-maintenance We would like to make progress on this issue. We made a proposal, and we would like to kindly request attention so that we can move forward with the DWI-derivatives standard.
@PeerHerholz @soichih @bids-standard/derivatives-mri-dwi @effigies |
I find Decision 1 Option 4a appealing. Unfortunately I am not aware of the discussion that may have happened on tar balls. Maybe Chris knows, but he'll not be available for the next weeks AFAIK. For Decision 1 Option 2 you say:
how large are we talking in the worst case? Based on the discussion of revamping the Inheritance Principle, I am not very fond of Decision 1 Option 1. Re: Decision 2 --> I think one of the principles in BIDS so far was to use as few suffixes as possible, as many as needed ... so that makes Option 1 appear more favorable for me. |
I am not too familiar with BIDS structure, but I'd like to vote for option 2 (or maybe 4a..) on decision 1 for its simplicity. I feel that BIDS is becoming too complex already, with too many rules that I am not aware of.. I did write a few simple BIDS directory parsers for our BIDS data importer library that implement (probably incorrectly) a subset of all BIDS structure principles. I assume that I am not the only one who has had to write such "broken" parsers, as not everyone has access to libraries such as pyBIDS or can use them for their use cases. I also assume that the point of BIDS structure is to make the data structure simple/visible so that it can be used without dedicated libraries such as pyBIDS if one wanted to; otherwise why not just make the whole structure closed within a ".bids.tgz" type file format and provide canonical parsers for every programming language? No comment on Decision 2. |
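For reference, the kind of small hand-written parser described above might look roughly like this. It is a sketch that only splits one file name into key-value entities, suffix and extension; the function name is made up here, and it deliberately ignores directory structure, inheritance and validation.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    # Everything from the first dot onwards is treated as the extension.
    stem, dot, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are entities.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return entities, suffix, dot + extension


print(parse_bids_name("sub-01_model-abc_param-x_model.nii.gz"))
# ({'sub': '01', 'model': 'abc', 'param': 'x'}, 'model', '.nii.gz')
```

A parser this simple keeps working under the flat options, whereas tarballs or nested sub-directories would each need extra handling.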
Hi folks, here are my 2 cents. Re I would either vote for Re +1 on @sappelhoff's comment! |
Consider two experiments:
The former is perhaps less "exotic".
That's useful. I've been leaning in favour of that myself, hence #46. Would be even better if anyone knows of a link to an explicit statement of such. Will give a bit more time for maintainers / developers to comment / guide / suggest alternatives, but still like the idea of a community poll. |
Contra-indication of interest in 4a: Imagine one fits a model, producing core output model parameters, and packs them into a tarball. Now one wants to use those parameters to produce a model-derived parameter (eg. an FA map from a tensor model fit). Sidecar information relating to the model fit is still applicable to the model-derived parameter. Would one therefore be obliged to unpack the tarball, add the new file(s), and repackage? (Added to the list of disadvantages in the original post) |
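On the mechanics only: for an uncompressed `.tar`, Python's standard `tarfile` module can append a member in place via mode `"a"`, so a full unpack and repackage is not strictly needed in that case. A compressed archive (eg. `.tar.gz`) cannot be opened for appending this way and would have to be rewritten. The sketch below assumes an uncompressed archive; the helper names and the example file names are hypothetical.

```python
import io
import tarfile


def append_member(tar_path, member_name, payload):
    """Append payload (bytes) to an uncompressed tarball as member_name."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    # Mode "a" appends to an existing archive; it only works without compression.
    with tarfile.open(tar_path, "a") as tar:
        tar.addfile(info, io.BytesIO(payload))


def member_names(tar_path):
    """List the member names of the tarball in archive order."""
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

This does not change the curation question, since the archive would then no longer reflect only what the original model fit produced.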
as the dataset curator if you do this before finally sharing your dataset ... or when sharing a new version of the data: yes, you'd have to do that and it'd be a bit laborious. as a user of the dataset, you wouldn't want to edit the dataset anyhow, would you? Wouldn't you save your new outputs elsewhere? For example in a new (derived?) dataset, which would bring us back to the situation above, which is "a bit laborious". |
Good point. I would say yes. That would be preferable. Saved in a new derived dataset. |
Hmmm, I'm maybe here thinking of a use case outside of that intended. Sometimes I will take a dataset that's been processed using a BIDS App, and do a little bit of subsequent tweaking after the fact, eg. calculating model-derived parameters that weren't calculated by the App, and I'll try to remain vaguely BIDS-compliant when doing so. But that means that the contents no longer reflect the output of that particular App. So tarballing may make such manipulation less convenient, but maybe it's actually a good thing, as such tweaking within the purported output directory of a BIDS App should be discouraged. The outstanding question would be whether it is acceptable to have eg. the model parameters in one derivatives directory, and model-derived parameters in a different derivatives directory. I think that as long as the validator doesn't impose any requirement on model-derived parameters coexisting with the model parameters it should be fine, but there is some chance I'm overlooking something. |
Originating from @oesteban Option 5: Hierarchy restricted to JSON
Contents of file
Advantages:
Disadvantages:
|
Additional suggestion from @oesteban This is described here as an augmentation of option 3; it does not in and of itself solve the complex inheritance problem.
Must be content within file Edit: |
For decision 1 is the following acceptable?:
This does not address the issue of root-level model files needing to use sidecar inheritance. It also does not deal with the large number of files in a directory, but it should be valid in the current specification.
|
That precisely replicates the proposed "solution" as it appears in the current specification. However as I argue in bids-standard/bids-specification#1003, it is to me unintuitive, as it involves placing datatype-specific data in a directory that has the specific purpose of disambiguating datatypes. It could make more sense if alternatively " |
Closing following merge of #92. |
While I have written a lot of text in various locations regarding core decisions that need to be made regarding the definitions of filesystem paths for DWI derivatives, they may be too verbose or DWI-specific and therefore not be appropriate for widespread community engagement.
It is my intention to first post what I believe to be the viable solutions to these issues. Others are free to comment and even make alternative suggestions. Once the set of viable solutions is established, I will then construct polls to evaluate the degree of community consensus.
The example
We have a hypothetical DWI model called ABC. This model is represented using parameters X and Y. X and Y are of fundamentally different data types, such that it is not possible to store both in a single NIfTI image, and they must be split across multiple images.
For metadata, there is information that is relevant to model ABC as a whole, and there is additionally information that is specific to parameter X and parameter Y separately.
Following fitting of the model to the empirical data, it is possible to derive from X and Y another parameter of interest Z. This may in and of itself require metadata to explain how it was calculated.
Decision 1: Directory structure
(For the sake of discussion of directory structure, I will assume the existence of a new entity with key "model", and two new suffixes: "model", and "mdp" (model-derived parameter). This corresponds to decision 2, option 1 "few suffixes", but is used for demonstrative purposes in the context of decision 1 only, and the two decisions should be considered independent)
Option 1: "Complex inheritance"
Advantages:
Disadvantages:
Option 2: "No inheritance"
Advantages:
Disadvantages:
Option 3: Directory hierarchy
Advantages:
Natural exploitation of hierarchical nature of filesystem to reflect hierarchical nature of model data
Sets precedent for expanding modality directories to include sub-directories, which is a core component of TRX for tractography data and will therefore be requisite in the future
Disadvantages:
Requires modification of specification to permit sub-directories within modality directories
Breaks current implicit convention whereby sub-directory names don't bother duplicating entities corresponding to parents (eg. "sub-01/ses-01/dwi/"), whereas file names do (eg. "sub-01_ses-01_dwi.nii.gz"). This is impossible to resolve as long as the JSON file and corresponding sub-directory must have the same name.
Option 4: Tarballs
Option 4a: Tarball with separate JSON
Contents of file sub-01_model-abc_model.tar:
Advantages:
Disadvantages:
BIDS Apps would need to have capability to work with tarballs (eg. unpacking and storing in scratch prior to feeding to underlying commands)
Model-derived parameters cannot be trivially added alongside the core model parameters.
PS. Apparently there's been a prior discussion regarding tarballing of non-conforming derivatives in BIDS datasets; can anyone provide a link?
Option 4b: Tarball with embedded json
Contents of file sub-01_model-abc_model.tar:
Advantages (relative to 4a):
.img/.hdr file pairs)
Disadvantages (relative to 4a):
Primary model sidecar information is not accessible without going into the tarball
Still requires more complex inheritance principle in a way; just it only applies to the contents of the tarball
Option 5: Hierarchy restricted to JSON
Contents of file sub-01_model-abc_model.json:
Advantages:
No complex inheritance necessary
All information relevant to a model is visible within a single file
Disadvantages:
Necessitates explicit cross-referencing between general model JSON and individual parameter files
If model-derived parameter is to be added, metadata relating to that parameter needs to be inserted into the whole-model JSON
Metadata specific to one parameter is not immediately visible via a paired JSON
Decision 2: File names
(Note that for the sake of these examples, decision 1 option 1 "complex inheritance" is utilised; this is however purely for the sake of generation of examples, and the two decisions should be considered independent)
Option 1: "Few suffixes"
"MDP": "Model-derived parameter" (exact nomenclature can be up for debate)
Advantages:
Disadvantages:
Option 2: "Many suffixes"
Advantages:
Disadvantages:
sub-01_model-xyz_model.json above) is uncertain (and could depend on decision 1 RE: directory structure)