[FOUNDATIONAL] Facilitate small atomic enhancements #371

Open
yarikoptic opened this issue Nov 13, 2019 · 10 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · opinions wanted (Please read and offer your opinion on this matter)

Comments

@yarikoptic
Collaborator

This is a brief issue to start discussion, and so that we do not forget.
Attn @tsalo @satra

At the moment, to accomplish their goals, many BEPs need to introduce changes to the main BIDS specification.

Quite often those changes are of interest to multiple BEPs (e.g. _part for BEP001 (multiple contrasts) and BEP004 (SWI)), or could even be argued to apply to the existing "raw" BIDS itself (e.g. BIDS dataset provenance, #369).

But when introduced within dedicated, specialized BEPs, they

  • lack proper review by the wider BIDS community, and thus have more potential to introduce conceptual inconsistencies within BIDS
  • make BEP development/review/acceptance stagnate
  • make it more difficult for supporting tools (validator, pybids) to review/adopt them

I would propose for us to

  • improve the contributing guide, in particular for BEPs, to RECOMMEND (if not REQUIRE) introducing any such changes as separate PRs against BIDS, with references to existing (or proposed) BEPs
  • work between/with existing BEPs to identify such atomic changes and propose them as separate PRs
  • for PRs, add labels for each BEP (BEP001, ...) so we could see which PRs would be of use for which BEP
@yarikoptic added the enhancement, help wanted, and opinions wanted labels Nov 13, 2019
@yarikoptic
Collaborator Author

@bids-standard/bep_leads @bids-standard/derivatives - I would value your input on this proposal

@robertoostenveld
Collaborator

I would welcome this. The BEP guideline does not seem to have been updated for quite some time, is not that easy to find, and might not be that well known to current BEP contributors.

A change in working style that I realize we went through while extending MRI->MEG->EEG+iEEG (time-wise, somewhere between MEG and EEG+iEEG we switched from Google Docs to GitHub for managing the specification) is that the extension process now expects diffs/PRs, whereas initially it was more like writing a (partially copied) specification that would be merged manually. Quite a few older BEPs are still in the old style (full spec), whereas some of the newer BEPs seem to skip Google Docs (and thereby risk losing part of the audience/stakeholders) and only do diffs/PRs.

Wrt the GitHub process: each issue/PR is treated by GitHub (the website) on its own. That causes confusion about the dependencies between issues/PRs (e.g. there is still a mix-up between common-derivatives and imaging-derivatives). Bugzilla and Jira have explicit mechanisms for this. Adding "BEPxxx" to the name and colored tags is a step in the right direction, but still relatively poor (although possibly the best we can do within GitHub's limits).

We might also consider GitHub projects to structure the relations between, and the work on, issues/PRs, e.g. one project per BEP.

And/or we might consider more explicitly using branches under the main bids-specification repo, e.g. by having BEP-draft branches (which could also be rendered on RTD). PRs that are specific to a BEP could initially be merged into the BEP-specific branch.

@emdupre
Collaborator

emdupre commented Jan 13, 2020

work between/with existing BEPs to identify such atomic changes and propose them as separate PRs

I'm a strong +1 on this.

An immediate example is one @jasmainak and I discovered a while back -- the concerns in BEP21 on re-annotating electrophysiological data strongly relate to the concerns with re-annotating naturalistic stimuli for features of interest (such as those used in @adelavega's NeuroScout), which I think corresponds to BEP02.

I completely agree with @yarikoptic that there's a strong danger of individual BEP leaders not realizing these connections (simply because it would require keeping all of the other BEPs in mind!) and therefore adopting different approaches to the same problem. Personally, I think this would be an ideal role for the steering committee, though I wonder what @robertoostenveld thinks.

@emdupre
Collaborator

emdupre commented Jan 14, 2020

Pinging @bids-standard/steering for opinions on this point:

agree with @yarikoptic that there's a strong danger of individual BEP leaders not realizing these connections (simply because it would require keeping all of the other BEPs in mind!) and therefore adopting different approaches to the same problem. Personally, I think this would be an ideal role for the steering committee

and any thoughts on how to actualize it 😸

@yarikoptic
Collaborator Author

and any thoughts on how to actualize it

I would have continued along the "I would propose for us to" lines in the original description. Maybe make them a bit more specific, with rules such as:

  • any new _key-value pair (entity) should be submitted as a separate PR open for discussion (we could add a specific template for such PRs)

And then act on those few cases I have already identified (e.g., _part) and submit them as PRs, referencing corresponding BEPs.
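To make concrete what "entity" means here, below is a minimal sketch of how a BIDS key-value entity appears in filenames. This is not pybids, just a regex over the `key-value` naming pattern, and the `part` entity is used as the example atomic addition discussed above:

```python
import re

# BIDS filenames are underscore-separated key-value "entities" followed
# by a suffix, e.g. sub-01_ses-01_part-mag_T1w.nii.gz.  A new entity such
# as `part` is exactly the kind of small, self-contained change that can
# be reviewed as its own PR against the main specification.
ENTITY_RE = re.compile(r"([a-zA-Z0-9]+)-([a-zA-Z0-9]+)")

def parse_entities(filename: str) -> dict:
    """Return the key-value entities of a BIDS-style filename."""
    stem = filename.split(".")[0]      # drop extensions like .nii.gz
    pieces = stem.split("_")
    entities = {}
    for piece in pieces[:-1]:          # the last piece is the suffix (T1w)
        m = ENTITY_RE.fullmatch(piece)
        if m:
            entities[m.group(1)] = m.group(2)
    return entities

print(parse_entities("sub-01_ses-01_part-mag_T1w.nii.gz"))
# {'sub': '01', 'ses': '01', 'part': 'mag'}
```

A PR introducing `part` would then only need to define the entity's allowed values and where it may appear, which is a much smaller review surface than a whole BEP.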

@emdupre
Collaborator

emdupre commented Jan 14, 2020

Sorry @yarikoptic, I meant to ask specifically about how BIDS Steering could help with identifying cross-BEP points of convergence going forward. But yes, I think your suggestions make sense!

@HenkMutsaerts
Collaborator

HenkMutsaerts commented Jan 14, 2020 via email

@robertoostenveld
Collaborator

I think this would be an ideal role for the steering committee, though I wonder what @robertoostenveld thinks.

I don't think the steering group has a better overview per se than other community members; at least, I do not have that overview and don't know BEP details (except for the ones I am involved in, in a personal capacity). The "content" is provided by the community members with domain-specific knowledge, and the combined wisdom/insight/expertise of the community is needed in the consensus process. The steering group can at best only guide the process.

I do think that we have to improve the signal-to-noise ratio in the communication. That would make it easier for all stakeholders to see where a specific BEP links to other work. My previous comment wrt the GitHub process (projects, branches) might actually not help with that; it would just add another layer, and hence more complexity, that stands in the way of recognizing what is going on.

I think we (= the community) should update the BEP guideline, and the steering group might help ensure BEP leads follow it. E.g., following up on the community forum meeting yesterday, I asked - on behalf of the steering group - all BEP leads to update their main document to make clear where their BEP-specific discussion happens, and some have already done so. Such a process-level review (in this case: ensuring that communication happens on an open channel that others can join) is something the steering group might be able to take up.

I realize that by looking at the different BEPs, we can learn a lot from each other and improve good practices. The BEP process is not only about merging PRs into the spec (where I agree that overlap should be identified, not only as a potential conflict between modalities but also as potential added value between modalities). The BEP process is also about making (and discussing) examples, and about ensuring the validator remains consistent with the spec. The suggestions at the start of this issue by @yarikoptic are very good, and we should also highlight other existing practices that we value, e.g., summarizing proposed changes to the spec explicitly, as in BEP001, or explicitly considering the overlap in entities, as @sappelhoff introduced while working on the MEG, EEG and iEEG extensions.

@poldrack

poldrack commented Jan 15, 2020 via email

@dorahermes
Member

In iEEG we felt that it was the responsibility of the BEP leads to be up to date on the most closely related extensions and to ensure compatibility with these and with the main spec. Some examples:

  • We had to communicate very closely with the EEG extension and be up to date on the derivatives proposal, to make sure that we were aligned and did not add potential derivatives to the iEEG extension. This required many rounds of updates to the spec, the examples, and consistency in the validator. We were very fortunate to have close communication with the EEG and derivatives leads, who were involved in the iEEG extension to make this happen, and @sappelhoff closely monitored both extensions.
  • We used the Google Doc to tag other leads when we needed input/consensus for the iEEG spec, and to decide what to include/exclude before we moved to GitHub.
  • We reviewed the EEG and derivatives specs for compatibility several times.
  • We used GitHub issues when it came to the merging stage.
  • We used the mailing list for issues that could potentially have a larger impact, or to request input from the community at large (e.g. on topics of units, spaces, etc.).
  • The amazing BEP steering person at that time guided us through this process and helped put us in touch with the relevant people.

Let me know if you'd like a user story on this.

yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Oct 11, 2021
bids-standard#487 (and originally bids-standard#439) is a `WIP ENH` to introduce standardized provenance
capture/expression for BIDS datasets.  This PR just follows the idea of bids-standard#371
(small atomic ENHs), and is based on the current state of the specification, where
we have GeneratedBy to describe how a BIDS derivative dataset came into
existence.
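
For reference, `GeneratedBy` lives in a derivative dataset's `dataset_description.json`. A minimal sketch (the dataset name and version strings below are hypothetical), assuming the current derivatives spec where each `GeneratedBy` entry carries at least a `Name`:

```json
{
  "Name": "Example converted dataset",
  "BIDSVersion": "1.6.0",
  "DatasetType": "derivative",
  "GeneratedBy": [
    {
      "Name": "dcm2niix",
      "Version": "v1.0.20170923"
    }
  ]
}
```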

## Rationale

As I have previously stated in many (face-to-face, when that was still
possible ;)) conversations, in my view any BIDS dataset is a derivative
dataset.  Even if it contains "raw" data, it is never given by gods, but is the
result of some process (let's call it a pipeline, for consistency) which produced
it out of some other data. That is why there is 1) `sourcedata/`, to provide a
placement for such original data ("raw" in terms of processing, but "raw"er in
terms of its relation to the actual data acquired by the equipment), and 2) `code/`,
to provide a placement for the scripts used to produce or "tune" the dataset.  Typically
"sourcedata" is either a collection of DICOMs or a collection of data in some
other format (e.g. NIfTI) which is then either converted or just renamed into
the BIDS layout. At the moment, encountering a new BIDS dataset requires forensics
and/or data archaeology to discover how the dataset came about, e.g. to
possibly figure out the source of any buggy (meta)data it contains.

At the level of individual files, some tools already add ad-hoc fields
to the sidecar .json files they produce during conversion,

<details>
<summary>e.g. dcm2niix adds ConversionSoftware and ConversionSoftwareVersion</summary>

```shell
(git-annex)lena:~/datalad/dbic/QA[master]git
$> git grep ConversionSoftware | head -n 2
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json:  "ConversionSoftware": "dcm2niix",
sub-amit/ses-20180508/anat/sub-amit_ses-20180508_acq-MPRAGE_T1w.json:  "ConversionSoftwareVersion": "v1.0.20170923 (OpenJPEG build) GCC6.3.0",

```
</details>

At the moment I need to add such metadata to datasets produced by heudiconv, to
make sure that in the case of incremental conversions there is no switch in the
version of the conversion software.
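
The version-consistency check described above can be sketched as follows. This is a hypothetical helper, not part of heudiconv; the field names are the ones dcm2niix actually writes:

```python
import json
from pathlib import Path

def check_conversion_consistency(bids_root):
    """Scan all sidecar JSON files under a BIDS root and group them by
    (ConversionSoftware, ConversionSoftwareVersion), so an incremental
    conversion that silently switched converter versions can be detected:
    more than one group means mixed versions."""
    versions = {}
    for sidecar in Path(bids_root).rglob("*.json"):
        try:
            meta = json.loads(sidecar.read_text())
        except (OSError, json.JSONDecodeError):
            continue                      # unreadable or non-JSON content
        key = (meta.get("ConversionSoftware"),
               meta.get("ConversionSoftwareVersion"))
        if key != (None, None):           # only sidecars that record provenance
            versions.setdefault(key, []).append(sidecar)
    return versions
```

A converter could call this after each incremental run and warn when `len(check_conversion_consistency(root)) > 1`.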