Add Required/Recommended Metadata info to Schema #699

Open
dlevitas opened this issue Dec 31, 2020 · 5 comments
Labels
schema Issues related to the YAML schema representation of the specification. Patch version release.

Comments

@dlevitas
Contributor

This is regarding the BIDS schema, where YAML files specify the required/recommended suffixes, entities, and extensions for BIDS file names. However, the BIDS spec also designates certain fields in the JSON metadata as required, recommended, or optional. For example, functional MRI acquisitions must have the RepetitionTime field in the corresponding JSON file. Is this something that can be added to the schema? I'd be happy to open a PR if this seems worthwhile.
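
To make the request concrete, here is a minimal sketch of how requirement levels for metadata fields might be recorded and checked against a sidecar. The `FUNC_BOLD_FIELDS` mapping and the `fields_missing` helper are hypothetical names, not part of the schema or the validator. Only the RepetitionTime requirement comes from the example above; the EchoTime entry is an assumption added for illustration.

```python
import json

# Hypothetical requirement levels for a functional MRI sidecar.
# Only "RepetitionTime is required" is taken from the comment above;
# the EchoTime entry is an illustrative assumption.
FUNC_BOLD_FIELDS = {
    "RepetitionTime": "required",
    "EchoTime": "recommended",
}


def fields_missing(sidecar, levels, level):
    """Return the fields at the given requirement level that the sidecar lacks."""
    return sorted(
        name for name, lvl in levels.items()
        if lvl == level and name not in sidecar
    )


# A sidecar that has EchoTime but lacks RepetitionTime.
sidecar = json.loads('{"EchoTime": 0.03}')
print(fields_missing(sidecar, FUNC_BOLD_FIELDS, "required"))
print(fields_missing(sidecar, FUNC_BOLD_FIELDS, "recommended"))
```

A flat mapping like this only covers unconditional requirements; fields whose level depends on other fields would need more structure.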

@tsalo
Member

tsalo commented Dec 31, 2020

This is definitely a long-term goal for the schema. I started working with the NIDM-Terms folks on this (see #423), although I've been pulled into other things recently and I haven't made much progress on it (see #609, which is probably woefully out of date at this point). The relevant issue is probably #604.

Two elements that we'll want to have working before we move the metadata into the schema are:

  1. Supporting logic within the schema (Mutual exclusion and conditional relationships in the schema #620). There are relationships between metadata fields that we'll want to represent in the schema. My favorite example of this is the timing info for task fMRI. You can have RepetitionTime, but not AcquisitionDuration or VolumeTiming, or you can have SliceTiming and VolumeTiming, but not RepetitionTime or DelayTime, etc. There are like five possible combinations of five different metadata fields.
  2. Rendering schema elements in the specification automatically ([SCHEMA] Render schema elements in text #610). If we don't have the schema represented in the specification directly, we're just asking for drift between the two information sources. We've already noticed the difficulty in keeping the schema up-to-date w.r.t. the specification, so I'd hate to add all of the metadata fields to the schema and then have it sit, growing more and more out-of-date, as has happened with [SCHEMA] Reorganize schema #609.
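
The mutual-exclusion logic in point 1 could be sketched roughly as follows. This is an illustration only: `TIMING_RULES` and `matching_rules` are hypothetical names, and only the two combinations spelled out above are encoded. The specification defines more combinations, which are deliberately not filled in here.

```python
# Each rule names fields that must be present together and fields that
# must then be absent. Only the two combinations mentioned in the comment
# above are encoded; this is not the complete set from the specification.
TIMING_RULES = [
    {"present": {"RepetitionTime"},
     "absent": {"AcquisitionDuration", "VolumeTiming"}},
    {"present": {"SliceTiming", "VolumeTiming"},
     "absent": {"RepetitionTime", "DelayTime"}},
]


def matching_rules(sidecar, rules=TIMING_RULES):
    """Return the indices of the rules whose conditions the sidecar satisfies."""
    keys = set(sidecar)
    return [
        i for i, rule in enumerate(rules)
        if rule["present"] <= keys and not (rule["absent"] & keys)
    ]


print(matching_rules({"RepetitionTime": 2.0}))
print(matching_rules({"RepetitionTime": 2.0, "VolumeTiming": [0.0, 2.0]}))
print(matching_rules({"SliceTiming": [0.0, 1.0], "VolumeTiming": [0.0, 2.0]}))
```

An empty result means the sidecar fits none of the encoded combinations. Expressing each combination as a pair of present/absent sets keeps the rules declarative, which is the kind of structure a YAML representation would also need.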

I'd be happy to have any help you're willing to provide!

tsalo added the schema label Dec 31, 2020
@dlevitas
Contributor Author

dlevitas commented Dec 31, 2020

Sure, I'd be happy to help. Regarding your points:

1). That's a good example, one that I wasn't aware of. I suppose that would need to be fleshed out at some point.

2). My thought was to use a web scraping library (e.g. Beautiful Soup) to select the schema elements, though I'm unsure whether that would address the issue. If so, the idea would be to grab the schema elements and place them into YAML files based on DataType (and ModalityLabel).

@tsalo
Member

tsalo commented Jan 1, 2021

1. 👍

2. The NIDM-Terms folks have done a lot of work on automatically extracting terms from the specification already. Check out the bids-terms files in the nidm-terms repository. There are a few other places on GitHub with relevant scripts and files, but I can't remember them at the moment. There's still a fair amount of work to do (e.g., manual review, figuring out how to represent the terms in yaml format, what metadata we care about for each term, etc.), but that's a great place to start working from.

EDIT: Also, adding functions to the new schema rendering tools presented in #610 (after it's merged, of course) for building metadata tables would be very helpful as well.

@satra
Collaborator

satra commented Mar 17, 2021

@dbkeator - pinging you here. perhaps someone has created all the metadata fields somewhere outside markdown and we just don't know it :) or we should at least take a union of all the json metadata from the openneuro datasets.
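
The "union of all the json metadata" idea could be sketched as below, assuming the datasets are already available as local directories. `collect_metadata_keys` is a hypothetical helper, and the temporary files in the demo are made-up stand-ins rather than real OpenNeuro data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def collect_metadata_keys(root):
    """Count how many JSON files under root use each top-level key."""
    counts = Counter()
    for path in Path(root).rglob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if isinstance(data, dict):
            counts.update(data.keys())
    return counts


# Demo on a throwaway directory with two made-up sidecar files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub-01").mkdir()
    (Path(tmp) / "sub-01" / "a_bold.json").write_text(
        '{"RepetitionTime": 2.0, "TaskName": "rest"}')
    (Path(tmp) / "b_T1w.json").write_text('{"RepetitionTime": 2.3}')
    demo_counts = collect_metadata_keys(tmp)

print(sorted(demo_counts.items()))
```

Counting occurrences rather than taking a plain set makes it easier to tell widely used fields from one-off or misspelled keys during manual review.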

@tsalo
Member

tsalo commented Mar 17, 2021

At least to start, I think it would be a good idea to do a direct translation of the json schemas from the validator to yaml format for the specification schema, combined with the descriptions from the specification. I've started drafting something to that effect in #762, if anyone has some time to look it over.
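
As a rough illustration of that translation step (not the approach taken in #762, which is not shown here), the sketch below turns the `properties` of a JSON-Schema-style dict into YAML text and attaches a description to each field. The `schema_to_yaml` helper, the sample schema, and the placeholder description are all hypothetical. The YAML is emitted by hand to stay within the standard library, using JSON-style double-quoted strings for scalars.

```python
import json


def schema_to_yaml(json_schema, descriptions):
    """Render each property of a JSON-Schema-style dict as a small YAML mapping."""
    lines = []
    for name, spec in sorted(json_schema.get("properties", {}).items()):
        lines.append(f"{name}:")
        if "type" in spec:
            lines.append(f"  type: {json.dumps(spec['type'])}")
        if name in descriptions:
            # json.dumps yields a double-quoted, escaped string scalar
            lines.append(f"  description: {json.dumps(descriptions[name])}")
    return "\n".join(lines)


# Hypothetical stand-ins for a validator schema and for specification text.
validator_schema = {
    "type": "object",
    "properties": {"RepetitionTime": {"type": "number"}},
}
spec_descriptions = {"RepetitionTime": "Placeholder description text."}

yaml_text = schema_to_yaml(validator_schema, spec_descriptions)
print(yaml_text)
```

Nested constraints (for example `anyOf` or array item types) are ignored here and would need their own handling.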

3 participants