Which ID-like column from the Sample Metadata template's ProjectInformation tab? #384
Comments
I suggest using the GOLD identifier as the primary one. The others should go into alternate_identifiers, and we also have specific fields for different databases. We don't have a field for emsl or jgi yet, but can add these.
Looking at the spreadsheet: remember that all identifiers used in NMDC must conform to https://microbiomedata.github.io/nmdc-schema/identifiers
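As a rough illustration of the suggestion above, the split between a primary `id` and `alternate_identifiers` could be sketched as follows. This is a minimal sketch and not NMDC code: the function name `split_identifiers`, the lowercase `gold` prefix, and all example identifier values are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def split_identifiers(
    candidates: List[str], primary_prefix: str = "gold"
) -> Tuple[Optional[str], List[str]]:
    """Pick the first identifier with the preferred prefix as primary.

    Everything else is returned as alternates. The 'gold' default is an
    assumption for this sketch, not a confirmed NMDC convention.
    """
    primary: Optional[str] = None
    alternates: List[str] = []
    for curie in candidates:
        # Text before the first colon is treated as the prefix.
        prefix = curie.split(":", 1)[0].lower()
        if primary is None and prefix == primary_prefix:
            primary = curie
        else:
            alternates.append(curie)
    return primary, alternates


# Hypothetical ID-like values from one template row (placeholders only).
row_ids = ["emsl:example-0001", "gold:Gs0000001", "igsn:EXAMPLE0001"]
primary, alternates = split_identifiers(row_ids)
study = {"id": primary, "alternate_identifiers": alternates}
```

If no identifier with the preferred prefix is present, `primary` comes back as `None`, which leaves the fallback decision to the caller.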
This isn't a name. This is the INSDC bioproject identifier. The correct prefix for this is E.g.
I don't think we should include this
I don't believe this is registered in any prefix registry. If we want to include this, we should register it. I suspect the ID needs additional disambiguation, either in the prefix (e.g. jgi.proposal:1781) or the local part (e.g. JGI:proposal1781).
Further comment on conforming identifiers: 'jgi' and 'jgi.proposal' are NOT registered CURIE prefixes. I would question whether NMDC needs to have any knowledge of JGI proposals. The supported identifiers would be GOLD study identifiers. (Although these are not guaranteed to be 1:1 with JGI proposals, they usually are. I believe GOLD studies can potentially span multiple JGI proposals, as proposals represent a funded unit of work.)

Similarly, 'emsl' and 'uuid' are not registered CURIE prefixes, so including those prefixes currently doesn't add any value. It would be good if 'emsl' were registered and emsl-prefixed identifiers were resolvable. 'uuid' describes an algorithm and not an identifier domain, so it would not be registered. That doesn't mean that EMSL couldn't use one of the UUID algorithms to generate the local portion of the ID that is used within a valid FAIR identifier; i.e. emsl: where https://identifiers.org/emsl: resolved to the desired record.

@cmungall recommends using the GOLD identifier as the primary identifier for NMDC sample, study and instrument process IDs. However, these will not always exist. We already have samples that do not exist in GOLD. These currently have emsl: or igsn: identifiers as their primary ID. And of course, for new samples, there will be no GOLD, EMSL or IGSN identifier at all, so new NMDC identifiers would need to be created. Also, considering the proposal that NMDC sample identifiers be embedded in analysis identifiers, all samples in NMDC would need NMDC identifiers. These would be the primary ID within NMDC.
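To make the point about registered prefixes concrete, here is a minimal sketch of a conformance check on `prefix:local` identifiers. The allow-list below is a placeholder assumption, not the real set of registered prefixes (which would come from the identifiers page linked above or from a prefix registry), and `check_curie` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Placeholder allow-list, assumed for illustration only. Replace it with the
# prefixes that are actually registered and accepted by the schema.
ALLOWED_PREFIXES = {"gold", "igsn"}

# Simplified CURIE shape: a prefix, one colon, and a non-empty local part.
CURIE_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z_][A-Za-z0-9_.\-]*):(?P<local>\S+)$"
)


def check_curie(identifier: str) -> Tuple[bool, str]:
    """Return (is_acceptable, reason) for a candidate identifier."""
    match = CURIE_PATTERN.match(identifier)
    if match is None:
        return False, "not of the form prefix:local"
    prefix = match.group("prefix")
    if prefix.lower() not in ALLOWED_PREFIXES:
        return False, f"prefix '{prefix}' is not in the allow-list"
    return True, "ok"


# Both spellings discussed above are rejected until a prefix is registered.
for candidate in ["jgi.proposal:1781", "JGI:proposal1781", "1781"]:
    print(candidate, check_curie(candidate))
```

The check only looks at the prefix; it does not verify that the identifier resolves to a record, which is the other property the comment above asks for.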
This is one component of issue #375
The Sample Metadata Template has several columns that look like IDs to me, but I assume that only one or two will be filled for most rows.
How do we decide which should fill the NMDC schema `id` slot?
See also
https://microbiomedata.github.io/nmdc-schema/Study.html#class-study
https://microbiomedata.github.io/nmdc-schema/id.html
Input from @wdduncan, @dwinston, or others is welcome too!