Which ID-like column from the Sample Metadata template's ProjectInformation tab? #384

turbomam · 2021-07-26T13:25:13Z

This is one component of issue #375

The Sample Metadata Template has several columns that look like IDs to me, but I assume that only one or two will be filled for most rows

How do we decide which should fill the NMDC schema id slot?

EMSL Proposal/Study Number
GOLD Study ID
JGI Proposal ID
Umbrella Bio Project ID

See also
https://microbiomedata.github.io/nmdc-schema/Study.html#class-study
https://microbiomedata.github.io/nmdc-schema/id.html

Input from @wdduncan @dwinston or other welcome too!


cmungall · 2021-07-28T19:30:18Z

I suggest using the GOLD ID as the primary identifier.

The others should go into alternate_identifiers, and we also have specific fields for different databases:

#384

We don't have a field for emsl or jgi yet, but can add these.
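A minimal sketch of that split, assuming each spreadsheet row arrives as a dict keyed by the column names listed above. The function name `assign_study_ids`, the `gold:` prefix on the primary ID, and the sample values are illustrative assumptions, not part of the schema. Non-GOLD values are passed through unchanged here; making them conform to the identifier rules is a separate step.

```python
from typing import Dict, List, Optional

GOLD_COLUMN = "GOLD Study ID"
# Columns treated as alternates; which of these to keep is still under discussion.
OTHER_ID_COLUMNS = (
    "EMSL Proposal/Study Number",
    "JGI Proposal ID",
    "Umbrella Bio Project ID",
)


def assign_study_ids(row: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Use the GOLD study ID as `id`; collect the other filled columns as alternates."""
    gold = (row.get(GOLD_COLUMN) or "").strip()
    alternates: List[str] = []
    for column in OTHER_ID_COLUMNS:
        value = (row.get(column) or "").strip()
        if value:  # skip columns left blank in the template
            alternates.append(value)
    return {
        # Assumed CURIE form for GOLD; None means no GOLD ID was supplied.
        "id": f"gold:{gold}" if gold else None,
        "alternate_identifiers": alternates,
    }


# Hypothetical row; the GOLD value is a made-up placeholder.
example = assign_study_ids({"GOLD Study ID": "Gs0000001", "JGI Proposal ID": "JGI:1781"})
```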

cmungall · 2021-07-28T19:45:17Z

Looking at spreadsheet

Remember all identifiers used in NMDC must conform to

https://microbiomedata.github.io/nmdc-schema/identifiers

Umbrella Bio Project Name	NCBI Accession: PRJNA594403

This isn't a name. This is the INSDC bioproject identifier. The correct prefix for this is bioproject

E.g.
https://identifiers.org/bioproject:PRJNA594403
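As a rough sketch, the spreadsheet cell could be turned into that form like this. The function names are hypothetical, and the accession pattern (`PRJ`, one of `D`/`E`/`N`, one more letter, then digits) is my assumption about what INSDC BioProject accessions look like.

```python
import re

# Assumed shape of an INSDC BioProject accession, e.g. PRJNA594403.
BIOPROJECT_ACCESSION = re.compile(r"PRJ[DEN][A-Z]\d+")


def bioproject_curie(cell: str) -> str:
    """Extract the accession from a cell such as 'NCBI Accession: PRJNA594403'."""
    match = BIOPROJECT_ACCESSION.search(cell)
    if match is None:
        raise ValueError(f"no BioProject accession found in {cell!r}")
    return f"bioproject:{match.group(0)}"


def identifiers_org_url(curie: str) -> str:
    """Build the identifiers.org URL for a prefix:local identifier."""
    return f"https://identifiers.org/{curie}"


curie = bioproject_curie("NCBI Accession: PRJNA594403")
url = identifiers_org_url(curie)
```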

Umbrella Bio Project ID	NCBI ID: 594403

I don't think we should include this

JGI Proposal ID	JGI:1781

I don't believe this is registered in any prefix registry. If we want to include this, we should register it. I suspect it needs additional disambiguation in the ID, either in the prefix (e.g. jgi.proposal:1781) or the local part (e.g. JGI:proposal1781)
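To make the two options concrete, here is a small sketch that rewrites the spreadsheet value either way. The function name and the `style` argument are hypothetical, and neither output form is a registered prefix today.

```python
def jgi_proposal_curie(raw: str, style: str = "prefix") -> str:
    """Rewrite 'JGI:1781' with the disambiguation in the prefix or in the local part."""
    prefix, sep, local = raw.strip().partition(":")
    if not sep or prefix.lower() != "jgi" or not local.isdigit():
        raise ValueError(f"expected something like 'JGI:1781', got {raw!r}")
    if style == "prefix":
        return f"jgi.proposal:{local}"  # disambiguate in the prefix
    if style == "local":
        return f"JGI:proposal{local}"  # disambiguate in the local part
    raise ValueError(f"unknown style: {style!r}")
```

For example, `jgi_proposal_curie("JGI:1781")` gives `jgi.proposal:1781`, and `jgi_proposal_curie("JGI:1781", style="local")` gives `JGI:proposal1781`.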

dehays · 2021-08-04T18:57:00Z

Further comment on conforming identifiers:

'jgi' and 'jgi.proposal' are NOT registered CURIE prefixes. I would question whether NMDC needs to have any knowledge of JGI proposals. The supported identifiers would be GOLD study identifiers. (Which although not guaranteed to be 1:1 with JGI proposals, are usually 1:1. I believe GOLD studies can potentially span multiple JGI proposals - as proposals represent a funded unit of work.)

Similarly 'emsl' and 'uuid' are not registered CURIE prefixes so including those prefixes currently doesn't add any value. It would be good if 'emsl' was registered and that emsl prefixed identifiers were resolvable. 'uuid' describes an algorithm and not an identifier domain so it would not be registered. That doesn't mean that EMSL couldn't use one of the UUID algorithms to generate the local portion of the ID that is used within a valid FAIR identifier; i.e. emsl: where https://identifiers.org/emsl: resolved to the desired record.
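A quick way to flag these during ingest might look like the sketch below. The set only contains the prefixes called out in this comment as unregistered at the time of writing; it is not a lookup against a live registry, and the function names are made up for illustration.

```python
from typing import Tuple

# Prefixes reported in this thread as not registered; a hand-kept list, not a registry query.
UNREGISTERED_PREFIXES = {"jgi", "jgi.proposal", "emsl", "uuid"}


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'prefix:local' at the first colon, rejecting values without both parts."""
    prefix, sep, local = curie.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a prefix:local identifier: {curie!r}")
    return prefix, local


def is_flagged_unregistered(curie: str) -> bool:
    """True when the prefix is one of those listed above as unregistered."""
    prefix, _ = split_curie(curie)
    return prefix.lower() in UNREGISTERED_PREFIXES
```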

@cmungall recommends using the GOLD identifier as the primary identifier for NMDC sample, study and instrument process IDs.

However - these will not always exist. We already have samples that do not exist in GOLD. These currently have emsl: or igsn: identifiers as their primary ID. And of course, for new samples, there will be no GOLD, EMSL or IGSN identifier at all. So new NMDC identifiers would need to be created.
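The fallback described here could be sketched as follows. GOLD comes first per the recommendation; the relative order of `emsl` and `igsn` is an assumption on my part, the function name is hypothetical, and a `None` result stands for "a new NMDC identifier has to be created", which this sketch does not attempt.

```python
from typing import Dict, Iterable, Optional

# GOLD first per the recommendation; emsl before igsn is an arbitrary assumption.
PREFERRED_PREFIXES = ("gold", "emsl", "igsn")


def choose_primary_id(existing_ids: Iterable[str]) -> Optional[str]:
    """Return the most preferred existing identifier, or None if a new one must be minted."""
    by_prefix: Dict[str, str] = {}
    for identifier in existing_ids:
        prefix = identifier.partition(":")[0].lower()
        by_prefix.setdefault(prefix, identifier)  # keep the first ID seen per prefix
    for prefix in PREFERRED_PREFIXES:
        if prefix in by_prefix:
            return by_prefix[prefix]
    return None  # no GOLD, EMSL or IGSN identifier available
```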

Also, in consideration of the proposal that NMDC sample identifiers be embedded in analysis identifiers, all samples in NMDC would need NMDC identifiers. These would be the primary ID within NMDC.

turbomam assigned turbomam and mslarae13 Jul 26, 2021

turbomam changed the title ~~Which ID-like column from the template's ProjectInformation tab?~~ Which ID-like column from the Sample Metadata template's ProjectInformation tab? Jul 26, 2021

This was referenced Jul 26, 2021

Sample Metadata template's ProjectInformation tab missing title #385

Open

S4G2 - Metadata ingest, extract metadata from spreadsheets #375

Closed
