Extend schema to cover metabolomics output to enable search by compound #170

cmungall · 2020-12-05T01:19:02Z

C-MS based metabolomics workflow output examples from @corilo here:

https://drive.google.com/drive/u/1/folders/1_dHFvIK9PwJCKVJznwqWgqvfOIhTCkvx

Expected changes:

Metadata JSON file: the structure will change after adoption of the labels defined in Aim 1.
DataTable CSV file: metabolite names will change (comments and extra fields will be removed), and columns will be added for CAS and KEGG compound IDs.
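
As a rough illustration of how the added identifier columns would support search by compound, here is a minimal Python sketch. The column names (`metabolite_name`, `cas`, `kegg_compound_id`) and the two rows are assumptions made for illustration; they are not the actual DataTable layout or real workflow output.

```python
import csv
import io

# Illustrative table only: column names and rows are assumptions,
# not taken from the workflow's real DataTable CSV.
DATA_TABLE = """metabolite_name,cas,kegg_compound_id
D-Glucose,50-99-7,C00031
Pyruvic acid,127-17-3,C00022
"""


def find_by_kegg(table_text, kegg_id):
    """Return all rows whose KEGG compound ID equals kegg_id."""
    reader = csv.DictReader(io.StringIO(table_text))
    return [row for row in reader if row["kegg_compound_id"] == kegg_id]


hits = find_by_kegg(DATA_TABLE, "C00031")
print([row["metabolite_name"] for row in hits])
```

With stable identifiers in their own columns, a search does not have to parse free-text metabolite names.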

NOTE: for now, make comments on @dehays' Google Doc https://docs.google.com/document/d/15fga30d619WRxAUk8LyrojwIN1m89_K-sIrmUg4Y3tY/edit rather than commenting in this ticket. We will still use this ticket to track status.

As a first pass we will not try to capture everything from a metabolomics workflow, just the aspects that are necessary for search.

dehays · 2020-12-10T19:31:18Z

Yuri (@corilo) will produce a JSON document for a single execution of his workflow, then get feedback (Chris, Bill, David) and possibly iterate before continuing with the remaining metabolomics workflow instances. The structure will then be used for lipidomics and OM workflow execution metadata.

corilo · 2020-12-15T04:13:03Z

@dehays, @cmungall @wdduncan
Here is the first draft of the JSON document. Please provide feedback on the structure, labels, and any missing data as required.

I forgot what "was_informed_by" is and what it should contain. Could you help?

https://drive.google.com/file/d/1BnP8q-iDQP2vmswN9v68WDQfNG4uDek5/view?usp=sharing

Thanks

dehays · 2020-12-16T08:35:21Z

"has_input": [
    "emsl:sha256:37417faf2c1b07ef9c59868683e41577bb3a745128bdc88b6cc59e579b5b30d0"
]

This ID (the sha256 hash) doesn't match the ID of any instrument output EMSL has provided to NMDC. Those IDs use the dataset ID (which some parts of EMSL use and some parts do not) with a prefix of "output_", so they look like "emsl:output_500097". We can revisit how EMSL sets unique IDs, but the IDs need to be consistent so that there is a path back from analysis to sample and study.

"was_informed_by" is a relationship on an analysis execution activity (i.e. instance of running the metabolomics analysis workflow) that refers to the instrument run (OmicsProcessing) entity. Again, the ID currently looks like "emsl:500097" and uses the dataset ID of the instrument run.
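
To make the ID convention concrete, here is a minimal Python sketch of a consistency check. Only the two field names and the two ID forms ("emsl:output_500097", "emsl:500097") come from this comment; the `check_ids` helper, the regular expressions, and the rule that both IDs must share the same dataset ID are assumptions for illustration, not part of the NMDC schema.

```python
import re

# ID forms as quoted in this comment; treating them as strict patterns
# is an assumption made for this sketch.
OUTPUT_ID = re.compile(r"^emsl:output_(\d+)$")
RUN_ID = re.compile(r"^emsl:(\d+)$")


def check_ids(record):
    """Return a list of problems with the ID links in a workflow record."""
    problems = []
    run = RUN_ID.match(record.get("was_informed_by", ""))
    if run is None:
        problems.append("was_informed_by is not of the form emsl:<dataset id>")
    for input_id in record.get("has_input", []):
        out = OUTPUT_ID.match(input_id)
        if out is None:
            problems.append(f"has_input {input_id!r} is not of the form emsl:output_<dataset id>")
        elif run is not None and out.group(1) != run.group(1):
            # Assumed rule: input file and instrument run share one dataset ID.
            problems.append(f"has_input {input_id!r} does not match was_informed_by")
    return problems


# Hypothetical record using the ID forms quoted above.
activity = {"has_input": ["emsl:output_500097"], "was_informed_by": "emsl:500097"}
print(check_ids(activity))
```

A record that still carries the sha256-style ID in "has_input" would be reported by this check, which is the kind of broken path from analysis back to sample and study described above.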

The Metabolites object: I was expecting an array rather than an object, but a more important conversation is how this will relate to the structure @cmungall is describing in #176. I hope to understand this better after speaking with Chris on Wednesday.
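
For reference, the difference between the two shapes can be sketched in Python as follows. The keys used here (`name`, `kegg_compound_id`) and the sample entry are hypothetical; the draft document's actual field names may differ, and the final structure depends on #176.

```python
def metabolites_to_array(metabolites):
    """Turn {name: {...fields}} into [{"name": name, ...fields}]."""
    return [{"name": name, **fields} for name, fields in metabolites.items()]


# Object form: metabolites keyed by name (hypothetical keys and values).
as_object = {"D-Glucose": {"kegg_compound_id": "C00031"}}

# Array form: one self-describing record per metabolite.
as_array = metabolites_to_array(as_object)
print(as_array)
```

The array form keeps the metabolite name as an ordinary field, so every record has the same set of keys.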

cmungall added a commit that referenced this issue Dec 17, 2020

link metabolomics workflows to metabolites, #170

9cd81df

cmungall assigned cmungall, corilo and wdduncan Dec 17, 2020

cmungall changed the title ~~Extend schema to cover metabolomics output~~ Extend schema to cover metabolomics output to enable search by compound Dec 17, 2020

cmungall mentioned this issue Dec 17, 2020

Export metabolomics JSON conformant with NMDC schema #180

Open

wdduncan closed this as completed Mar 30, 2021

ssarrafan unassigned cmungall, wdduncan and corilo Feb 17, 2023
