Metabolomics example #680

Open
wants to merge 8 commits into
base: master
Conversation

mmattano
Collaborator

Annotation of example studies for metabolomics SDRF

@ypriverol ypriverol added this to the metabolomics SDRF milestone Nov 3, 2023
@ypriverol ypriverol added sdrf-metabolomics question Further information is requested Specification Specification issues related with PRIDE formats, API, etc PSI-Discussion labels Nov 3, 2023
@ypriverol
Member

@mmattano I have done the first review of the proposal, here are a couple of ideas:

  • mzXML is in principle a raw file format, like mzML or the Raw files, so it should be possible to add these files as raw files.
  • I see that you have the same sample with two organism parts, liver and plasma. Unless each one has a different raw file, you should annotate the sample + data file relation in a single row of the file, and then add two columns for the two organism parts, liver and plasma.
  • I'm wondering whether SDRF files need the comment[metabolite assignment file] annotation. This looks like another file relation, and I don't think it should be annotated in the SDRF, at least not in this way. We have tried to keep the SDRF as the relation between the sample and the data (raw) files, with no results involved. My question is: what is that field for? Are these the metabolites quantified, or the database used to perform the identification/quantification process?
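The second point above (one row per sample + data file relation) can be checked mechanically. The following is only a minimal sketch, not part of the SDRF tooling: the rows are invented for illustration and are not taken from the file in this PR, and the helper name `repeated_sample_file_pairs` is hypothetical. It flags every (source name, data file) pair that is spread over more than one row, which is the case that would be collapsed into a single row with two organism part columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, invented for illustration only (not from the PR file).
SDRF_TSV = (
    "source name\tcharacteristics[organism part]\tcomment[data file]\n"
    "mouse_1\tliver\trun_01.mzXML\n"
    "mouse_1\tplasma\trun_01.mzXML\n"
    "mouse_2\tliver\trun_02.mzXML\n"
    "mouse_2\tplasma\trun_03.mzXML\n"
)


def repeated_sample_file_pairs(tsv_text):
    """Map (source name, data file) -> organism parts, for pairs on more than one row."""
    parts_by_pair = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["source name"], row["comment[data file]"])
        parts_by_pair[key].append(row["characteristics[organism part]"])
    # Keep only the pairs that occupy several rows.
    return {pair: parts for pair, parts in parts_by_pair.items() if len(parts) > 1}


repeated = repeated_sample_file_pairs(SDRF_TSV)
for (source, data_file), parts in repeated.items():
    print(f"{source} / {data_file} is split over rows for: {', '.join(parts)}")
```

In this made-up table only `mouse_1` is flagged, because its liver and plasma rows point at the same file; `mouse_2` has a separate file per organism part, so its two rows are legitimate.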

@ypriverol ypriverol linked an issue Nov 3, 2023 that may be closed by this pull request
7 tasks
@mmattano
Collaborator Author

@ypriverol I made some changes to the file to get closer to complying with the testing (I'll edit it more with the next upload) and to incorporate your comments. About them:

  • I moved the mzXML to the raw file column. You're right to suggest putting it there; I originally reserved that column for the instrument output files, but that would be overkill.
  • There are different raw files for each plasma and each liver sample. Is that OK, and is that what you were asking? Let me know if I missed something.
  • I agree that it is not needed purely to connect the sample to the data, but it is needed for further processing. So I'd say it's OK to remove it for a "pure" sample-data annotation, but it would be good to have a link to it. What do you think?

@@ -0,0 +1,136 @@
source name	characteristics[organism]	characteristics[organism part]	characteristics[sex]	characteristics[biological replicate]	characteristics[disease]	characteristics[cell type]	assay name	technology type	comment[technical replicate]	comment[chromatography instrument]	comment[chromatography type]	comment[scan polarity]	comment[instrument]	comment[ion source]	comment[data file]	comment[metabolite assignment file]	comment[fraction identifier]	factor value[gene]
Member

@mmattano I think fraction identifier is extremely strange in metabolomics, meaning fractionation is less frequent than in proteomics. For that reason, I recommend removing that column from the metabolomics SDRF.

@mmattano Can you explain why you think the following columns are needed for the data, especially for targeted metabolomics: chromatography instrument, chromatography type, scan polarity, instrument, ion source, metabolite assignment file?

For example, in the proteomics specification we require the instrument to be provided, but even though MS-based proteomics also uses LC technologies, we don't require them to be included in the file because we did not consider them "mandatory" for interpreting the data. Is it the case that in metabolomics we need the chromatography instrument?

Collaborator Author

Hi! I agree: as discussed in the issue, fractionation is rare, so I'll remove it. I just included potentially interesting columns here to continue the discussion. You're also right about the chromatography instrument; that shouldn't be necessary. The chromatography type is very important in my opinion, since it directly impacts downstream processing, as does the scan polarity. Specifying the instrument is debatable, but at least the instrument type should be specified. The same goes for the ion source, although this could be omitted. I would keep the metabolite assignment file, since most types of analysis will make use of one and having this information is critical to recreate the results.
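To make the outcome of this reply concrete, here is a minimal sketch that applies it to the header proposed in the diff. It only encodes the two removals agreed on in this comment (fraction identifier and chromatography instrument); the instrument, ion source and metabolite assignment file columns are left in because they are still under discussion. The names `PROPOSED_COLUMNS`, `DROPPED` and `prune_columns` are hypothetical and exist only for this illustration.

```python
# Header columns as proposed in the diff for this PR.
PROPOSED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[sex]",
    "characteristics[biological replicate]",
    "characteristics[disease]",
    "characteristics[cell type]",
    "assay name",
    "technology type",
    "comment[technical replicate]",
    "comment[chromatography instrument]",
    "comment[chromatography type]",
    "comment[scan polarity]",
    "comment[instrument]",
    "comment[ion source]",
    "comment[data file]",
    "comment[metabolite assignment file]",
    "comment[fraction identifier]",
    "factor value[gene]",
]

# Columns this reply agrees to remove; everything else is still open.
DROPPED = {
    "comment[fraction identifier]",
    "comment[chromatography instrument]",
}


def prune_columns(columns, dropped):
    """Return the columns in their original order, without the dropped ones."""
    return [column for column in columns if column not in dropped]


remaining = prune_columns(PROPOSED_COLUMNS, DROPPED)
print("\t".join(remaining))
```

The result is a tab-separated header line that could replace the first row of the example file, assuming no further columns are added or removed later in the review.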

Member

@mmattano @nilshoffmann (who can also give input here): For a specification of SDRF metabolomics, we first need to define the required columns (which I sometimes call the minimum information needed to represent the experiment).

In this PR, we can try to define the minimum information for a metabolomics dataset. BTW, I do think the sample metadata (the characteristics) is the same in metabolomics and proteomics; the main work should be done on the data-side comments.

Added more ontologies
Untargeted metabolomics study
Targeted metabolomics example
Development

Successfully merging this pull request may close these issues.

Extending SDRF for Metabolomics
2 participants