Investigation description format for Proteomics

Table of Contents

1. Status of this document
2. Abstract
- 2.1. IDF Tags
3. IDF proteomics Tags

1. Status of this document

This document provides information to the proteomics community about a proposed standard for sample metadata annotations in public repositories called Sample and Data Relationship File (SDRF)-Proteomics format. Distribution is unlimited.

Version Draft - this is a draft of version 1.0

2. Abstract

MAGE-TAB is composed of two main files: the SDRF and the IDF. The IDF (Investigation Description Format) captures the general description of the experiment and dataset. Similar to the existing [ProteomeXchange.xml](additional-documentation/proteomeXchange-1.4.0.xsd), the IDF captures information such as the title of the experiment, a general description, and the submitter information.

The IDF describes the general characteristics of the project and experiment:

- The IDF should be compatible with the existing [ProteomeXchange.xml](additional-documentation/proteomeXchange-1.4.0.xsd).
- The IDF is a tab-delimited file of key-value pairs.

2.1. IDF Tags

The IDF component of a MAGE-TAB document consists of a set of unique tags attached to their corresponding values in a simple tab-delimited text format. For example, "Experiment Description" should be followed by a free-text description of the experiment. Most of the following fields can take more than one value, so that multiple values (for example, multiple experimental factors) can be defined in a single IDF file. In such cases, the values should be separated by semicolons (";").
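The tag and value layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name `parse_idf` and the `SINGLE_VALUE_TAGS` set are hypothetical, and the choice to leave single-valued tags unsplit is an assumption made so that a free-text description containing a semicolon stays intact.

```python
# Tags that this document says can only have one value. Their text is kept
# whole (assumption), since free text may itself contain ";".
SINGLE_VALUE_TAGS = {
    "Investigation Title",
    "Experiment Description",
    "Date of Experiment",
    "Public Release Date",
}


def parse_idf(text):
    """Return a dict mapping each IDF tag to its list of values."""
    tags = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tag, *cells = line.split("\t")
        tag = tag.strip()
        if tag in SINGLE_VALUE_TAGS:
            # Keep only the first non-empty cell, unsplit.
            values = [c.strip() for c in cells if c.strip()][:1]
        else:
            # Split every cell on semicolons and drop empty pieces.
            values = [v.strip() for c in cells for v in c.split(";") if v.strip()]
        tags[tag] = values
    return tags


example = (
    "Investigation Title\tExample proteomics dataset\n"
    "Experiment Description\tDigestion; then LC-MS/MS analysis\n"
    "Experimental Factor Name\tgenotype;organism part\n"
)
idf = parse_idf(example)
print(idf["Experimental Factor Name"])
```

Other free-text tags, such as Protocol Description, may need the same unsplit treatment; the set above only lists the tags this document marks as single-valued.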

3. IDF proteomics Tags

3.1. Investigation Title

The overall title of the investigation. This tag can only have one value.

ℹ️	Corresponding field in proteomeXchange.xml: Title

3.2. Experiment Description

A short paragraph describing the experiment as free-text. This tag can only have one value. The text should clearly explain what you did in your experiment - this will help the curation team to check and process your MAGE-TAB document.

ℹ️	Corresponding field in proteomeXchange.xml: Description

3.3. Date of Experiment

The date on which the experiment was performed. This tag can only have one value.

ℹ️	Some databases, such as PRIDE, provide the submission date, which can be considered the Date of Experiment.

3.4. Public Release Date

The date on which the experimental data will be/was released. You can ask us to change this later. This tag can only have one value.

ℹ️	Corresponding field in proteomeXchange.xml: announceDate

3.5. Person details

The proteomics community captures Person details differently from the IDF MAGE-TAB specification. In ProteomeXchange, the Person information is captured as a list of contacts, where each contact is a list of CvTerms. The name of the CvTerm is the name of the attribute, and the value of the CvTerm is the value of the attribute. For example, in <cvParam cvRef="MS" accession="MS:1000586" name="contact name" value="Christoph Krisp"/>, the name of the CvTerm is "contact name" and the value is the name of the person.
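As an illustration of the name and value convention, the cvParam element quoted above can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch; the helper name `cv_param_to_pair` is hypothetical, and it handles a single element only, not a full ProteomeXchange XML file.

```python
import xml.etree.ElementTree as ET

# The cvParam element quoted in the text above.
snippet = (
    '<cvParam cvRef="MS" accession="MS:1000586" '
    'name="contact name" value="Christoph Krisp"/>'
)


def cv_param_to_pair(xml_text):
    """Return (attribute name, attribute value) for one cvParam element."""
    element = ET.fromstring(xml_text)
    return element.get("name"), element.get("value")


attribute, value = cv_param_to_pair(snippet)
print(attribute, "=", value)
```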

In the IDF, a Contact is a Person with different properties, for example:

In proteomics, the following information is captured: name, email, affiliation, and role of the contact.

3.5.1. Person Last Name

The last name of each person associated with the experiment.

ℹ️	NOTE: Submitter and Lab Head names are written in ProteomeXchange as a single Name. The automatic conversion will try to split it into Person First Name and Person Last Name for the IDF.

3.5.2. Person First Name

The first name of each person associated with the experiment.

ℹ️	NOTE: Submitter and Lab Head names are written in ProteomeXchange as a single Name. The automatic conversion will try to split it into Person First Name and Person Last Name for the IDF.
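One possible way to split a single ProteomeXchange name is sketched below. This is a naive heuristic written for illustration, not the rule used by the automatic conversion: it assumes the last whitespace-separated token is the last name, which will be wrong for multi-word surnames. The function name `split_person_name` is hypothetical.

```python
def split_person_name(full_name):
    """Split a single name into (first name, last name).

    Assumption: the last whitespace-separated token is the last name and
    everything before it is the first name.
    """
    parts = full_name.split()
    if not parts:
        return "", ""
    if len(parts) == 1:
        return "", parts[0]  # a lone token is treated as the last name
    return " ".join(parts[:-1]), parts[-1]


first, last = split_person_name("Christoph Krisp")
print(first, "|", last)
```

Because the split cannot be fully reliable, names converted this way are worth reviewing by hand.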

3.5.3. Person Email

The email address of each person associated with the experiment.

3.5.4. Person Affiliation

The organization affiliation for each person associated with the experiment. This tag is mandatory for sequencing submissions.

3.5.5. Person Roles

The role(s) performed by each person. Typically, these terms should come from the Experimental Factor Ontology. See for example the list of organization role terms. If more than one role is needed per person, the roles should be given as a semicolon (;) delimited list.

ℹ️	ProteomeXchange defines two roles: dataset submitter and lab head.

3.5.6. Person Roles Term Source REF

The source of the Person Roles terms; this must reference one of the Term Source Names defined in the IDF file.

3.5.7. Person Roles Term Accession Number

The accession number for this term, taken from the indicated Term Source.

3.6. Publication Details

3.6.1. PubMed ID

The PubMed IDs of the publication(s) associated with this investigation (where available).

3.6.2. Publication DOI

A Digital Object Identifier (DOI) for each publication (where available).

3.6.3. Publication Author List

The list of authors associated with each publication.

3.6.4. Publication Title

The title of each publication.

3.6.5. Publication Status

A term describing the status of each publication (e.g. submitted, in preparation, published).

3.6.6. Publication Status Term Source REF

The source of the Publication Status terms; this must reference one of the Term Source Names defined in the IDF file.

3.6.7. Publication Status Term Accession Number

The accession number for this term, taken from the indicated Term Source.

3.7. Sample and Data Protocols

In transcriptomics, sample and data protocols are captured in low-level detail, while in proteomics they are a summary of multiple protocols grouped into two categories: Sample Protocol and Data Protocol. For that reason, we recommend writing the sample and data protocols in the following standard form.

3.7.1. Sample Protocol

Protocol Name

The names of the protocols used within the MAGE-TAB document.

ℹ️	The sample protocol name for PX submissions will be: P-MTAB-Sample-PXID. The protocol name is the combination of "Sample" and the ProteomeXchange (PX) submission identifier.

3.7.2. Protocol Type

The type of the protocol, taken from a controlled vocabulary. Typically, this term should come from the Experimental Factor Ontology. See for example the list of protocol terms.

ℹ️	The protocol type for PX submissions will be: sample collection protocol

3.7.3. Protocol Description

A free-text description of the protocol. This text is included in a single tab-delimited field.

ℹ️	The Protocol Description is the present Sample Description in ProteomeXchange.

3.7.4. Protocol Parameters

A semicolon-delimited list of parameter names.

3.7.5. Protocol Hardware

The protocol hardware is the instrument that was used to capture the sample.

ℹ️	If multiple instruments are used, they should be separated by semicolons (;).

3.7.6. Data Protocol

The Data Protocol is a generic way in proteomics to capture all the metadata about the data analysis steps.

Protocol Name

The names of the protocols used within the MAGE-TAB document.

ℹ️	The data protocol name for PX submissions will be: P-MTAB-Data-PXID. The protocol name is the combination of "Data" and the ProteomeXchange (PX) submission identifier.
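The naming patterns P-MTAB-Sample-PXID and P-MTAB-Data-PXID can be sketched as a small helper. The function name `protocol_name` is hypothetical, and the sketch assumes that "PXID" stands for the full ProteomeXchange accession of the submission (for example PXD000612); the specification text does not spell this out.

```python
def protocol_name(kind, px_accession):
    """Build a protocol name following the P-MTAB-<kind>-<PXID> pattern.

    Assumption: PXID is the full ProteomeXchange accession, e.g. PXD000612.
    """
    if kind not in ("Sample", "Data"):
        raise ValueError("kind must be 'Sample' or 'Data'")
    return f"P-MTAB-{kind}-{px_accession}"


print(protocol_name("Sample", "PXD000612"))
print(protocol_name("Data", "PXD000612"))
```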

3.7.7. Protocol Type

The type of the protocol, taken from a controlled vocabulary. Typically, this term should come from the Experimental Factor Ontology. See for example the list of protocol terms.

ℹ️	The protocol type for PX submissions will be: data analysis protocol

3.7.8. Protocol Description

A free-text description of the protocol. This text is included in a single tab-delimited field.

ℹ️	The Protocol Description is the present Data Description in ProteomeXchange.

3.7.9. Protocol Parameters

A semicolon-delimited list of parameter names.

3.7.10. Protocol Software

The software used by the protocol.

3.7.11. Protocol Term Source REF

The source of the Protocol Type terms; this must reference one of the Term Source Names defined elsewhere in the IDF file.

3.7.12. Protocol Term Accession Number

The accession number for this term, taken from the indicated Term Source.

3.8. Experimental Factors

3.8.1. Experimental Factor Name

A user-defined name for each experimental factor studied by the experiment. These experimental factors represent the variables within the investigation (e.g. growth condition, genotype, organism part). The actual values of these variables will be listed in the SDRF file, in "factor value[<factor name>]" columns.
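The link between the Experimental Factor Name values in the IDF and the "factor value[<factor name>]" columns in the SDRF can be checked with a short script. This is a sketch under assumptions: the function names are hypothetical, and the case-insensitive comparison of column headers is a choice made here, not a rule stated by the specification.

```python
def factor_value_columns(factor_names):
    """Return the SDRF column header expected for each factor name."""
    return [f"factor value[{name}]" for name in factor_names]


def missing_factor_columns(factor_names, sdrf_header):
    """List expected factor value columns absent from an SDRF header row.

    Assumption: headers are compared case-insensitively after trimming.
    """
    present = {column.strip().lower() for column in sdrf_header}
    return [
        column
        for column in factor_value_columns(factor_names)
        if column.lower() not in present
    ]


# Illustrative header row; the column names other than the factor value
# column are placeholders, not taken from a real SDRF file.
header = ["source name", "assay name", "factor value[genotype]"]
print(missing_factor_columns(["genotype", "organism part"], header))
```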

3.8.2. Experimental Factor Type

A term describing the type of each experimental factor. These terms will usually come from the Experimental Factor Ontology.

3.8.3. Experimental Factor Term Source REF

The source of the Experimental Factor Type terms; this must reference one of the Term Source Names defined in the IDF file.

3.8.4. Experimental Factor Term Accession Number

The accession number for this term, taken from the indicated Term Source.

3.9. SDRF File

The name(s) of the SDRF file(s) accompanying this IDF file.

3.10. Additional Properties

3.10.1. ProteomeXchange accession number

Main identifier of a ProteomeXchange dataset.

3.11. Examples

PXD000612 - https://github.com/ypriverol/proteomics-metadata-standard/blob/master/annotated-projects/PXD000612/PXD000612.idf.tsv
