Sample and Data Relationship Format for Proteomics

The SDRF-Proteomics file format describes sample characteristics and the relationships between samples and data files. The format is tab-delimited: each row corresponds to a relationship between a Sample and a Data file (and a channel, in the context of labelled samples), each column corresponds to an attribute/property of the Sample, and each cell holds the value of that property for a given Sample.

SDRF for proteomics

The SDRF-Proteomics is divided into three main blocks:

characteristics[...]: These are properties of the sample of origin.
comment[...]: These are the data properties.
factor value[...]: These are the variables under study.

The SDRF columns MUST start with the source name, which is the sample accession. After the sample accession, all the columns correspond to sample characteristics, for example characteristics[organism], until the assay name column, which starts the Data file section.
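The column layout described above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function names classify_column and check_column_order are hypothetical, and the rule that comment columns only appear after assay name is an assumption drawn from the description on this page.

```python
import re

# Matches bracketed headers such as "characteristics[organism]".
_HEADER = re.compile(r"^(characteristics|comment|factor value)\[(.+)\]$")


def classify_column(header):
    """Return (block, property); block is None for plain columns."""
    name = header.strip()
    match = _HEADER.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name


def check_column_order(headers):
    """Return a list of layout problems (empty if none were found)."""
    stripped = [h.strip() for h in headers]
    problems = []
    if not stripped or stripped[0] != "source name":
        problems.append("first column must be 'source name'")
    if "assay name" not in stripped:
        problems.append("missing 'assay name' column")
        return problems
    assay_index = stripped.index("assay name")
    for position, header in enumerate(stripped):
        block, _ = classify_column(header)
        # Assumption: sample characteristics precede the Data file section.
        if block == "characteristics" and position > assay_index:
            problems.append(f"'{header}' appears after 'assay name'")
        # Assumption: data properties follow the assay name column.
        if block == "comment" and position < assay_index:
            problems.append(f"'{header}' appears before 'assay name'")
    return problems
```

Returning a list of problems rather than raising on the first one lets a submitter see every layout issue in a single pass.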

The Data properties section (comment) starts with the assay name which is the Data file accession. After the assay name the following properties (comment) are mandatory for SDRF-Proteomics:

comment[label]: The label is the channel used in multiplexed experiments (e.g. TMT126; check the documentation for the labelled methods). If the sample is not labelled, or the experiment did not use any multiplexing method, the value MUST be label free sample.
comment[fraction identifier]: The fraction identifier identifies the fraction that each Data file corresponds to. Fraction identifiers can describe any type of fractionation method, including high-performance liquid chromatography, isoelectric focusing or off-gel electrophoresis.

The values of the properties within the three main blocks (characteristics, comment and factor value) MUST be ontology terms that can be found in the OLS ontology service. The SDRF-Proteomics file format allows researchers and submitters to go from a simple file format like:

source name	characteristics[organism]	characteristics[organism part]	characteristics[biological replicate]	assay name	comment[technical replicate]	comment[fraction identifier]	comment[label]	factor value[organism part]
Sample-1	homo sapiens	heart	1	ms_run 1	1	1	label free sample	heart
Sample-2	homo sapiens	liver	1	ms_run 2	1	1	label free sample	liver

The previous example contains only the minimum information a researcher needs to understand the Sample and Data file relationship. The factor value is used to define which characteristic of the sample is under study (e.g. organism part). The example can be read as: two different label-free human samples (one from the heart and one from the liver), with no fractionation, are compared.
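Because the format is plain tab-delimited text, the example can be read with a standard CSV reader. This is a minimal sketch using Python's csv module; the helper names read_sdrf and factor_columns are hypothetical. Note that csv.DictReader keeps only one value when a header is repeated, so a file with repeated column names would need csv.reader instead.

```python
import csv
import io

# The two-sample example from this page, rebuilt as tab-delimited text.
_TABLE = [
    ["source name", "characteristics[organism]", "characteristics[organism part]",
     "characteristics[biological replicate]", "assay name",
     "comment[technical replicate]", "comment[fraction identifier]",
     "comment[label]", "factor value[organism part]"],
    ["Sample-1", "homo sapiens", "heart", "1", "ms_run 1", "1", "1",
     "label free sample", "heart"],
    ["Sample-2", "homo sapiens", "liver", "1", "ms_run 2", "1", "1",
     "label free sample", "liver"],
]
EXAMPLE = "\n".join("\t".join(cells) for cells in _TABLE) + "\n"


def read_sdrf(text):
    """Parse SDRF text into a list of dicts, one per Sample/Data file row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def factor_columns(rows):
    """Return the headers that declare the variables under study."""
    if not rows:
        return []
    return [name for name in rows[0] if name.startswith("factor value[")]


rows = read_sdrf(EXAMPLE)
for row in rows:
    # Each row links one sample to one data file (and one label).
    print(row["source name"], "->", row["assay name"], "/", row["comment[label]"])
```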

With the following properties source name, characteristics[biological replicate], assay name, comment[technical replicate], comment[label], comment[fraction identifier], factor value[sample property], the submitter can annotate the relation between the sample and the data file (and label channel in the context of multiplexed experiment). However, more metadata is needed in order to understand the Sample source, the data acquisition protocol etc.
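Since every row is one Sample to Data file relationship, grouping rows by source name shows which data files belong to each sample. The sketch below assumes rows are already available as dictionaries keyed by column header; the function name data_files_by_sample is hypothetical, and the second fraction of Sample-1 (ms_run 3) is an invented row added only for illustration.

```python
from collections import defaultdict


def data_files_by_sample(rows):
    """Map each source name to its (assay name, label, fraction) links."""
    links = defaultdict(list)
    for row in rows:
        links[row["source name"]].append((
            row["assay name"],
            row["comment[label]"],
            row["comment[fraction identifier]"],
        ))
    return dict(links)


rows = [
    {"source name": "Sample-1", "assay name": "ms_run 1",
     "comment[label]": "label free sample", "comment[fraction identifier]": "1"},
    # Hypothetical second fraction of the same sample, for illustration only.
    {"source name": "Sample-1", "assay name": "ms_run 3",
     "comment[label]": "label free sample", "comment[fraction identifier]": "2"},
    {"source name": "Sample-2", "assay name": "ms_run 2",
     "comment[label]": "label free sample", "comment[fraction identifier]": "1"},
]

links = data_files_by_sample(rows)
```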

SDRF-Proteomics Sample templates.

ProteomeXchange partners have defined the SDRF-Proteomics templates: a group of guidelines and checklists of the minimum sample metadata requested for different types of experiment. For example, for Human datasets the following metadata MUST be provided:

source name: Sample identifier
characteristics[organism]: Organism
characteristics[ancestry category]: Ancestry category
characteristics[age]: Age of the individual, formatted as the specification recommends
characteristics[sex]: Sex
characteristics[disease]: Disease under study
characteristics[organism part]: Organism part
characteristics[cell type]: Cell type
characteristics[individual]: Unique identifier for the individual or patient
characteristics[biological replicate]: Biological replicate accession. This variable is related to the factor value.
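A template such as the Human checklist above can be treated as a list of required column headers. The following is a minimal sketch under that assumption; the constant HUMAN_TEMPLATE_COLUMNS and the function missing_columns are hypothetical names, and headers are compared by exact match after trimming whitespace.

```python
# Required columns for Human datasets, as listed on this page.
HUMAN_TEMPLATE_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "characteristics[ancestry category]",
    "characteristics[age]",
    "characteristics[sex]",
    "characteristics[disease]",
    "characteristics[organism part]",
    "characteristics[cell type]",
    "characteristics[individual]",
    "characteristics[biological replicate]",
]


def missing_columns(headers, required):
    """Return the required columns absent from headers, in template order."""
    present = {header.strip() for header in headers}
    return [column for column in required if column not in present]
```

Applied to the header of the two-sample example earlier on this page, the check reports the six Human-template columns that the minimal example leaves out.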

Data files metadata

As mentioned before, for each Data file the following metadata is required:

assay name: Identifier of the file
comment[label]: The label used in the experiment. Some tools recognise it as the channel.
comment[fraction identifier]: Fraction identifier accession. If no fractionation is used, all the values should be 1.
comment[technical replicate]: Technical replicate accession.
comment[data file]: Data file name, including the extension.

Additionally, ProteomeXchange requests that every dataset provide the following properties:

comment[cleavage agent details]: Enzymes and cleavage agents used in the experiment.
comment[instrument]: MS instrument used to acquire the data.
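The Data file requirements above can be checked row by row. This is a minimal sketch, not an official validator: the names DATA_FILE_COLUMNS, PX_COLUMNS and check_data_row are hypothetical, and the extension test only looks for a dot-suffix in the file name.

```python
import os

# Columns required for each Data file, as listed on this page.
DATA_FILE_COLUMNS = [
    "assay name",
    "comment[label]",
    "comment[fraction identifier]",
    "comment[technical replicate]",
    "comment[data file]",
]
# Properties additionally requested by ProteomeXchange.
PX_COLUMNS = [
    "comment[cleavage agent details]",
    "comment[instrument]",
]


def check_data_row(row):
    """Return a list of problems for one row (a dict keyed by column header)."""
    problems = []
    for column in DATA_FILE_COLUMNS + PX_COLUMNS:
        if not (row.get(column) or "").strip():
            problems.append(f"missing value for '{column}'")
    data_file = (row.get("comment[data file]") or "").strip()
    # The data file name is expected to include its extension.
    if data_file and not os.path.splitext(data_file)[1]:
        problems.append("comment[data file] has no file extension")
    return problems
```

For instance, a row whose comment[data file] is "run1" (a placeholder name) would be flagged for the missing extension, while "run1.raw" would pass that test.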

Templates are designed to capture the minimum metadata at the sample and data level, depending on the type of experiment. For example, for cell line datasets, characteristics[cell line] MUST be provided.

Annotating MS-based metadata

The ProteomeXchange consortium RECOMMENDS providing additional information for each RAW file. This additional information includes:

comment[modification parameters]: Post-translational modifications searched in the experiment.
comment[fragment mass tolerance]: Fragment mass tolerance.
comment[precursor mass tolerance]: Precursor mass tolerance.
