Merge pull request #556 from geneontology/suzialeksander-patch-121
Update gene-product-information-gpi-format-20.md
suzialeksander authored Jul 12, 2024
2 parents a025f9c + 6c67a1b commit bc2aab3
Showing 1 changed file with 19 additions and 14 deletions.
33 changes: 19 additions & 14 deletions _docs/gene-product-information-gpi-format-20.md
@@ -3,42 +3,47 @@ title: Gene Product Information (GPI) format 2.0
permalink: /docs/gene-product-information-gpi-format-2.0/

---
# This page describes the Gene Product Information (GPI) 2.0 format. This format has not yet been implemented in GO but is provided to help with the changeover from previous GPAD/GPI versions.
## Currently under construction
# About GPAD/GPI files

The Gene Ontology Consortium stores annotation data, the representation of gene product attributes using GO terms, in tab-delimited text files. *G*ene *P*roduct *A*ssociation *D*ata (GPAD) and *G*ene *P*roduct *I*nformation (GPI) companion files reduce the redundancy of the [Gene Association File (GAF)](/docs/go-annotation-file-gaf-format-2.2/). GAF files contain information about gene products that is present in each line of the GAF: each non-header line in an annotation file represents a single association between a gene product and a GO term with a certain evidence code and the reference to support the link. The GPAD/GPI file system normalizes the data by separating the annotations and metadata about gene and gene product entities in two separate files. GPAD/GPI is intended for internal GO use.

# Gene Product Information (GPI) files
GO also provides annotations as [GAF files](/docs/go-annotation-file-gaf-format-2.2/) and recommends use of the GAF format for most use cases. For more general information on annotation, please see the [Introduction to GO annotation](/docs/go-annotations/).

The Gene Ontology Consortium stores annotation data, the representation of gene product attributes using GO terms, in tab-delimited text files. Each non-header line in an annotation file represents a single association between a gene product and a GO term with a certain evidence code and the reference to support the link.

This guide lays out the format specifications for the *G*ene *P*roduct *I*nformation (GPI) 2.0 format.
**Note that the GPI file is the companion file for the [GPAD file](/docs/gene-product-association-data-gpad-format/).
Both files should be submitted together using the same version.**
# Gene Product Information (GPI) 2.0 file guidelines

GPAD/GPI is intended for internal GO use. GO also provides annotations as [GAF files](/docs/go-annotation-file-gaf-format-2.2/) and reccommends use of the GAF format for most use cases.
This page is a summary of the Gene Product Information Data (GPI) 2.0 format; for full technical details and changes from GPI 1.2 [see the GitHub specification page](https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-2-0.md). The companion file to this GPI 2.0 is [GPAD 2.0](/docs/gene-product-association-data-gpad-format-2.0/).

For more general information on annotation, please see the [Introduction to GO annotation](/docs/go-annotations/).
**Note that the GPI file is the companion file for the [GPAD file](/docs/gene-product-association-data-gpad-format/).
Both files should be submitted together using the same version.**

# Changes from the GPI 1.2 to GPI 2.0
* **Characters allowed in all fields have been explicitly specified**
* **Extensions in file names are: `*.gpad` and `*.gpi`**

**Header**
* **The `gpi-version` header must read `2.0` for this format.**
* **The `gpi-version:` header must read `2.0` for this format.**

**Columns**
* **Columns 1 & 2 from the GPI 1.2 are now combined in a single column containing an ID in CURIE syntax, e.g. `UniProtKB:P56704`.**
* **Columns 1 & 2 in the GPI 1.2 are now combined in a single column containing an ID in CURIE syntax, e.g. `UniProtKB:P56704`.**
* **NCBI taxon IDs are to be prefixed with `NCBITaxon:` to indicate the source of the ID, e.g. `NCBITaxon:6239`**
<!-- does col 5 have to be an ontology ID or are ontology labels, entity types ok? -->
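The two column changes listed above can be sketched as a small conversion step. This is a minimal illustration, not part of the GO tooling: the function names `to_curie` and `to_ncbitaxon` are hypothetical, and the assumption that legacy taxon values look like `taxon:6239` should be checked against the files being converted. The full character rules for each field are in the GitHub specification and are not enforced here.

```python
def to_curie(db: str, object_id: str) -> str:
    """Join the separate GPI 1.2 DB and object ID columns into one CURIE.

    Hypothetical helper: only checks that both parts are non-empty.
    """
    if not db or not object_id:
        raise ValueError("both the prefix and the local ID are required")
    return f"{db}:{object_id}"


def to_ncbitaxon(taxon: str) -> str:
    """Rewrite a taxon value so that it carries the NCBITaxon: prefix.

    Assumes the input is a bare number or a prefixed number such as
    'taxon:6239' or 'NCBITaxon:6239'; anything else raises ValueError.
    """
    # rpartition keeps everything after the last colon (or the whole string)
    _, _, number = taxon.rpartition(":")
    if not number.isdigit():
        raise ValueError(f"not a numeric taxon ID: {taxon!r}")
    return f"NCBITaxon:{number}"


print(to_curie("UniProtKB", "P56704"))
print(to_ncbitaxon("taxon:6239"))
```

Keeping the prefix handling in one place means a value that is already in GPI 2.0 form passes through unchanged.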

# Gene Product Information (GPI) 2.0 format

## GPI Header
### Required information to provide in the header:
### Required information to provide in the header
All annotation files must start with a single line denoting the file format. The database/group generating the file as listed in dbxrefs.yaml and the ISO-8601 formatted date the file was generated must be included in the header. Example for GPI 2.0:

!gpi-version: 2.0
!generated-by: SGD
!date-generated: 2024-05-01

Other information, such as contact details for the submitter or database group, database URLs, etc. can be included in an association file header by prefixing the line with an exclamation mark (`!`); such lines will be ignored by parsers.
The group in the `generated-by` field must be present in the [dbxrefs.yaml file](https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml). The date must be in `YYYY-MM-DD` format, conforming to the date portion of the [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) standard. Submitting groups may choose to include optional additional information in a file header by prefixing the line with an exclamation mark (`!`); such lines will be ignored by parsers. For example:

!URL: http://www.yeastgenome.org/
!Project-release: WS275
!Funding: NHGRI grant number HG012212
!go-version: https://doi.org/10.5281/zenodo.8436609
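The header rules above can be checked with a short script. The following is a sketch under stated assumptions rather than the GO validation pipeline: `parse_gpi_header` and `check_gpi_header` are hypothetical names, header lines are assumed to have the form `!key: value`, and the `generated-by` value is not looked up in the db-xrefs file.

```python
import re
from datetime import date

# The three header keys the text above describes as required.
REQUIRED_KEYS = ("gpi-version", "generated-by", "date-generated")


def parse_gpi_header(lines):
    """Collect '!key: value' lines until the first non-header line."""
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line[1:].partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


def check_gpi_header(header):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing !{key}" for key in REQUIRED_KEYS if key not in header]
    if "gpi-version" in header and header["gpi-version"] != "2.0":
        problems.append("gpi-version must read 2.0")
    stamp = header.get("date-generated")
    if stamp is not None:
        # Check the YYYY-MM-DD shape first, then that it is a real calendar date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", stamp):
            problems.append("date-generated must be YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(stamp)
            except ValueError:
                problems.append("date-generated is not a valid date")
    return problems


example = [
    "!gpi-version: 2.0",
    "!generated-by: SGD",
    "!date-generated: 2024-05-01",
    "!URL: http://www.yeastgenome.org/",
]
print(check_gpi_header(parse_gpi_header(example)))
```

Optional lines such as `!URL:` are collected but never reported as problems, matching the statement that parsers ignore them.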

## GPI fields
