Create a new GO_REF metadata and update system #1764

Closed
4 of 8 tasks
kltm opened this issue Jan 19, 2022 · 6 comments · Fixed by #1785 or #1814
@kltm
Member

kltm commented Jan 19, 2022

Create a new GO_REF metadata and update system that:

  • contains all of the GO_REF data in a single machine-usable metadata file (location TBD)
  • compiles on action to a static page in geneontology.github.io for delivery to geneontology.org

Steps could be:

Tagging @cmungall @ValWood @kimrutherford

@sujaypatil96
Contributor

sujaypatil96 commented Jan 26, 2022

I was thinking about the data format for storing the GO_REF metadata, and I think it makes sense to use JSON, given that it's a serialization format. The current GO_REFs, e.g. goref-0000002, are markdown files that combine a YAML block with markdown sections.

Reasoning: YAML is a good fit as a specification format, e.g. for configuration files. Here, though, we need a format for storing the GO_REF metadata that the geneontology.org site will eventually consume.

If we were to adopt JSON as the format of choice for this metadata system, the natural course of action would be to write a parser that converts all the current GO_REF markdown files into JSON.

Taking goref-0000002 as an example, we can parse the YAML block and markdown sections into JSON in the following way (the `#` lines are annotations only, not part of the JSON):

{
  # attributes from YAML block
  "alt_id": [
    "GO_REF:0000007",
    "GO_REF:0000014",
    "GO_REF:0000016",
    "GO_REF:0000017"
  ],
  "authors": "DDB, FB, MGI, GOA, ZFIN curators",
  "external_accession": [
    "MGI:2152098",
    "J:72247",
    "ZFIN:ZDB-PUB-020724-1",
    "FB:FBrf0174215",
    "dictyBase_REF:10157",
    "SGD_REF:S000124036"
  ],
  "id": "GO_REF:0000002",
  "year": 2001,
  "layout": "goref",
  # main goref title with `##` markdown header syntax
  "goref_title": "Gene Ontology annotation through association of InterPro records with GO terms.",
  # body of goref following goref title
  "goref_body": "Transitive assignment of GO terms based on InterPro classification. For any database entry (representing a protein or protein-coding gene) that has been annotated with one or more InterPro domains, The corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI. The mapping file is available at http://www.geneontology.org/external2go/interpro2go.",
  # `## Comments` section is pretty standard across existing gorefs
  "comments": "Formerly GOA:interpro. Note that GO annotations based on InterPro-to-GO transitive assignment may undergo subsequent filtering, e.g. to remove annotations redundant with manual curation; consult documentation from the annotation providers for further information."
}
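The parser described above could look roughly like the following. This is only a minimal sketch, not the project's implementation: it assumes each GO_REF file starts with a `---`-delimited YAML block containing flat `key: value` pairs and `- item` lists, followed by `## ` sections where the first header is the title and one is named `Comments`. The function names and the abbreviated `SAMPLE` text are hypothetical, and a real implementation would more likely use a proper YAML library.

```python
import json


def _scalar(value):
    """Strip surrounding quotes; turn all-digit strings into ints."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return int(value) if value.isdigit() else value


def parse_goref_markdown(text):
    """Parse one GO_REF markdown file (assumed layout) into a dict."""
    lines = text.strip().splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != "---":
        raise ValueError("expected '---' on the first line")
    try:
        end = stripped.index("---", 1)  # closing front-matter delimiter
    except ValueError:
        raise ValueError("front matter is not closed with '---'")

    record, current_key = {}, None
    for item in stripped[1:end]:
        if not item:
            continue
        if item.startswith("- "):
            # list entry belonging to the most recent "key:" line
            if current_key is None:
                raise ValueError("list item without a key: " + item)
            record[current_key].append(_scalar(item[2:]))
            continue
        key, _, value = item.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if value:
            record[key], current_key = _scalar(value), None
        else:
            record[key], current_key = [], key

    # Body: the first "## " header is the title; text is grouped per header.
    title, header, sections = None, None, {}
    for raw in lines[end + 1:]:
        if raw.startswith("## "):
            header = raw[3:].strip()
            title = title if title is not None else header
            sections[header] = []
        elif header is not None and raw.strip():
            sections[header].append(raw.strip())

    record["goref_title"] = title
    record["goref_body"] = " ".join(sections.get(title, []))
    record["comments"] = " ".join(sections.get("Comments", []))
    return record


# Abbreviated, hypothetical rendering of goref-0000002 for illustration.
SAMPLE = """---
layout: goref
id: GO_REF:0000002
year: 2001
alt_id:
  - GO_REF:0000007
  - GO_REF:0000014
authors: DDB, FB, MGI, GOA, ZFIN curators
---

## Gene Ontology annotation through association of InterPro records with GO terms.

Transitive assignment of GO terms based on InterPro classification.

## Comments

Formerly GOA:interpro.
"""

if __name__ == "__main__":
    print(json.dumps(parse_goref_markdown(SAMPLE), indent=2))
```

Running the parser over every GO_REF file and collecting the resulting dicts in a list would give the single machine-usable file discussed in the first comment.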

CC: @kltm @sierra-moxon, I just wanted to gather your thoughts on this approach.

@sujaypatil96 sujaypatil96 self-assigned this Jan 26, 2022
@kltm
Member Author

kltm commented Jan 26, 2022

After some discussion with @sujaypatil96: we will look at lightly parsing out the markdown content and using YAML as the target format, as we still need non-devs to be able to edit the file easily.
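To illustrate what a YAML target might look like, here is a minimal sketch that writes one flat GO_REF record as block-style YAML using only the standard library. It assumes keys are plain identifiers and values are scalars or lists of scalars, and it relies on JSON-encoded scalars also being valid scalars in YAML 1.2. The `to_yaml` helper is hypothetical; in practice a YAML library would be the safer choice.

```python
import json


def to_yaml(record):
    """Serialize a flat dict (scalars or lists of scalars) as block-style YAML."""
    out = []
    for key, value in record.items():
        if isinstance(value, list):
            if not value:
                out.append(f"{key}: []")  # explicit empty list
                continue
            out.append(f"{key}:")
            # json.dumps quotes and escapes strings in a YAML-compatible way
            out.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            out.append(f"{key}: {json.dumps(value)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(to_yaml({
        "id": "GO_REF:0000002",
        "year": 2001,
        "alt_id": ["GO_REF:0000007", "GO_REF:0000014"],
    }))
```

Quoting every string is more verbose than hand-written YAML, but it avoids surprises with values that contain colons, such as the GO_REF identifiers.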

@ValWood
Contributor

ValWood commented Jan 28, 2022

Hi,
Just tagging @rachellyne, as InterMine needs this metadata file too.
At the moment, if you filter on data source, you lose annotations that are not supported by a PMID.
cheers
Val

@kltm
Member Author

kltm commented Feb 26, 2022

@sujaypatil96 Reopening: I believe some of the tasks listed in the first comment that are needed to complete this have not been spun out into their own tickets, so this issue is still operating as a "super".

@kltm kltm reopened this Feb 26, 2022
@sujaypatil96 sujaypatil96 linked a pull request Mar 15, 2022 that will close this issue
8 tasks
@kltm kltm reopened this Nov 1, 2022
@kltm kltm assigned pkalita-lbl and unassigned sujaypatil96 Feb 10, 2024
@kltm
Member Author

kltm commented Feb 15, 2024

@pkalita-lbl @pgaudet As this seems to fully cover the bases of what needs to be done, I'm going to close out all the old issues in favor of the new ones.

4 participants