Create a new GO_REF metadata and update system #1764
I was thinking about the data format for storing the GO_REF metadata, and I think it makes sense to use JSON, since it is a serialization format. The current GO_REFs (e.g., goref-0000002) are markdown files that combine YAML syntax with markdown syntax.

Reasoning: YAML works well as a specification format, for example for configuration files. Here, however, we need a format that stores the GO_REF metadata that will eventually be consumed by the geneontology.org site.

If we adopt JSON for this metadata system, the natural next step would be to write a parser that converts all the current GO_REF markdown files into JSON. Taking goref-0000002 as an example, we can parse the YAML block and the markdown sections into JSON as follows. The attributes from `alt_id` through `layout` come from the YAML block; `goref_title` is the main GO_REF title, written with the `##` markdown header syntax; `goref_body` is the body text following that title; and `comments` comes from the `## Comments` section, which is fairly standard across existing GO_REFs.

```json
{
  "alt_id": [
    "GO_REF:0000007",
    "GO_REF:0000014",
    "GO_REF:0000016",
    "GO_REF:0000017"
  ],
  "authors": "DDB, FB, MGI, GOA, ZFIN curators",
  "external_accession": [
    "MGI:2152098",
    "J:72247",
    "ZFIN:ZDB-PUB-020724-1",
    "FB:FBrf0174215",
    "dictyBase_REF:10157",
    "SGD_REF:S000124036"
  ],
  "id": "GO_REF:0000002",
  "year": 2001,
  "layout": "goref",
  "goref_title": "Gene Ontology annotation through association of InterPro records with GO terms.",
  "goref_body": "Transitive assignment of GO terms based on InterPro classification. For any database entry (representing a protein or protein-coding gene) that has been annotated with one or more InterPro domains, The corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI. The mapping file is available at http://www.geneontology.org/external2go/interpro2go.",
  "comments": "Formerly GOA:interpro. Note that GO annotations based on InterPro-to-GO transitive assignment may undergo subsequent filtering, e.g. to remove annotations redundant with manual curation; consult documentation from the annotation providers for further information."
}
```

CC: @kltm @sierra-moxon, I'd like to hear your thoughts on this approach.
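A minimal sketch of the parser idea above, in Python using only the standard library. It assumes a GO_REF file starts with a `---`-delimited YAML block, followed by a `## <title>` header, the body text, and further `##` sections such as `## Comments`, as the field mapping above suggests. `parse_simple_yaml` and `parse_goref` are hypothetical names; `parse_simple_yaml` only handles the flat `key: value` and `- item` subset, so a real implementation would more likely use a full YAML parser (e.g. PyYAML's `yaml.safe_load`). `SAMPLE` is an abridged, illustrative stand-in for goref-0000002, not the actual file.

```python
import json
import re

# Front matter: a leading "---" line, the YAML block, and a closing "---" line.
FRONT_MATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\Z", re.DOTALL)
# A level-2 markdown header such as "## Comments"; the header text is captured.
SECTION_HEADER = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_simple_yaml(block: str) -> dict:
    """Parse a flat YAML subset: `key: value` scalars and `key:` followed by `- item` lines.

    Hypothetical helper for illustration; it does not handle quoting, nesting,
    or comments.
    """
    data = {}
    list_key = None  # key of the list currently being filled, if any
    for raw in block.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a key: {raw!r}")
            data[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"unsupported YAML line: {raw!r}")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = int(value) if value.isdigit() else value
            list_key = None
        else:
            # Assumption: an empty value introduces a list of "- item" lines.
            data[key] = []
            list_key = key
    return data


def parse_goref(text: str) -> dict:
    """Convert one GO_REF markdown file (YAML block plus ## sections) into a dict."""
    match = FRONT_MATTER.match(text)
    if match is None:
        raise ValueError("missing ---delimited YAML block")
    record = parse_simple_yaml(match.group(1))
    # With one capture group, split yields [preamble, header1, body1, header2, body2, ...].
    parts = SECTION_HEADER.split(match.group(2))
    headers = parts[1::2]
    bodies = [body.strip() for body in parts[2::2]]
    if not headers:
        raise ValueError("missing '## <title>' header")
    record["goref_title"] = headers[0]
    record["goref_body"] = bodies[0]
    for header, body in zip(headers[1:], bodies[1:]):
        # e.g. "## Comments" becomes the key "comments"
        record[header.lower().replace(" ", "_")] = body
    return record


SAMPLE = """\
---
layout: goref
id: GO_REF:0000002
alt_id:
  - GO_REF:0000007
  - GO_REF:0000014
authors: DDB, FB, MGI, GOA, ZFIN curators
year: 2001
external_accession:
  - MGI:2152098
  - J:72247
---

## Gene Ontology annotation through association of InterPro records with GO terms.

Transitive assignment of GO terms based on InterPro classification.

## Comments

Formerly GOA:interpro.
"""

if __name__ == "__main__":
    print(json.dumps(parse_goref(SAMPLE), indent=2))
```

Mapping every extra `##` header to a snake_case key keeps the parser generic, at the cost of silently accepting unexpected section names; a stricter version could reject headers outside a known list.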
After some discussion with @sujaypatil96, we will look into whether we can lightly parse out the markdown content and use YAML as the target format, since non-developers still need to be able to edit the file easily.
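With YAML as the target, the parsed records could be written out as one sequence of mappings for a `gorefs.yaml` file. This is a sketch in standard-library Python, assuming flat records whose values are strings, integers, booleans, or lists of those, and plain keys that need no quoting; `dump_gorefs` and `yaml_scalar` are hypothetical names. It relies on YAML 1.2 being designed as a superset of JSON, so a JSON string literal from `json.dumps` also serves as a YAML double-quoted scalar for ordinary text. PyYAML's `yaml.safe_dump` would be the more robust choice if a dependency is acceptable.

```python
import json


def yaml_scalar(value) -> str:
    """Render one scalar value as YAML text."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # A JSON string literal doubles as a YAML double-quoted scalar.
    return json.dumps(str(value), ensure_ascii=False)


def dump_gorefs(records: list) -> str:
    """Serialize flat GO_REF records as a YAML sequence of mappings, sorted by id.

    Hypothetical helper: assumes every record has an "id" key, keys are plain
    identifiers, and values are scalars or lists of scalars.
    """
    lines = []
    for record in sorted(records, key=lambda r: r["id"]):
        for index, (key, value) in enumerate(record.items()):
            # The first key of each record opens a new sequence item.
            prefix = "- " if index == 0 else "  "
            if isinstance(value, list):
                if not value:
                    lines.append(f"{prefix}{key}: []")
                else:
                    lines.append(f"{prefix}{key}:")
                    lines.extend(f"    - {yaml_scalar(item)}" for item in value)
            else:
                lines.append(f"{prefix}{key}: {yaml_scalar(value)}")
    if not lines:
        return "[]\n"  # an empty file would load as null, not an empty list
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {"id": "GO_REF:0000002", "year": 2001, "alt_id": ["GO_REF:0000007", "GO_REF:0000014"]},
        {"id": "GO_REF:0000001", "alt_id": []},
    ]
    print(dump_gorefs(example), end="")
```

Quoting every string is noisier than hand-written YAML, but it keeps YAML 1.1 parsers from reinterpreting values such as `no` or `2001-01-01` as booleans or dates. Sorting by `id` keeps diffs stable when the file is regenerated.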
Hi,
@sujaypatil96 Reopening, as some of the tasks for completing this that are listed in the first comment have not (I believe) been spun out into their own tickets, so this is still operating as a "super" issue.
Here are new issues to cover the remaining work. @kltm, I'll leave it up to you whether to assign these, add them to a project, or update the task list in the description of this issue.
@pkalita-lbl @pgaudet As this seems to cover everything that needs to be done, I'm going to close all the old issues in favor of the new ones.
Create a new GO_REF metadata and update system that:
Steps could be:
- GO_REF yamldown parser #1785
- `gorefs.yaml` file #1804
- `gorefs.yaml` #1805

Tagging @cmungall @ValWood @kimrutherford