Every KG needs primary knowledge source for every edge #86

Open
capasfield opened this issue Mar 9, 2023 · 1 comment


capasfield commented Mar 9, 2023

By March 24 in DEV:

Every KP edge MUST return a biolink:primary_knowledge_source attribute. All of the attribute values MUST come from the infores repository: https://github.com/biolink/biolink-model/blob/master/infores_catalog_nodes.tsv

If a source does not exist in the catalog, please make a PR adding it to the catalog.
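
A minimal sketch of how a KP team might check this rule before the deadline. It assumes edges carry provenance in a TRAPI-style `attributes` list whose entries have `attribute_type_id` and `value` keys, and that the set of valid infores identifiers has already been loaded from the catalog (the catalog's column layout is not assumed here). The function names and the sample identifiers are hypothetical.

```python
PRIMARY = "biolink:primary_knowledge_source"


def primary_sources(edge):
    """Collect the values of every primary_knowledge_source attribute on an edge."""
    values = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == PRIMARY:
            value = attr.get("value")
            # A value may be a single identifier or a list of identifiers.
            values.extend(value if isinstance(value, list) else [value])
    return values


def check_edges(edges, known_infores):
    """Return {edge_id: problem} for edges that break the rule.

    edges: dict of edge id -> edge dict (assumed TRAPI-style layout).
    known_infores: set of identifiers taken from the infores catalog.
    """
    problems = {}
    for edge_id, edge in edges.items():
        sources = primary_sources(edge)
        if not sources:
            problems[edge_id] = "missing " + PRIMARY
            continue
        unknown = [s for s in sources if s not in known_infores]
        if unknown:
            problems[edge_id] = "not in infores catalog: " + ", ".join(map(str, unknown))
    return problems


# Illustrative data only; these identifiers are placeholders, not catalog entries.
catalog = {"infores:example-source"}
edges = {
    "e0": {"attributes": [{"attribute_type_id": PRIMARY, "value": "infores:example-source"}]},
    "e1": {"attributes": []},
    "e2": {"attributes": [{"attribute_type_id": PRIMARY, "value": "infores:not-registered"}]},
}
report = check_edges(edges, catalog)
```

Here `report` flags `e1` as missing the attribute and `e2` as using an identifier that would need a PR to the catalog.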


mbrush commented Mar 9, 2023

I might add to this rule/convention the notion that KPs that generate new knowledge by analyzing data SHOULD report the sources of the data they operate on, using the biolink:supporting_data_source property. The UI team has indicated that this is important information for end users to see.

Some examples of this:

  • EHR or environmental data sources used by ICEES or COHD to generate variable correlation edges
  • Clinical/EHR and omics (e.g., GTEx, TCGA, GDSC) data sources used by Multiomics KP to generate clinical variable, gene expression, and other correlations
  • TCGA as a source of data used by CHP to generate various types of survival time edges
  • CMAP and CTRP and DepMap as sources of data used by MolePro to generate chemical and gene similarity edges
  • LINCS as a source of cell line expression data used by Improving Agent to create various gene correlation/upregulation edges
  • PubMed database as a source of textual data used by TMKP to generate text-mined associations
  • MEDLINE database as a source of textual data used by Improving Agent to generate phenotype/co-occurrence edges
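
To make the convention concrete, here is a rough sketch of an edge that reports both kinds of provenance. It assumes the same TRAPI-style `attributes` list as the rule above, and it assumes that a KP which generates the knowledge itself is listed as the primary knowledge source. The helper name, the CURIEs, and the infores identifiers are all hypothetical placeholders.

```python
PRIMARY = "biolink:primary_knowledge_source"
SUPPORTING_DATA = "biolink:supporting_data_source"


def add_provenance(edge, kp_infores, data_infores_ids):
    """Attach one primary knowledge source and any supporting data sources to an edge."""
    attributes = edge.setdefault("attributes", [])
    # Assumption: the KP that computed the edge is its primary knowledge source.
    attributes.append({"attribute_type_id": PRIMARY, "value": kp_infores})
    # One attribute per dataset the KP analyzed to produce the edge.
    for data_id in data_infores_ids:
        attributes.append({"attribute_type_id": SUPPORTING_DATA, "value": data_id})
    return edge


# Placeholder edge; the identifiers below are illustrative, not catalog entries.
edge = add_provenance(
    {"subject": "EX:1", "predicate": "biolink:correlated_with", "object": "EX:2"},
    kp_infores="infores:example-kp",
    data_infores_ids=["infores:example-dataset-a", "infores:example-dataset-b"],
)
```

A UI can then show the data sources by filtering the edge's attributes on `biolink:supporting_data_source`.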

Note that I added a box to the top of the TRAPI Retrieval Provenance Standard document outlining the key rules and conventions, including this one. See here.
