more specific operations for MyChem chembl.drug_mechanisms data #100

Closed
colleenXu opened this issue Dec 21, 2022 · 11 comments

@colleenXu

colleenXu commented Dec 21, 2022

Related to biothings/biothings_explorer#532 (comment)

MyChem chembl.drug_mechanisms data, in subject-association-object format

  • We want 1 association per unique combo of subject/object/action_type.
    • Right now in MyChem, all data is aggregated by unique chemical (chem-centric format). So when a chemical has multiple drug_mechanisms, we can't retrieve only the sections that have a specific action_type.
    • MyChem currently has info for 5262 unique chemicals
  • We want the gene ID to be NCBIGene (or UniProtKB or ENSEMBL ENSG maybe).
    • Right now, it's CHEMBL.TARGET and Translator's Node Normalizer doesn't have cross-mappings/name-retrieval for that ID namespace
@colleenXu colleenXu changed the title New API based on MyChem chembl.drug_mechanisms data New API based on MyChem chembl.drug_mechanisms data Dec 21, 2022
@colleenXu
Author

Related issue: biothings/biothings_explorer#316, JQ development work

@colleenXu
Author

One reason to create a pending api / adjust the parser for that resource: the gene ID being CHEMBL.TARGET is a problem that isn't solved by post-processing (JQ).

@newgene
Member

newgene commented Dec 21, 2022

Cross-ref a related issue here: biothings/mygene.info#105 (mapping from CHEMBL Target ID to gene id)

@colleenXu
Author

colleenXu commented Dec 28, 2022

Notes on the data

(6306 records total https://www.ebi.ac.uk/chembl/g/#browse/mechanisms_of_action/)

Drugs

Not all drugs are "Small molecule". In rough order from most to least:

  • almost 75% are "Small molecule" (4727)
  • almost 14% are "Antibody" (866)
  • other:
    • Protein
    • Unknown
    • Enzyme
    • Oligonucleotide
    • Oligosaccharide
    • Gene
    • Cell

Targets

Not all targets are human. In rough order from most to least:

  • almost 83% are "Homo sapiens" (5229)
  • others are bacteria, virus, fungi, or - N/A - targets

Not all targets are proteins. In rough order from most to least:

  • almost 69% are "SINGLE PROTEIN" (4155)
  • almost 11% are "PROTEIN FAMILY" (667)
  • Other:
    • - N/A -
    • PROTEIN COMPLEX
    • PROTEIN COMPLEX GROUP
    • NUCLEIC-ACID
    • SMALL MOLECULE
    • PROTEIN NUCLEIC-ACID COMPLEX
    • OLIGOSACCHARIDE
    • MACROMOLECULE
    • METAL
    • SUBCELLULAR
    • CHIMERIC PROTEIN
    • LIPID
    • SELECTIVITY GROUP
    • PROTEIN-PROTEIN INTERACTION
    • UNKNOWN
    • ORGANISM
    • CELL-LINE

Categories of drug mechanisms

When browsing ChEMBL at https://www.ebi.ac.uk/chembl/g/#browse/mechanisms_of_action (roughly in order from most to least):

  • INHIBITOR
  • ANTAGONIST
  • AGONIST
  • - N/A - 🧇
  • BINDING AGENT 🧇
  • BLOCKER
  • MODULATOR 🧇
  • POSITIVE ALLOSTERIC MODULATOR
  • HYDROLYTIC ENZYME
  • ACTIVATOR
  • Other (in alphabetic order, not most to least)
    • ALLOSTERIC ANTAGONIST
    • ANTISENSE INHIBITOR
    • CHELATING AGENT
    • CROSS-LINKING AGENT 🧇
    • DEGRADER
    • DISRUPTING AGENT
    • INVERSE AGONIST
    • NEGATIVE ALLOSTERIC MODULATOR
    • NEGATIVE MODULATOR
    • OPENER
    • OTHER 🧇
    • OXIDATIVE ENZYME
    • PARTIAL AGONIST
    • POSITIVE MODULATOR
    • PROTEOLYTIC ENZYME
    • REDUCING AGENT
    • RELEASING AGENT
    • RNAI INHIBITOR
    • SEQUESTERING AGENT
    • STABILISER
    • SUBSTRATE
    • VACCINE ANTIGEN

🧇 = not as helpful for the creative-mode issue (biothings/biothings_explorer#532)

@colleenXu
Author

Example of current MyChem structure vs association-based structure

Background

ANG1005 (CHEMBL1089636) has two drug mechanisms that fall into different categories:

  • BINDING AGENT for Prolow-density lipoprotein receptor-related protein 1 (ChEMBL target CHEMBL4630884, aka Q07954)
  • INHIBITOR for tubulin (ChEMBL target CHEMBL2095182, aka a bunch of UniProt accessions because it's a protein complex group)

In MyChem

These two drug mechanisms are nested inside the chembl.drug_mechanisms field of the MyChem record for this chemical: https://mychem.info/v1/query?q=_exists_:%22chembl.drug_mechanisms%22%20AND%20chembl.molecule_chembl_id:CHEMBL1089636. This means BTE post-processing (JQ?) is needed to retrieve only the INHIBITOR drug mechanism (or only the BINDING_AGENT one).

{
    "chembl": {
        "molecule_chembl_id": "CHEMBL1089636",
        "drug_mechanisms": [
            {"action_type": "INHIBITOR", "references": {...}, "binding_site_name": null, "target_chembl_id": "CHEMBL2095182", "target_uniprot_accession": ["P68371", ...]},
            {"action_type": "BINDING_AGENT", "references": {...}, "binding_site_name": null, "target_chembl_id": "CHEMBL4630884", "target_uniprot_accession": "Q07954"}
        ]
    }
}
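
To make the post-processing step concrete, here is a minimal Python sketch of the kind of filtering that would be needed on the chem-centric record. It is only an illustration, not BTE's actual JQ logic: the helper names `as_list` and `mechanisms_with_action` are hypothetical, the record is a trimmed copy of the example above, and the scalar-vs-list handling is an assumption based on `target_uniprot_accession` appearing both as a string and as a list.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> list:
    """Wrap a lone value in a list (assumption: single values may not be wrapped)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def mechanisms_with_action(record: Dict[str, Any], action_type: str) -> List[dict]:
    """Return only the chembl.drug_mechanisms entries with the given action_type."""
    mechanisms = as_list(record.get("chembl", {}).get("drug_mechanisms"))
    return [m for m in mechanisms if m.get("action_type") == action_type]


# Trimmed copy of the ANG1005 record (references and extra accessions omitted).
record = {
    "chembl": {
        "molecule_chembl_id": "CHEMBL1089636",
        "drug_mechanisms": [
            {"action_type": "INHIBITOR", "target_chembl_id": "CHEMBL2095182",
             "target_uniprot_accession": ["P68371"]},
            {"action_type": "BINDING_AGENT", "target_chembl_id": "CHEMBL4630884",
             "target_uniprot_accession": "Q07954"},
        ],
    }
}

inhibitors = mechanisms_with_action(record, "INHIBITOR")
print([m["target_chembl_id"] for m in inhibitors])
```

The point is that the API query alone returns the whole chemical record, so the selection by action_type has to happen after retrieval.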

association-based structure

However, if we use an association-based structure, we can make two separate records, and each one can be retrieved on its own depending on what association.action_type is set to when querying.

{
    "subject": { "drug_chembl_id": "CHEMBL1089636", ...},
    "association": { "action_type": "INHIBITOR", "references": {...}, "binding_site_name": null},
    "object": { "target_chembl_id": "CHEMBL2095182", "target_uniprot_accession": ["P68371", ...]}
},
{
    "subject": { "drug_chembl_id": "CHEMBL1089636", ...},
    "association": { "action_type": "BINDING_AGENT", "references": {...}, "binding_site_name": null},
    "object": { "target_chembl_id": "CHEMBL4630884", "target_uniprot_accession": "Q07954"}
}
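
A rough sketch of how a parser could produce the records above from the chem-centric form, keeping one association per unique subject/object/action_type combo. This is not the pending-API script mentioned in this thread; `to_associations` and `as_list` are hypothetical names, and two choices here are assumptions: duplicates are simply dropped (a real parser might merge their references instead), and `target_uniprot_accession` is normalized to a list.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> list:
    """Wrap a lone value in a list so callers can always iterate."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_associations(record: Dict[str, Any]) -> List[dict]:
    """Flatten one chem-centric record into subject/association/object records."""
    chembl = record.get("chembl", {})
    drug_id = chembl.get("molecule_chembl_id")
    seen = set()
    associations = []
    for mech in as_list(chembl.get("drug_mechanisms")):
        # One association per unique (subject, object, action_type) combo.
        key = (drug_id, mech.get("target_chembl_id"), mech.get("action_type"))
        if key in seen:
            continue  # assumption: drop repeats rather than merge references
        seen.add(key)
        associations.append({
            "subject": {"drug_chembl_id": drug_id},
            "association": {
                "action_type": mech.get("action_type"),
                "references": mech.get("references"),
                "binding_site_name": mech.get("binding_site_name"),
            },
            "object": {
                "target_chembl_id": mech.get("target_chembl_id"),
                "target_uniprot_accession": as_list(mech.get("target_uniprot_accession")),
            },
        })
    return associations


example = {
    "chembl": {
        "molecule_chembl_id": "CHEMBL1089636",
        "drug_mechanisms": [
            {"action_type": "INHIBITOR", "target_chembl_id": "CHEMBL2095182",
             "target_uniprot_accession": ["P68371"]},
            {"action_type": "BINDING_AGENT", "target_chembl_id": "CHEMBL4630884",
             "target_uniprot_accession": "Q07954"},
        ],
    }
}

for assoc in to_associations(example):
    print(assoc["association"]["action_type"], assoc["object"]["target_chembl_id"])
```

Note that the object ID is still CHEMBL.TARGET here; mapping it to NCBIGene would need a separate lookup that this sketch does not attempt.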

@rjawesome
Collaborator

I've started a pending API python script if the association-based structure is preferred. Will post repo soon.

@colleenXu
Author

colleenXu commented Dec 29, 2022

Note, I'm not sure how to handle the records that may lack an "action_type" value...they seem to lack a lot of information...

https://mychem.info/v1/query?q=_exists_:chembl.drug_mechanisms%20AND%20(NOT%20_exists_:%22chembl.drug_mechanisms.action_type%22)&fields=chembl
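
For anyone who wants to rebuild that query programmatically, here is a small sketch using only the Python standard library. It builds the URL and makes no network call; the function name `missing_action_type_url` is hypothetical, and the query string is copied from the link above. The encoder escapes a few more characters (colons, parentheses) than the hand-written link does, which should be equivalent.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

MYCHEM_QUERY_URL = "https://mychem.info/v1/query"

# Query string taken from the link in this comment.
MISSING_ACTION_TYPE_Q = (
    "_exists_:chembl.drug_mechanisms AND "
    '(NOT _exists_:"chembl.drug_mechanisms.action_type")'
)


def missing_action_type_url(fields: str = "chembl") -> str:
    """Build the MyChem query URL for records lacking an action_type."""
    params = {"q": MISSING_ACTION_TYPE_Q, "fields": fields}
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{MYCHEM_QUERY_URL}?{urlencode(params, quote_via=quote)}"


url = missing_action_type_url()
print(url)
```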

@rjawesome
Collaborator

@colleenXu
Author

colleenXu commented Feb 27, 2023

At the moment, creating a new API is not necessary.

@colleenXu
Author

Leaving open; in the future, we may want to write more specific operations (see the third bullet point "haven't done this yet" in the post above). I therefore moved this issue to "on-hold"

@colleenXu changed the title from "New API based on MyChem chembl.drug_mechanisms data" to "more specific operations for MyChem chembl.drug_mechanisms data" Sep 14, 2023
@colleenXu colleenXu added the x-bte label Nov 1, 2023
@colleenXu
Author

Closing for now: will open another issue to consider writing more specific operations using filter/jmespath
