Some node expansion ontologies are not monoprefix #698

Closed
tokebe opened this issue Aug 15, 2023 · 3 comments
Assignees
Labels
bug Something isn't working On Test Related changes are deployed to Test server

Comments

@tokebe
Member

tokebe commented Aug 15, 2023

Some of the ontology files currently used for node expansion contain IDs with prefixes other than the one the file is named for. In rare cases this causes BTE to fail a query with a TypeError, because the subclass edge-sourcing code cannot handle this situation. Further, the subclassing code is built on the assumption that the ontology source always lines up with the ID prefix, which is clearly not correct for these cases.

We need to figure out a better solution for subclass edge sourcing that avoids running into this problem.
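
To illustrate the failure mode, here is a minimal Python sketch. It is not BTE's actual code (which is not shown in this issue); the `SOURCE_BY_PREFIX` table, its placeholder values, and both function names are hypothetical. It only shows how a source lookup keyed on ID prefix can end in a TypeError when an ontology file contains a prefix the lookup does not expect, and how a guarded lookup avoids that.

```python
from typing import Optional

# Hypothetical lookup table: ID prefix -> source label for subclass edges.
# The values are placeholders, not real provenance identifiers.
SOURCE_BY_PREFIX = {
    "MONDO": "mondo",
    "HP": "hp",
}


def prefix_of(curie: str) -> str:
    """Return the part of an ID before the first colon, e.g. 'MONDO'."""
    return curie.split(":", 1)[0]


def naive_source(curie: str) -> str:
    """Assume every prefix has a known source; breaks on unexpected prefixes."""
    source = SOURCE_BY_PREFIX.get(prefix_of(curie))  # None for e.g. 'UBERON'
    return "infores:" + source  # raises TypeError when source is None


def safe_source(curie: str) -> Optional[str]:
    """Return None instead of raising when the prefix has no known source."""
    source = SOURCE_BY_PREFIX.get(prefix_of(curie))
    return None if source is None else "infores:" + source
```

With this sketch, `naive_source("UBERON:0002185")` raises a TypeError, while `safe_source` returns `None` and leaves it to the caller to decide how to attribute or skip the edge.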

The following (unfortunately long) query reproduces the error:
{
  "message": {
    "query_graph": {
      "edges": {
        "edge_1": {
          "attribute_constraints": [],
          "object": "disease",
          "predicates": [
            "biolink:ameliorates"
          ],
          "qualifier_constraints": [],
          "subject": "h"
        }
      },
      "nodes": {
        "disease": {
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ],
          "constraints": [],
          "ids": [
            "UMLS:C1847014",
            "UMLS:C2751329",
            "UMLS:C0001485",
            "UMLS:C0009450",
            "UMLS:C0020538",
            "UMLS:C0497247",
            "UMLS:C0007194",
            "UMLS:C4551472",
            "UMLS:C0020621",
            "UMLS:C0683388",
            "UMLS:C1514284",
            "UMLS:C1971021",
            "UMLS:C0019693",
            "UMLS:C0034186",
            "UMLS:C0020476",
            "UMLS:C0021400",
            "UMLS:C0029342",
            "UMLS:C0155871",
            "UMLS:C0042331",
            "UMLS:C0149931",
            "UMLS:C0744641",
            "UMLS:C0039503",
            "UMLS:C1568272",
            "UMLS:C0042384",
            "OMIM:MTHU037025",
            "UMLS:C0019080",
            "UMLS:C0041671",
            "UMLS:C0339002",
            "UMLS:C1263846",
            "UMLS:C1321905",
            "UMLS:C0155626",
            "UMLS:C0029827",
            "UMLS:C0036323",
            "UMLS:C0006261",
            "UMLS:C0006266",
            "UMLS:C0275524",
            "UMLS:C0729584",
            "UMLS:C0022658",
            "UMLS:C1408258",
            "UMLS:C0036983",
            "UMLS:C0600327",
            "UMLS:CN204669",
            "UMLS:C0003862",
            "UMLS:C0857177",
            "UMLS:C0262564",
            "UMLS:C0340293",
            "UMLS:C0003864",
            "UMLS:C0574941",
            "UMLS:C0010054",
            "UMLS:C0010068",
            "UMLS:C0264694",
            "UMLS:C1533195",
            "UMLS:C1956346",
            "UMLS:C0085111",
            "UMLS:C0205700",
            "UMLS:C0700053",
            "UMLS:C3495498",
            "UMLS:C0019572",
            "UMLS:C0031572",
            "UMLS:C0011881",
            "UMLS:C0006625",
            "UMLS:C0563273",
            "UMLS:C0038220",
            "UMLS:C1417172",
            "UMLS:C1144191",
            "UMLS:C3815188",
            "UMLS:C0038999",
            "MONDO:0021668",
            "UMLS:C1334180",
            "UMLS:C0342647",
            "UMLS:C2748783",
            "UMLS:C0034734",
            "UMLS:C0034735",
            "UMLS:C1282916",
            "GO:0006954",
            "UMLS:C0263854",
            "UMLS:C1384641",
            "UMLS:C0020445",
            "UMLS:C0549399",
            "UMLS:C0745103",
            "UMLS:C3276941",
            "UMLS:C0007222",
            "UMLS:C0243050",
            "UMLS:C0728936",
            "UMLS:C0042075",
            "UMLS:C1335051",
            "UMLS:C2675113",
            "MONDO:0016158",
            "UMLS:C3495801",
            "UMLS:C4050407",
            "UMLS:C0040820",
            "UMLS:C0678222",
            "UMLS:C0014118",
            "UMLS:C0375268",
            "UMLS:C0020649",
            "UMLS:C0032285",
            "MESH:C072778",
            "UMLS:C0009443",
            "GO:0000500",
            "UMLS:C0023772",
            "UMLS:C0154251",
            "UMLS:C0027796",
            "UMLS:C0751373",
            "UMLS:C1262477",
            "UMLS:C0001623",
            "UMLS:C0405580",
            "UMLS:C0376286",
            "UMLS:C1510471",
            "OMIM:MTHU027819",
            "UMLS:C1868649",
            "UMLS:C1868650",
            "UMLS:C1868651",
            "UMLS:C0024138",
            "UMLS:C5574816",
            "UMLS:C0006444",
            "UMLS:C0027051",
            "UMLS:C0003123",
            "UMLS:C1971624",
            "UMLS:C0024141",
            "UMLS:C1835309",
            "UMLS:C1366529",
            "UMLS:C3493221",
            "UMLS:C1836230",
            "UMLS:C1836231",
            "UMLS:C1836232",
            "UMLS:CN282826",
            "UMLS:C0730345",
            "UMLS:C1654921",
            "UMLS:C0236773",
            "UMLS:C0853193",
            "UMLS:C0011847",
            "UMLS:C0011849",
            "UMLS:C2347126",
            "UMLS:C0027404",
            "MONDO:0044684",
            "UMLS:C4553904",
            "UMLS:C0263666",
            "OMIM:MTHU039158",
            "OMIM:MTHU050872",
            "UMLS:C0039254",
            "UMLS:C0152073",
            "UMLS:C0003838",
            "UMLS:C0031115",
            "UMLS:C0085096",
            "UMLS:C0264995",
            "UMLS:C0750145",
            "UMLS:C1704436",
            "UMLS:C4025272",
            "UMLS:C4531019",
            "UMLS:C0029574",
            "UMLS:C0037268",
            "UMLS:C0037274",
            "UMLS:C0037277",
            "UMLS:C0002170",
            "UMLS:C0004096",
            "UMLS:C0085129",
            "UMLS:C0340062",
            "UMLS:C1833269",
            "UMLS:C1833270",
            "UMLS:C1869116",
            "UMLS:C3714497",
            "UBERON:0035938",
            "UMLS:C0020626",
            "UMLS:C0600139",
            "UMLS:C0016658",
            "MONDO_0004197",
            "UMLS:C0023467",
            "UMLS:C3275959",
            "UMLS:C0595948",
            "UMLS:C0162550",
            "UMLS:C0007820",
            "UMLS:C5203670",
            "UMLS:C0029877",
            "UMLS:C0699744",
            "UMLS:C0014130",
            "UMLS:C4025823",
            "UMLS:C0393735",
            "UMLS:C0018801",
            "UMLS:C0018802",
            "UMLS:C0264716",
            "UMLS:C0002103",
            "UMLS:C0847614",
            "UMLS:C2607914",
            "EFO:0003913",
            "UMLS:C0039103",
            "UMLS:C4521004",
            "UMLS:C1318485",
            "UMLS:C4520976",
            "UMLS:CN205145",
            "UMLS:C0023890",
            "UMLS:C0036690",
            "UMLS:C0243026",
            "UMLS:C0024117",
            "UMLS:C1527303",
            "UMLS:C0023646",
            "UMLS:C0149514",
            "UMLS:C0339289",
            "UMLS:C0349702",
            "UMLS:C1959583",
            "UMLS:CN236639",
            "UMLS:C0746102",
            "MESH:D003094",
            "UMLS:C0003851",
            "UMLS:C0003872",
            "UMLS:C0040021",
            "UMLS:C0009766",
            "UMLS:C0004238",
            "UMLS:C0149721",
            "UMLS:C0020517",
            "UMLS:C1527304",
            "UMLS:C3539909",
            "UMLS:C0001175",
            "UMLS:C0012242",
            "UMLS:C0017178",
            "UMLS:C4023588",
            "MONDO:0009410",
            "UBERON:0002185",
            "UMLS:C0018081",
            "UMLS:C0031046",
            "UMLS:C0033860",
            "UMLS:C0262985",
            "UMLS:C0004623",
            "UMLS:C0003873",
            "UMLS:C1306838",
            "UMLS:C1833448",
            "UMLS:C0085074",
            "UMLS:C0013604",
            "UMLS:C0268000",
            "UMLS:C0085605",
            "UMLS:C1306571",
            "UMLS:C0038013",
            "UMLS:C0038020",
            "UMLS:C0155877",
            "UMLS:C0013990",
            "UMLS:C0029607",
            "UMLS:C0034067",
            "UMLS:C0040460",
            "UMLS:C0038454",
            "UMLS:C0002965",
            "UMLS:C0086666",
            "UMLS:C0036341",
            "UMLS:C4538533",
            "UMLS:C0282687",
            "UMLS:C0008677",
            "UMLS:C0021603",
            "UMLS:C0917801",
            "UMLS:C4477058",
            "UMLS:C3277428",
            "UMLS:C0007193",
            "UMLS:C0029408",
            "UMLS:C0157946",
            "UMLS:C0013595"
          ],
          "is_set": false
        },
        "h": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "constraints": [],
          "ids": [
            "UMLS:C0032952",
            "UNII:ULS5I8J03O",
            "DRUGBANK:DB01234",
            "UMLS:C0069803",
            "UMLS:C0851344",
            "UMLS:C0025815",
            "UMLS:C0048306",
            "UMLS:C0076653",
            "UMLS:C0544368",
            "UNII:DJJ95FJP1H",
            "UMLS:C0016299",
            "DRUGBANK:DB00324",
            "UMLS:C0020268",
            "DRUGBANK:DB00823",
            "UMLS:C0022635",
            "UMLS:C0907850",
            "UMLS:C0058753",
            "UMLS:C4255884",
            "UMLS:C0014912",
            "UMLS:C0011689",
            "UMLS:C0065864",
            "DRUGBANK:DB00620",
            "UMLS:C0039601",
            "UMLS:C0304612",
            "UMLS:C0012228",
            "DRUGBANK:DB00896",
            "UMLS:C0065879",
            "UMLS:C0961209",
            "UMLS:C0006674",
            "UMLS:C0016280",
            "UMLS:C0068992",
            "UMLS:C0038149",
            "UMLS:C0068395",
            "UMLS:C0073983",
            "UMLS:C0065767",
            "UMLS:C0012319",
            "UMLS:C0044410",
            "DRUGBANK:DB01108",
            "UMLS:C0056391",
            "UMLS:C1135150",
            "UMLS:C0016374",
            "UMLS:C0008992",
            "UMLS:C0055895",
            "UMLS:C0249582",
            "UMLS:C0065865",
            "UMLS:C0073631",
            "UMLS:C0051556",
            "UMLS:C0013340",
            "UMLS:C4256093",
            "UMLS:C0031406",
            "UMLS:C0060389",
            "UMLS:C0036071",
            "UMLS:C0054201",
            "UMLS:C0039600",
            "UMLS:C0032950",
            "UMLS:C0025152",
            "DRUGBANK:DB00838",
            "UMLS:C0360534",
            "DRUGBANK:DB00663",
            "UMLS:C1948374",
            "UMLS:C1619629",
            "UMLS:C0004905",
            "UMLS:C0012258",
            "UMLS:C0006949",
            "DRUGBANK:DB00596",
            "UMLS:C0016301",
            "UMLS:C0030454",
            "UMLS:C0004057",
            "UMLS:C0033429",
            "UMLS:C0005308",
            "UMLS:C0022860",
            "UMLS:C0069751",
            "UMLS:C0068788",
            "UMLS:C0025826",
            "UMLS:C1449554",
            "UMLS:C0071836",
            "PUBCHEM.COMPOUND:54684589",
            "PUBCHEM.COMPOUND:101611440",
            "PUBCHEM.COMPOUND:54708862",
            "PUBCHEM.COMPOUND:64738",
            "UMLS:C0016366",
            "UMLS:C0016298",
            "DRUGBANK:DB00682",
            "UMLS:C0002238",
            "UMLS:C0520442",
            "DRUGBANK:DB08970",
            "UMLS:C0006657",
            "UMLS:C2713074",
            "DRUGBANK:DB00764",
            "UMLS:C0043822",
            "DRUGBANK:DB01420",
            "UMLS:C0025042",
            "UMLS:C1289957",
            "UMLS:C0033308",
            "UMLS:C4256086",
            "UMLS:C4256087",
            "UMLS:C0772364",
            "UMLS:C0060501",
            "UMLS:C1646276",
            "UMLS:C0053289",
            "UMLS:C0961485",
            "UMLS:C0060156",
            "UMLS:C0015837",
            "UMLS:C0068397",
            "UMLS:C0010961",
            "DRUGBANK:DB00712",
            "UMLS:C0002158",
            "UMLS:C0008024",
            "UMLS:C0117996",
            "UMLS:C0023566",
            "UMLS:C0006881",
            "UMLS:C1703665",
            "UMLS:C0056855",
            "UMLS:C0058004",
            "UMLS:C0011707",
            "UMLS:C0042105",
            "UMLS:C0011705",
            "UMLS:C0037982",
            "UMLS:C0014695",
            "UMLS:C0014696",
            "UMLS:C0042866",
            "UMLS:C0036079",
            "UMLS:C0029995",
            "UMLS:C0012265",
            "UMLS:C0028356"
          ],
          "is_set": false
        }
      }
    }
  },
  "submitter": "infores:aragorn"
}

Tagging @colleenXu @ericz1803

@tokebe tokebe added the bug Something isn't working label Aug 15, 2023
@tokebe tokebe added this to the 2023-08-18 Code Freeze milestone Aug 15, 2023
@tokebe
Member Author

tokebe commented Aug 15, 2023

Placing this one on the milestone, though it's uncertain if we'll be able to fix it in time, and it seems to happen only rarely.

@tokebe
Member Author

tokebe commented Aug 15, 2023

Generating a list of prefixes for the files that use multiple prefixes:

hp-parsed.json: PR, BFO, NCBITaxon, UBERON, SO, CL, RO, CHEBI, MAXO, OBI, MPATH, CARO, GO, NBO, HP, HsapDv, ENVO, PATO
mondo-parsed.json: ENVO, MONDO, CHR, HsapDv, ECTO, BFO, PATO, RO, NBO, CARO, ExO, CL, MAXO, SO, OBI, OGMS, NCIT, PCO, FOODON, MF, UBERON, PO, UPHENO, MPATH, GO, CHEBI, HP, OBA, NCBITaxon, IAO

Looks like it's just these two that are non-monoprefix.

Note: both files contain mappings where the parent is not HP or MONDO, so we can't simply compare the prefix of the original ID with that of the expanded ID to infer which file a mapping came from.
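
For reference, a prefix list like the one above could be produced with something along these lines. This is a sketch under an assumption: it treats a parsed file as a JSON object mapping each parent ID to a list of descendant IDs, which may not match the real layout of `hp-parsed.json` or `mondo-parsed.json`. The function name and the toy IDs are made up for illustration.

```python
from typing import Dict, List, Set


def collect_prefixes(mapping: Dict[str, List[str]]) -> Set[str]:
    """Collect every ID prefix that appears as a parent or a descendant."""
    prefixes: Set[str] = set()
    for parent, descendants in mapping.items():
        for curie in [parent, *descendants]:
            # The prefix is everything before the first colon.
            prefixes.add(curie.split(":", 1)[0])
    return prefixes


# Toy stand-in for the contents of a parsed file (IDs are invented).
example = {
    "MONDO:0000001": ["MONDO:0000002", "HP:0000003"],
    "UBERON:0000004": ["UBERON:0000005"],
}
print(sorted(collect_prefixes(example)))  # ['HP', 'MONDO', 'UBERON']
```

For a real file, the result of `json.load` could be passed in directly, provided the layout matches the assumption above. A file is monoprefix exactly when the returned set has a single element.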

@colleenXu
Collaborator

Hmm...based on the old posts, we only intended to support 6 ontologies at the time. Hence my post on provenance for subclass edges here.

I'm not sure how the descendant-relationships were decided / parsed, and whether it's valid for all those other ontologies to be included.

I agree with the discussion in the lab Slack thread that, for now, we could remove the non-MONDO IDs from the MONDO file and the non-HP IDs from the HP file...
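
A rough sketch of that interim filtering step, again assuming each file is a mapping from a parent ID to a list of descendant IDs (the real file layout may differ, and `keep_only_prefix` is a hypothetical helper, not an existing BTE function):

```python
from typing import Dict, List


def keep_only_prefix(
    mapping: Dict[str, List[str]], prefix: str
) -> Dict[str, List[str]]:
    """Keep only parents and descendants whose ID starts with `prefix`."""
    wanted = prefix + ":"
    filtered: Dict[str, List[str]] = {}
    for parent, descendants in mapping.items():
        if not parent.startswith(wanted):
            continue  # drop entries whose parent comes from another ontology
        # Drop descendants that come from another ontology.
        filtered[parent] = [d for d in descendants if d.startswith(wanted)]
    return filtered
```

For example, `keep_only_prefix(mondo_mapping, "MONDO")` would return a monoprefix version of the MONDO mapping. Whether parents left with an empty descendant list should be kept or dropped is a design choice this sketch leaves open.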

@tokebe tokebe added On CI Related changes are deployed to CI server On Test Related changes are deployed to Test server and removed On CI Related changes are deployed to CI server labels Aug 16, 2023
@tokebe tokebe closed this as completed Aug 23, 2023

2 participants