Handler for BioThings API providing graph type data #20

Closed
kevinxin90 opened this issue Aug 19, 2020 · 5 comments

kevinxin90 commented Aug 19, 2020

Example Graph representation:

{
    "subject": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "object": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "association": {
        "predicate": "negatively_regulates",
        "publications": ["PMID:123", "PMID:124"]
    }
}

The same edge can also be represented by swapping the subject and object and reversing the predicate, e.g.

{
    "subject": {
        "id": "NCBIGene:1017",
        "type": "Gene",
        "taxid": "9606"
    },
    "object": {
        "id": "MONDO:000123",
        "type": "Disease"
    },
    "association": {
        "predicate": "negatively_regulated_by",
        "publications": ["PMID:123", "PMID:124"]
    }
}
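A minimal sketch of the reversal shown above, assuming the predicate mapping is a plain dictionary. The `reverse_edge` name and the two-entry `PREDICATE_REVERSE` mapping are hypothetical and only cover this example:

```python
from copy import deepcopy

# Hypothetical two-entry mapping, just enough for the example above.
PREDICATE_REVERSE = {
    "negatively_regulates": "negatively_regulated_by",
    "negatively_regulated_by": "negatively_regulates",
}


def reverse_edge(edge, mapping=PREDICATE_REVERSE):
    """Return a copy of `edge` with subject/object swapped and the predicate reversed."""
    flipped = deepcopy(edge)  # leave the caller's document untouched
    flipped["subject"], flipped["object"] = flipped["object"], flipped["subject"]
    predicate = flipped["association"]["predicate"]
    # Raises KeyError for predicates that are missing from the mapping.
    flipped["association"]["predicate"] = mapping[predicate]
    return flipped
```

Reversing twice should give back the original document, which makes for a simple sanity check on the mapping.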

So if the user provides the following query

biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"

It should be translated into two queries:

  1. The same as the user query:
biothings.ncats.io/api1/query? \
    subject.id:"MONDO:000123" AND \
    object.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulates"
  2. The reverse query:
biothings.ncats.io/api1/query? \
    object.id:"MONDO:000123" AND \
    subject.id:"NCBIGene:1017" AND \
    association.predicate:"negatively_regulated_by"

The response from the second query should then be reversed as well and merged with the response from the first query.
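The query rewrite could be sketched roughly as follows. This is a naive, regex-based illustration and not how the handler has to do it: `reverse_query` is a hypothetical name, the query is assumed to use the `field:"value"` syntax shown above, and leaving unmapped predicates unchanged is an assumption.

```python
import re

# Match "subject" / "object" only as the root of a dotted field name.
ROOT_FIELD_RE = re.compile(r'(?<![\w.])(subject|object)(?=\.)')
# Match the quoted value of association.predicate.
PREDICATE_RE = re.compile(r'(association\.predicate:")([^"]+)(")')


def reverse_query(q, mapping):
    """Swap subject.*/object.* fields and reverse the predicate value in `q`."""
    swapped = ROOT_FIELD_RE.sub(
        lambda m: "object" if m.group(1) == "subject" else "subject", q
    )
    # Assumption: a predicate without a mapping entry is left as-is.
    return PREDICATE_RE.sub(
        lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)) + m.group(3),
        swapped,
    )
```

A string-level rewrite like this would also touch a quoted value that happens to contain `subject.` or `object.`, so working on a parsed query is likely the safer option.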

In summary:

  1. Translate the user query into two queries (the original and its reverse).
  2. For the reverse query:
  • replace every field starting with object (e.g. object.id) with subject
  • replace every field starting with subject (e.g. subject.id) with object
  • reverse the value of association.predicate based on a mapping file
  3. For the reverse query result:
  • change the root key object into subject
  • change the root key subject into object
  • reverse the value of association.predicate based on a mapping file
  4. Merge the results from the two queries.
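The last two steps (reversing the reverse-query hits and merging) might look something like the sketch below. The names `flip_hit` and `merge_hits` are hypothetical, and two choices here are assumptions rather than something stated in this issue: duplicates are dropped by `_id`, and predicates missing from the mapping are left unchanged.

```python
def flip_hit(hit, mapping):
    """Swap the root subject/object keys of a hit and reverse its predicate."""
    flipped = {k: v for k, v in hit.items() if k not in ("subject", "object")}
    if "object" in hit:
        flipped["subject"] = hit["object"]
    if "subject" in hit:
        flipped["object"] = hit["subject"]
    if "association" in hit:
        association = dict(hit["association"])  # shallow copy, keep input intact
        predicate = association.get("predicate")
        if predicate is not None:
            # Assumption: unmapped predicates are kept unchanged.
            association["predicate"] = mapping.get(predicate, predicate)
        flipped["association"] = association
    return flipped


def merge_hits(forward_hits, reverse_hits, mapping):
    """Return forward hits followed by flipped reverse hits, deduplicated by _id."""
    merged = list(forward_hits)
    seen = {hit["_id"] for hit in forward_hits if "_id" in hit}
    for hit in reverse_hits:
        hit_id = hit.get("_id")
        if hit_id is not None:
            if hit_id in seen:
                continue  # assumption: same _id means same edge
            seen.add(hit_id)
        merged.append(flip_hit(hit, mapping))
    return merged
```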

We have an API set up providing graph type data for testing: https://biothings.ncats.io/biggim

@kevinxin90

A good test case is:
Currently https://biothings.ncats.io/biggim/query?q=subject.NCBIGene:6494%20AND%20object.NCBIGene:1956%20AND%20association.context.disease.id:%22MONDO:0006046%22 returns 1 hit.

But https://biothings.ncats.io/biggim/query?q=object.NCBIGene:6494%20AND%20subject.NCBIGene:1956%20AND%20association.context.disease.id:%22MONDO:0006046%22 doesn't.

If the feature is implemented correctly, the second query should also return 1 hit, which is the reversed version of the hit from the first query.

@kevinxin90

The predicate mapping file is available here: https://github.com/biothings/pending.api/blob/master/predicate_mapping.json. But feel free to suggest another location that is more appropriate for it. Each key of the dictionary is a predicate, and its value is the reverse of that predicate.
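Since the mapping is a flat predicate-to-reverse dictionary, loading it and checking that it is consistent in both directions could be sketched as below. The function names and the local file path argument are hypothetical, and nothing here assumes what the actual file contains:

```python
import json


def load_predicate_mapping(path):
    """Load a {predicate: reverse_predicate} dictionary from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def inconsistent_predicates(mapping):
    """List predicates whose reverse does not map back to them."""
    return sorted(p for p, r in mapping.items() if mapping.get(r) != p)
```

An empty list from `inconsistent_predicates` would mean that reversing any mapped predicate twice returns the original, which the reverse-then-merge approach relies on.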

namespacestd0 pushed a commit that referenced this issue Sep 9, 2020
@namespacestd0

@kevinxin90 I want to go over this initial implementation with you before I refine the details. Please check whether this is the intended behavior:

For a query to http://localhost:8000/biggim/query/graph:

{
    "subject": {
        "NCBIGene": 1956
    },
    "object": {
        "NCBIGene": 6494
    },
    "association": {
        "context": {
            "disease":{
                "id": "MONDO:0006046"
            }
        }
    }
}

It is translated to:

  {
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "multi_match": {
                      "fields": "subject.NCBIGene",
                      "lenient": true,
                      "operator": "and",
                      "query": 1956
                    }
                  },
                  {
                    "multi_match": {
                      "fields": "object.NCBIGene",
                      "lenient": true,
                      "operator": "and",
                      "query": 6494
                    }
                  },
                  {
                    "multi_match": {
                      "fields": "association.context.disease.id",
                      "lenient": true,
                      "operator": "and",
                      "query": "MONDO:0006046"
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "multi_match": {
                      "fields": "association.context.disease.id",
                      "lenient": true,
                      "operator": "and",
                      "query": "MONDO:0006046"
                    }
                  },
                  {
                    "multi_match": {
                      "fields": "object.NCBIGene",
                      "lenient": true,
                      "operator": "and",
                      "query": 1956
                    }
                  },
                  {
                    "multi_match": {
                      "fields": "subject.NCBIGene",
                      "lenient": true,
                      "operator": "and",
                      "query": 6494
                    }
                  }
                ]
              }
            }
          ]
        }
      }
  }

and returns the result:

{
    "took": 2,
    "total": 1,
    "max_score": 5.329261,
    "hits": [
        {
            "_id": "HGNC_10885-HGNC_3236--MONDO_0006046",
            "_score": 5.329261,
            "association": {
                "edge_label": "coexpressed_with",
                "relation_id": "RO:0002610",
                "relation_name": "correlated_with",
                "correlation": 0.1451,
                "pvalue": 0.054954087385762455,
                "context": {
                    "disease": {
                        "id": "MONDO:0006046",
                        "UBERON": "MONDO:0006046",
                        "name": "ovarian serous cystadenocarcinoma"
                    }
                }
            },
            "object": {
                "id": "HGNC:3236",
                "HGNC": 3236,
                "NCBIGene": 1956,
                "SYMBOL": "EGFR",
                "type": "gene"
            },
            "subject": {
                "id": "HGNC:10885",
                "HGNC": 10885,
                "NCBIGene": 6494,
                "SYMBOL": "SIPA1",
                "type": "gene"
            }
        }
    ]
}

If this is the expected behavior, I'll add the result transformation logic and other refinements.
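For reference, the translation from the nested request body to the Elasticsearch query above can be approximated with the sketch below. It is not the code from the commit: the helper names are hypothetical, clause order within each `must` list follows the input body, and predicate reversal is left out because the example body has no predicate.

```python
def flatten(body, prefix=""):
    """Yield (dotted_field, value) pairs from a nested dict."""
    for key, value in body.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def swap_root(field):
    """Swap a leading 'subject' / 'object' path segment, leave other fields alone."""
    root, sep, rest = field.partition(".")
    swapped = {"subject": "object", "object": "subject"}.get(root, root)
    return swapped + sep + rest


def match_clause(field, value):
    """Build one multi_match clause in the shape shown above."""
    return {
        "multi_match": {
            "fields": field,
            "lenient": True,
            "operator": "and",
            "query": value,
        }
    }


def build_graph_query(body):
    """Combine the forward and reversed interpretation of `body` with bool/should."""
    pairs = list(flatten(body))
    forward = [match_clause(field, value) for field, value in pairs]
    reverse = [match_clause(swap_root(field), value) for field, value in pairs]
    return {
        "query": {
            "bool": {
                "should": [
                    {"bool": {"must": forward}},
                    {"bool": {"must": reverse}},
                ]
            }
        }
    }
```

Because both interpretations sit under a single `should`, a document stored in either direction matches in one request, but the hits still come back in their stored orientation, as the result above shows.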

@namespacestd0

namespacestd0 commented Sep 9, 2020

@newgene do you have an opinion on supporting additional input formats? Should we take extra time to implement dot notation support, mixing URL params with a URL-encoded POST body, etc.?

@namespacestd0

Implemented in

class GraphQueryHandler(ESRequestHandler):
