implement edge attribute constraints #795

Open
andrewsu opened this issue Mar 15, 2024 · 9 comments
Labels
next phase for future if we're funded

Comments

@andrewsu
Member

andrewsu commented Mar 15, 2024

We originally proposed edge attribute constraints in the context of TRAPI 1.3 and #482, but I'm breaking this out into its own ticket.

We have a solid use case for edge constraints proposed in this query template for the CQS:
https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

The key bit is here, attempting to apply a minimum threshold on the biolink:evidence_count from AEOLUS.

    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicates": [
                        "biolink:applied_to_treat"
                    ],
                    "subject": "n0",
                    "object": "n1",
                    "attribute_constraints": [
                        {
                            "id": "biolink:evidence_count",
                            "operator": ">",
                            "value": 20
                        }
                    ]
                }
            },
...

There are (at least) two items that need to be done or checked:

  • Filtering of API responses: I'm assuming that it will be easier to do the edge attribute filtering after the subquery, rather than trying to adjust the subquery itself.
  • Aggregation of multiple values: In this Slack message, @colleenXu pointed out a case where evidence_count is provided as a multi-element array (example below). In this case, I think it is reasonable to apply the constraint to the sum of the evidence_counts.
{
  "edges": {
    "1feea171db6394cfd9bcb20deae0ad9a": {
      "predicate": "biolink:applied_to_treat",
      "subject": "PUBCHEM.COMPOUND:3386",
      "object": "MONDO:0002050",
      "attributes": [
        {
          "attribute_type_id": "biolink:evidence_count",
          "value": [
            733,
            42
          ]
        }
      ],
      "sources": [
        {
          "resource_id": "infores:aeolus",
          "resource_role": "primary_knowledge_source"
        },
        {
          "resource_id": "infores:mychem-info",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:aeolus"
          ]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:mychem-info"
          ]
        }
      ]
    }
  }
}
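
To make the "apply the constraint to the sum" idea concrete, here is a minimal Python sketch of a post-subquery check. It is illustrative only and not BTE code: the function names (`aggregate`, `edge_meets_constraint`) are hypothetical, only the three comparison operators are handled, and treating an edge that lacks the attribute as failing the constraint is an assumption that should be checked against the TRAPI spec.

```python
import operator
from typing import Any

# Only simple comparisons are handled in this sketch (an assumption);
# any other operator would raise a KeyError below.
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def aggregate(value: Any) -> Any:
    """Sum a list of numeric counts; pass scalar values through unchanged."""
    if isinstance(value, list):
        return sum(value)
    return value


def edge_meets_constraint(edge: dict, constraint: dict) -> bool:
    """Return True if the edge carries the constrained attribute and its
    (summed) value satisfies the operator. Edges without it fail (assumed)."""
    compare = OPERATORS[constraint["operator"]]
    for attribute in edge.get("attributes", []):
        if attribute["attribute_type_id"] == constraint["id"]:
            return compare(aggregate(attribute["value"]), constraint["value"])
    return False


# Constraint from the CQS template and the multi-element edge shown above.
constraint = {"id": "biolink:evidence_count", "operator": ">", "value": 20}
edge = {
    "predicate": "biolink:applied_to_treat",
    "attributes": [
        {"attribute_type_id": "biolink:evidence_count", "value": [733, 42]}
    ],
}
print(edge_meets_constraint(edge, constraint))  # 733 + 42 = 775, so True
```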
@colleenXu
Collaborator

colleenXu commented Mar 18, 2024

From my perspective, there are 3 issues at play here:

1: edge-attribute value type

BTE is currently returning this edge-attribute in Dev/CI instances (ref: commit).

However, the value type is currently an array of ints, as the examples below show.

These are from this example BTE response show-edge-attribute-issue.json, which runs the POST version of this query to MyChem

Example of a 1-element array:

                "dd9daae5b03bcad0698ff6669090f36b": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MEDDRA:10070592",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                875
                            ]
                        }
                    ],

Example of a multi-element array: the 733 count from Depression and 42 count from "Depressed mood" were put in the same edge/edge-attribute since both meddra IDs mapped to the "MONDO:0002050 (depressive disorder)" entity.

                "1feea171db6394cfd9bcb20deae0ad9a": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MONDO:0002050",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                733,
                                42
                            ]
                        }
                    ],

I suggest flattening these into ints, because the array will probably cause validation issues (biolink-model says attribute values should be int) and it'll make the edge-attribute constraint easier to implement.

But we'll need to decide what to do with the multi-element arrays. These are happening because MyChem has separate meddra indication IDs, but BTE/NodeNorm maps them to the same entity. BTE then merges those records into the same edge, and concatenates the counts in the edge-attribute value. I think we could either:

  • add the counts from separate records together
  • create separate edges for different counts (add to hash?)

(Note: I'm not sure about flattening all 1-element arrays in edge-attributes. biolink:publications may be one example where we always want it to be an array, but we'd need to check with TRAPI folks first...)

2: what to do with the previous effort - a default, hard-coded count limit

EDIT: SEE UPDATE BELOW - we've implemented this.

We've been trying to add a hard-coded count limit of 20 to our MyChem queries #727 (comment), similar to what we do with SEMMEDDB.

I was able to add it to the aeolusTreats operation (chem -> disease, commit), which all instances are using.

Old notes on reverse operation

But this hasn't been done for the reverse operation aeolusTreats-rev (disease -> chem), which is what creative-mode uses. In discussions last week (three Slack links), we finally reached consensus on next steps:

  • by adjusting the x-bte annotation, I can get partway there. See this commit (special-reverses branch)
  • next is writing/implementing the BTE JQ-post-processing to remove the hits when the aeolus.indications field is empty. While this should be quick, I'm unsure of the logic to use and would need to discuss with Jackson...
    • something super-specific, that only works on responses from this operation?
    • generic-ish: "remove hits if this is a BioThings API AND supportBatch is false AND the scopes field specified in the request body (aeolus.indications.meddra in this case) isn't in the hit".
      • the "BioThings API only" and "supportBatch is false" should match "special reverses" - which are the only cases where we'd need this logic
      • I don't have other current x-bte annotation examples where this would be useful. I suspect that it may be useful for writing reverse x-bte operations for MyChem chembl drug-mechanism and drugcentral bioactivity
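
For discussion, the "generic-ish" rule above could look roughly like the Python sketch below (the real post-processing would be JQ). Everything here is hypothetical: the helper names, the `is_biothings` / `support_batch` flags, and the choice to keep a hit when at least one requested scope field is present.

```python
from typing import Any


def field_present(doc: Any, dotted_path: str) -> bool:
    """True if the dotted path resolves to at least one non-empty value.
    Lists are searched element by element, since hits can nest them."""
    if isinstance(doc, list):
        return any(field_present(item, dotted_path) for item in doc)
    if not dotted_path:
        return doc not in (None, "", [], {})
    if not isinstance(doc, dict):
        return False
    key, _, rest = dotted_path.partition(".")
    return key in doc and field_present(doc[key], rest)


def keep_hit(hit: dict, *, is_biothings: bool, support_batch: bool,
             scopes: list) -> bool:
    """Apply the rule only to "special reverse" operations (BioThings API,
    supportBatch false); every other hit is kept untouched."""
    if not is_biothings or support_batch:
        return True
    return any(field_present(hit, scope) for scope in scopes)


# Hypothetical hits: the second has no aeolus.indications data at all.
hits = [
    {"_id": "A", "aeolus": {"indications": [{"meddra": "10012378", "count": 733}]}},
    {"_id": "B", "aeolus": {"indications": []}},
]
kept = [
    hit for hit in hits
    if keep_hit(hit, is_biothings=True, support_batch=False,
                scopes=["aeolus.indications.meddra"])
]
print([hit["_id"] for hit in kept])  # ['A']
```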

But if we want to implement TRAPI-query edge-attribute constraints, it's not clear if we want to go forward with this. An edge-attribute constraint that asks for counts below 20 would conflict with this hard-coded limit, which has already dropped those records.

3: how to implement this issue's ask: TRAPI-query edge-attribute constraints

This is still up for discussion:

  • how generic/general we want our approach to be
  • how quickly we can do this
  • do we still want a default, hard-coded count limit for these MyChem aeolus indication operations (ex: when an edge-attribute constraint isn't specified)?

Idea: if an edge-attribute constraint is specified...

  • after running all sub-queries/building records, filter the records down to those that have the constrained edge-attribute and whose values meet the criteria (need to double-check that this is the TRAPI spec).
    • pros: this seems to be the easiest to do (conceptually easy, less chance of bugs)
    • cons: wasted effort getting records we'll later throw out
  • for BioThings APIs, transform the constraint into part of the query

I had another idea of transforming the constraint into part of the BioThings API query using the x-bte annotation templating and info in the response-mapping, but this would be complex and more effort to think through and implement.

@colleenXu
Collaborator

colleenXu commented Mar 20, 2024

During today's group meeting, we made decisions on issues (1) and (3) above:

3: how to implement TRAPI-query edge-attribute constraints

  • agreed to do this after retrieving the sub-query response (vs transforming the constraint into part of the sub-query API call)
  • two ways to do this: we picked "at/after edge-merging" because we want to keep all the counts that are getting "merged" and add them together
    • versus applying the constraint at record-creation time, which would throw out individual records where the count < limit. The benefit of that option would be having fewer records to work with in later steps.
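
A small Python sketch of why the timing matters. The records, IDs, and the `merge` helper are made up for illustration, and merging is simplified to keying on (subject, object):

```python
THRESHOLD = 20

# Hypothetical records: the first two normalize to the same subject/object
# pair, so edge-merging combines them into one edge.
records = [
    {"subject": "CHEM:1", "object": "DISEASE:1", "evidence_count": 15},
    {"subject": "CHEM:1", "object": "DISEASE:1", "evidence_count": 12},
    {"subject": "CHEM:1", "object": "DISEASE:2", "evidence_count": 5},
]


def merge(recs):
    """Merge records sharing subject and object, adding their counts."""
    totals = {}
    for rec in recs:
        key = (rec["subject"], rec["object"])
        totals[key] = totals.get(key, 0) + rec["evidence_count"]
    return totals


# Constraint at record-creation time: each record is judged alone.
at_record_time = merge(
    [rec for rec in records if rec["evidence_count"] > THRESHOLD]
)

# Constraint at/after edge-merging: the summed count is judged.
after_merge = {
    key: total for key, total in merge(records).items() if total > THRESHOLD
}

print(at_record_time)  # {}
print(after_merge)     # {('CHEM:1', 'DISEASE:1'): 27}
```

With record-time filtering, the 15 and 12 are each dropped before they can be combined; filtering after merging keeps the edge because the sum (27) passes.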

1: how to flatten multi-element arrays

  • agreed to add counts from multi-element arrays together to create a single int, without first removing or throwing out any counts. This makes the most conceptual sense.
    • use a config list of attribute-type-ids to control when we do this. For now, that list will just include this attribute-type-id (biolink:evidence_count)
  • then apply constraint to those sums
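
A rough Python sketch of the config-list idea, for illustration only. The names `SUMMED_ATTRIBUTE_TYPE_IDS` and `flatten_attributes` are hypothetical, not existing BTE identifiers:

```python
# Config list of attribute-type-ids whose array values get summed.
# For now it holds only the one attribute discussed in this issue.
SUMMED_ATTRIBUTE_TYPE_IDS = {"biolink:evidence_count"}


def flatten_attributes(attributes):
    """Return a copy of the attribute list where configured attributes
    have list values replaced by their sum; others are left unchanged."""
    flattened = []
    for attribute in attributes:
        value = attribute["value"]
        if (
            attribute["attribute_type_id"] in SUMMED_ATTRIBUTE_TYPE_IDS
            and isinstance(value, list)
        ):
            attribute = {**attribute, "value": sum(value)}
        flattened.append(attribute)
    return flattened


attributes = [
    {"attribute_type_id": "biolink:evidence_count", "value": [733, 42]},
    {"attribute_type_id": "biolink:publications", "value": ["PMID:1"]},
]
print(flatten_attributes(attributes))
```

Here the evidence_count value becomes 775, while the biolink:publications array is untouched because its attribute-type-id is not in the config list.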

Jackson @tokebe estimated that this would take ~2 days of work. But as part of this effort, they'll review the TRAPI yaml/spec docs for QEdge.attribute_constraints expectations and requirements - and they'll decide how full/robust of an implementation to do for this issue.

@colleenXu
Collaborator

Note that I've made a PR to ask about this template TranslatorSRI/CQS#9:

  • tool queried (BTE vs Service-Provider-TRAPI)
  • attribute-constraint's value type - right now it's a string which is confusing
  • the attribute-type-id

@colleenXu
Collaborator

Update! The template PR TranslatorSRI/CQS#9 has been merged. So the current template is https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

Changes:

  • attribute-constraint value type is now an int
  • using Service-Provider-TRAPI, rather than BTE, then using a separate scoring service. Going to try this out.

@colleenXu
Collaborator

colleenXu commented Apr 9, 2024

Update on issue 2 above

The hard-coded/default/MyChem-query-level limit is now live in Dev/CI! See the details in #727 (comment)

@colleenXu
Collaborator

colleenXu commented Jun 12, 2024

And noting that Issue 1 was also addressed last month as part of #727 (comment)

Leaving just Issue 3 - the edge attribute constraint implementation itself (first post, later decision)

@andrewsu
Member Author

andrewsu commented Oct 4, 2024

Going to add this to our agenda for next week to discuss. We tabled this earlier this year, and I'd like to revisit where we stand w.r.t. current priorities.

For updated info/context, BioLink Model undertook the "treats refactor", which separates out different types of evidence into different predicates (e.g. "in_clinical_trials_for", "beneficial_in_models_for"). CQS then created one-hop templates that capture how to query Translator with each of these predicates, and these queries often include edge constraints. (Though we could review those templates in more detail to see what those edge constraints actually do and how common they are.)

Right now, our first MVP1 template queries the root of the treats predicate hierarchy (biolink:treats_or_applied_or_studied_to_treat). We should break each treats predicate out to different templates (because they have very different levels of confidence; see new issue #881), and then add in the CQS edge constraints once that functionality is available.

@colleenXu colleenXu added the next phase for future if we're funded label Oct 29, 2024
@colleenXu
Collaborator

Andrew said we don't need to get this done in BTE by the end of this phase.

We will want to be able to do edge-attribute constraints in Retriever (or Shepherd?) (CC @tokebe)

@tokebe
Member

tokebe commented Oct 29, 2024

We'll want support for edge-attribute constraints in Retriever


4 participants