implement edge attribute constraints #795

andrewsu · 2024-03-15T20:59:06Z

We originally proposed edge attribute constraints in the context of TRAPI 1.3 and #482, but I'm breaking this out into its own ticket.

We have a solid use case for edge constraints proposed in this query template for the CQS:
https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

The key bit is here, attempting to apply a minimum threshold on the biolink:evidence_count from AEOLUS.

    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicates": [
                        "biolink:applied_to_treat"
                    ],
                    "subject": "n0",
                    "object": "n1",
                    "attribute_constraints": [
                        {
                            "id": "biolink:evidence_count",
                            "operator": ">",
                            "value": 20
                        }
                    ]
                }
            },
...
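
For illustration only (this is not BTE's code): a minimal Python sketch of how one constraint like the one above could be checked against a single attribute value. The function name `meets_constraint` and the operator table are assumptions; only three operators are handled here, and an optional `not` flag is treated as false when absent.

```python
import operator

# Subset of comparison operators handled by this sketch (assumption:
# a full implementation would need to cover whatever the TRAPI spec defines).
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def meets_constraint(value, constraint):
    """Return True if a scalar attribute value satisfies one constraint."""
    compare = OPERATORS[constraint["operator"]]
    result = compare(value, constraint["value"])
    # An optional "not" flag inverts the result; treated as False when absent.
    return (not result) if constraint.get("not", False) else result


constraint = {"id": "biolink:evidence_count", "operator": ">", "value": 20}
print(meets_constraint(875, constraint))  # True
print(meets_constraint(20, constraint))   # False (20 is not strictly greater than 20)
```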

There are (at least) two issues that need to be addressed or checked:

- Filtering of API responses: I'm assuming that it will be easier to do the edge attribute filtering after the subquery, rather than trying to adjust the subquery itself.
- Aggregation of multiple values: In this Slack message, @colleenXu pointed out a case where evidence_count is provided as a multi-element array (example below). In this case, I think it is reasonable to apply the constraint to the sum of the evidence_counts.

{
  "edges": {
    "1feea171db6394cfd9bcb20deae0ad9a": {
      "predicate": "biolink:applied_to_treat",
      "subject": "PUBCHEM.COMPOUND:3386",
      "object": "MONDO:0002050",
      "attributes": [
        {
          "attribute_type_id": "biolink:evidence_count",
          "value": [
            733,
            42
          ]
        }
      ],
      "sources": [
        {
          "resource_id": "infores:aeolus",
          "resource_role": "primary_knowledge_source"
        },
        {
          "resource_id": "infores:mychem-info",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:aeolus"
          ]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:mychem-info"
          ]
        }
      ]
    }
  }
}


colleenXu · 2024-03-18T19:57:12Z

From my perspective, there are 3 issues at play here:

1: edge-attribute value type

BTE is currently returning this edge-attribute in Dev/CI instances (ref: commit).

However, the value type is currently an array of ints (examples below).

These are from this example BTE response show-edge-attribute-issue.json, which runs the POST version of this query to MyChem.

Example of a 1-element array:

                "dd9daae5b03bcad0698ff6669090f36b": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MEDDRA:10070592",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                875
                            ]
                        }
                    ],

Example of a multi-element array: the 733 count from Depression and 42 count from "Depressed mood" were put in the same edge/edge-attribute since both meddra IDs mapped to the "MONDO:0002050 (depressive disorder)" entity.

                "1feea171db6394cfd9bcb20deae0ad9a": {
                    "predicate": "biolink:applied_to_treat",
                    "subject": "PUBCHEM.COMPOUND:3386",
                    "object": "MONDO:0002050",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:evidence_count",
                            "value": [
                                733,
                                42
                            ]
                        }
                    ],

I suggest flattening these into ints, because the array will probably cause validation issues (biolink-model says attribute values should be int) and it'll make the edge-attribute constraint easier to implement.

But we'll need to decide what to do with the multi-element arrays. These are happening because MyChem has separate meddra indication IDs, but BTE/NodeNorm maps them to the same entity. BTE then merges those records into the same edge, and concatenates the counts in the edge-attribute value. I think we could either:

- add the counts from separate records together
- create separate edges for different counts (add to hash?)

(Note: I'm not sure about flattening all 1-element arrays in edge-attributes. biolink:publications may be one example where we always want it to be an array, but we'd need to check with TRAPI folks first...)

2: what to do with the previous effort - a default, hard-coded count limit

EDIT: SEE UPDATE BELOW - we've implemented this.

We've been trying to add a hard-coded count limit of 20 to our MyChem queries #727 (comment), similar to what we do with SEMMEDDB.

I was able to add it to the aeolusTreats operation (chem -> disease, commit), which all instances are using.

Old notes on reverse operation

But this hasn't been done for the reverse operation aeolusTreats-rev (disease -> chem), which is what creative-mode uses. In discussions last week (three Slack links), we finally reached consensus on next steps:

- by adjusting the x-bte annotation, I can get partway there. See this commit (special-reverses branch)
- next is writing/implementing the BTE JQ-post-processing to remove the hits when the aeolus.indications field is empty. While this should be quick, I'm unsure of the logic to use and would need to discuss with Jackson...
  - something super-specific, that only works on responses from this operation?
  - generic-ish: "remove hits if this is a BioThings API AND supportBatch is false AND the scopes field specified in the request body (aeolus.indications.meddra in this case) isn't in the hit".
    - the "BioThings API only" and "supportBatch is false" should match "special reverses" - which are the only cases where we'd need this logic
    - I don't have other current x-bte annotation examples where this would be useful. I suspect that it may be useful for writing reverse x-bte operations for MyChem chembl drug-mechanism and drugcentral bioactivity

But if we want to implement TRAPI-query edge-attribute constraints, it's not clear if we want to go forward with this. An edge-attribute constraint < 20 would conflict with this hard-coded limit.

3: how to implement this issue's ask: TRAPI-query edge-attribute constraints

This is still up for discussion:

- how generic/general we want our approach to be
- how quickly we can do this
- do we still want a default, hard-coded count limit for these MyChem aeolus indication operations (ex: when an edge-attribute constraint isn't specified)?

Idea: if an edge-attribute constraint is specified...

- after running all sub-queries/building records, filter the records to only those that have the edge-attribute and whose values meet the criteria (need to double-check that this matches the TRAPI spec).
  - pros: this seems to be the easiest to do (conceptually easy, less chance of bugs)
  - cons: wasted effort getting records we'll later throw out
- for BioThings APIs, transform the constraint into part of the query

I had another idea of transforming the constraint into part of the BioThings API query using the x-bte annotation templating and info in the response-mapping, but this would be complex and more effort to think through and implement.
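
To make the first (post-filtering) option concrete, here is a rough Python sketch, not BTE's implementation. The function `filter_edges`, the edge IDs, and the counts are all made up for illustration, and it assumes attribute values have already been flattened to scalars. It also assumes that edges lacking the constrained attribute get dropped, which is the behavior that still needs checking against the TRAPI spec.

```python
def filter_edges(edges, attribute_type_id, passes):
    """Keep only edges that carry the attribute and whose values all pass.

    `passes` is any callable taking one attribute value and returning a bool.
    """
    kept = {}
    for edge_id, edge in edges.items():
        values = [
            attr["value"]
            for attr in edge.get("attributes", [])
            if attr["attribute_type_id"] == attribute_type_id
        ]
        # Assumption: an edge without the attribute does not satisfy the constraint.
        if values and all(passes(value) for value in values):
            kept[edge_id] = edge
    return kept


# Hypothetical edges: one above the threshold, one below, one without the attribute.
edges = {
    "e1": {"attributes": [{"attribute_type_id": "biolink:evidence_count", "value": 875}]},
    "e2": {"attributes": [{"attribute_type_id": "biolink:evidence_count", "value": 5}]},
    "e3": {"attributes": []},
}

kept = filter_edges(edges, "biolink:evidence_count", lambda count: count > 20)
print(list(kept))  # ['e1']
```

The con listed above still applies to this approach: "e2" and "e3" were fetched and built into records before being thrown out.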

colleenXu · 2024-03-20T19:34:43Z

During today's group meeting, we made decisions on issues (1) and (3) above:

3: how to implement TRAPI-query edge-attribute constraints

- agreed to do this after retrieving the sub-query response (vs transforming the constraint into part of the sub-query API call)
- two ways to do this: we picked "at/after edge-merging" because we want to keep all the counts that are getting "merged" and add them together
  - VS applying the constraint at record creation time, which would throw out individual records where the count < limit. The benefit would be having fewer records to work with in future steps
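
A small worked example of why the order matters, as a Python sketch with made-up record IDs and counts (the `merge` helper and the record shape are hypothetical, not BTE's record format). Two records that map to the same subject/object pair each fall below a limit of 20, but their sum does not:

```python
from collections import defaultdict

LIMIT = 20  # constraint: evidence count must be > 20

# Hypothetical records; the first two map to the same subject/object pair.
records = [
    {"subject": "CHEM:1", "object": "DISEASE:1", "count": 15},
    {"subject": "CHEM:1", "object": "DISEASE:1", "count": 10},
    {"subject": "CHEM:1", "object": "DISEASE:2", "count": 5},
]


def merge(recs):
    """Merge records sharing a subject/object pair by adding their counts."""
    totals = defaultdict(int)
    for rec in recs:
        totals[(rec["subject"], rec["object"])] += rec["count"]
    return dict(totals)


# Option A: apply the constraint at record creation time, then merge.
filter_then_merge = merge([rec for rec in records if rec["count"] > LIMIT])

# Option B (the one picked): merge first, then apply the constraint to the sums.
merge_then_filter = {pair: total for pair, total in merge(records).items() if total > LIMIT}

print(filter_then_merge)  # {}
print(merge_then_filter)  # {('CHEM:1', 'DISEASE:1'): 25}
```

With option A the 15 and 10 are each discarded before they can be added, so the merged edge with a total of 25 never appears.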

1: how to flatten multi-element arrays

- agreed to add counts from multi-element arrays together to create an int, without doing any steps to remove/throw out counts beforehand. This makes the most conceptual sense.
  - use a config list of attribute-type-ids to control when we do this. For now, that list will just include this attribute-type-id (biolink:evidence_count)
- then apply constraint to those sums
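
A sketch of what the config-controlled flattening might look like, in Python and under assumptions: `SUMMED_ATTRIBUTE_TYPE_IDS` and `flatten_attributes` are hypothetical names, and the publication ID is a placeholder. Only attribute types on the list are summed; others, such as biolink:publications, keep their array values.

```python
# Hypothetical config list: attribute types whose array values get summed.
SUMMED_ATTRIBUTE_TYPE_IDS = {"biolink:evidence_count"}


def flatten_attributes(attributes):
    """Return a copy of the attributes with configured array values summed to an int."""
    flattened = []
    for attr in attributes:
        value = attr["value"]
        if attr["attribute_type_id"] in SUMMED_ATTRIBUTE_TYPE_IDS and isinstance(value, list):
            # Copy rather than mutate, so the merged records stay untouched.
            attr = {**attr, "value": sum(value)}
        flattened.append(attr)
    return flattened


attributes = [
    {"attribute_type_id": "biolink:evidence_count", "value": [733, 42]},
    {"attribute_type_id": "biolink:publications", "value": ["PMID:0000000"]},  # placeholder ID
]

flattened = flatten_attributes(attributes)
print(flattened[0]["value"])  # 775
print(flattened[1]["value"])  # ['PMID:0000000']
```

The constraint would then be applied to the summed value (775 here) rather than to the individual counts.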

Jackson @tokebe estimated that this would take ~2 days of work. As part of this effort, they'll review the TRAPI yaml/spec docs for QEdge.attribute_constraints expectations and requirements, and they'll decide how full/robust an implementation to do for this issue.

colleenXu · 2024-03-20T21:57:30Z

Note that I've made a PR to ask about this template TranslatorSRI/CQS#9:

- tool queried (BTE vs Service-Provider-TRAPI)
- attribute-constraint's value type - right now it's a string, which is confusing
- the attribute-type-id

colleenXu · 2024-04-01T21:18:07Z

Update! The template PR TranslatorSRI/CQS#9 has been merged. So the current template is https://github.com/TranslatorSRI/CQS/blob/main/templates/mvp1-templates/mvp1-template4-bte-aeolus/mvp1-template5-service-provider-aeolus.json

Changes:

- attribute-constraint value type is now an int
- using Service-Provider-TRAPI, rather than BTE, then using a separate scoring service. Going to try this out.

colleenXu · 2024-04-09T21:12:00Z

Update on issue 2 above

The hard-coded/default/MyChem-query-level limit is now live in Dev/CI! See the details in #727 (comment)

colleenXu · 2024-06-12T20:41:12Z

And noting that Issue 1 was also addressed last month as part of #727 (comment)

Leaving just Issue 3 - the edge attribute constraint implementation itself (first post, later decision)

andrewsu · 2024-10-04T20:34:35Z

Going to add this to our agenda for next week to discuss. We tabled this earlier this year, and I'd like to revisit where we stand w.r.t. current priorities.

For updated info/context, Biolink Model undertook the "treats refactor", which separates out different types of evidence into different predicates (e.g. "in_clinical_trials_for", "beneficial_in_models_for"). CQS then created one-hop templates that capture how to query Translator with each of these predicates, and these queries often include edge constraints. (Though we could review those templates in more detail to look at what those edge constraints actually do and how common they are.)

Right now, our first MVP1 template queries the root of the treats predicate hierarchy (biolink:treats_or_applied_or_studied_to_treat). We should break each treats predicate out to different templates (because they have very different levels of confidence; see new issue #881), and then add in the CQS edge constraints once that functionality is available.

colleenXu · 2024-10-29T06:01:04Z

Andrew said we don't need to get this done in BTE by the end of this phase.

We will want to be able to do edge-attribute constraints in Retriever (or Shepherd?) (CC @tokebe)

tokebe · 2024-10-29T16:03:48Z

We'll want support for edge-attribute constraints in Retriever

colleenXu mentioned this issue Mar 18, 2024

for entity-based record structures (BioThings APIs), "reverse" operations cannot retrieve the same information as "forward" operations #316

Open

colleenXu mentioned this issue Apr 2, 2024

jmespath: removing higher-level objects based on lower-level matches biothings/biothings.api#325

Open

colleenXu mentioned this issue Apr 9, 2024

tune the use of AEOLUS indications from mychem.info #727

Closed

rjawesome self-assigned this Jun 10, 2024

rjawesome mentioned this issue Jun 12, 2024

Edge/Node constraints biothings/bte_trapi_query_graph_handler#194

Open

colleenXu added the next phase for future if we're funded label Oct 29, 2024

andrewsu mentioned this issue Nov 5, 2024

incorporate CQS edge constraints into templates biothings/bte_trapi_query_graph_handler#227

Open
