implement "creative/inferred mode" for "what drugs may treat disease X" query #449

Closed
andrewsu opened this issue May 24, 2022 · 41 comments

@andrewsu
Member

Up to this point, Translator has focused on explicitly-declared query topologies, expressed as a subgraph of nodes and edges. Translator now wants ARAs to accept "inferred edges" in the query, which essentially give the ARA the freedom to return answers that are supported by any query topology. This design has been colloquially referred to as "creative mode".

By the end of June, Translator would like ARAs to be able to respond to one type of "inferred edge" query focusing on the question "what drugs may treat disease X". "Inferred mode" is specified by adding a "knowledge_type": "inferred" on the edge of a one-hop predict query (as proposed in this PR).

For the output, paths between the disease and the drug should have a maximum of three edges. There is currently no cap on the overall number of nodes or edges in a result.

This doc contains a list of example diseases to be queried: https://docs.google.com/document/d/1cuYrWHv6MzT3w7lZqpHQVc7WbMY5WuLiLOh9qI3TVRo/edit

In terms of our BTE implementation of inferred mode, I think we should build on the idea of using a library of common query topologies / metapaths extracted from DrugMechDB. This would be a relatively straightforward implementation of this feature, and it would build on work that we've already done for #181.

@tokebe tokebe self-assigned this May 25, 2022
@tokebe
Member

tokebe commented May 25, 2022

Just adding my general plan here for quick reference:

  • At query graph assembly time, check if we're supposed to infer edges
  • If so, assemble new query graphs using templates from library
  • Halt execution for this query_graph_handler, and then split off all inferred query graphs into new query_graph_handlers, to execute sequentially.
  • Keep results of each handler
  • On finish, assemble each into a single response (handling deduplication of node references, etc.)
  • Return newly-assembled response
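
A minimal sketch of the control flow in the plan above, not BTE's actual code. The names `handle_query`, `build_from_templates`, `execute`, and `merge` are hypothetical stand-ins for template expansion, a per-graph query handler, and response assembly:

```python
def has_inferred_edge(query_graph):
    """True if any edge in the query graph asks for inferred mode."""
    return any(
        edge.get("knowledge_type") == "inferred"
        for edge in query_graph.get("edges", {}).values()
    )


def handle_query(query_graph, build_from_templates, execute, merge):
    """Run a normal query directly, or fan an inferred query out to templates.

    build_from_templates, execute and merge are hypothetical callables:
    they stand in for template expansion, one handler run per query graph,
    and final response assembly respectively.
    """
    if not has_inferred_edge(query_graph):
        return execute(query_graph)
    # One handler run per templated query graph, executed sequentially.
    responses = [execute(qg) for qg in build_from_templates(query_graph)]
    # Combine everything into a single response for the caller.
    return merge(responses)
```

Passing the three steps in as callables keeps the sketch independent of how templates are stored or how responses are de-duplicated.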

Main issue before start of implementation: I need that template query library, or a good place to start in generating one.

@andrewsu
Member Author

At least initially, we can use this query template library created by @Carolina1396 https://github.com/Carolina1396/drug_mechanisms_rare_diseases_BioThingsExplorer/tree/main/src/query_templates, which was based at least in part on an analysis of the most common metapaths in DrugMechDB shown in https://github.com/SuLab/DMDB_Analysis/blob/main/1_code/1_basic_dmdb_analysis.ipynb.

@tokebe
Member

tokebe commented Jun 2, 2022

@andrewsu I'm realizing there are some questions I've failed to ask regarding scope:

Should BTE support inferred-mode edges in multi-hop queries? More than one inferred edge?

I believe this should still be relatively simple in this implementation: we use our template query to replace the inferred edge, keeping the 'outside' topology the same, for as many templated queries as we have (in the case of multiple inferred edges, one graph for each possible combination of template queries as appropriate).

Implementation-wise this isn't very complex -- just a few extra loops to make sure we get all the permutations. Since we're not re-assembling results afterward, just de-duplicating and concatenating them, there are no overt complications with results assembly. I do however see some possible cases where a single multi-hop query with multiple inferred edges explodes very quickly.

Another simple clarifying question: When encountering an inferred edge, obviously BTE must create every query that can be generated from templates related to that inferred edge. I assume BTE should also still execute the original edge in addition?

@andrewsu
Member Author

andrewsu commented Jun 2, 2022

> Should BTE support inferred-mode edges in multi-hop queries? More than one inferred edge?

No, we can make the assumption (and enforce the limit) that inferred-mode queries are single-hop queries. If that's not true, then feel free to have BTE fail out with some appropriate error message.

> When encountering an inferred edge, obviously BTE must create every query that can be generated from templates related to that inferred edge. I assume BTE should also still execute the original edge in addition?

Yes, I think that makes sense. Feel free to implement that in code, or by adding the appropriate template to the query library. Whichever route makes the most sense to you...

@colleenXu
Collaborator

colleenXu commented Jun 6, 2022

@tokebe @andrewsu

note on the query that is sent:

they may use "biolink:Drug" as the category....and we will likely want to add "ChemicalEntity" to this because we don't keep much under "Drug".

@andrewsu
Member Author

andrewsu commented Jun 6, 2022

if it's clear how to fix it, go for it. If not, skip it...

@colleenXu
Collaborator

colleenXu commented Jun 6, 2022

@tokebe @andrewsu

I suggest some changes to the templates for this particular issue...

@colleenXu
Collaborator

colleenXu commented Jun 6, 2022

I've taken all the templates and turned them into 7 TRAPI queries, with some mods....#453 (comment)

However, these may need further tweaking (predicates, node constraint for the drug regulatory status aka whether it is a drug, EDIT: is_set:True on QNodes...)?

EDIT: node constraints not working: #174 (comment)

@colleenXu
Collaborator

And...it's unclear to me how we'll assemble the separate TRAPI queries / QGraphs and their responses into 1 response / thing....is that the goal?

@tokebe
Member

tokebe commented Jun 6, 2022

@colleenXu As for the response assembly: This was discussed in a meeting, the graphs and results get de-duplicated (remove/combine duplicate node/edge references) and then concatenated into one response. Inferred query graphs are dropped, at least until behavior to the contrary is specified.
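
As a rough illustration of that assembly step (a sketch only, assuming TRAPI-shaped responses with a `knowledge_graph` keyed by ID and a `results` list; `merge_responses` is a hypothetical name, and letting a later duplicate overwrite an earlier one is a simplification of "combine"):

```python
import json


def merge_responses(responses):
    """De-duplicate and concatenate several template responses into one."""
    merged = {"knowledge_graph": {"nodes": {}, "edges": {}}, "results": []}
    seen_results = set()
    for response in responses:
        kg = response.get("knowledge_graph", {})
        # Nodes and edges are keyed by ID, so a later duplicate overwrites
        # the earlier copy instead of being added twice.
        merged["knowledge_graph"]["nodes"].update(kg.get("nodes", {}))
        merged["knowledge_graph"]["edges"].update(kg.get("edges", {}))
        for result in response.get("results", []):
            # Serialise with sorted keys so structurally equal results match.
            key = json.dumps(result, sort_keys=True)
            if key not in seen_results:
                seen_results.add(key)
                merged["results"].append(result)
    # The per-template (inferred) query graphs are deliberately dropped.
    return merged
```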

Regarding the templates, I'll attempt to include the suggestions you made to the best of my ability (where appropriate -- I think there are some things that require more discussion, such as adding predicates). My goal is to make a system where we can easily add/change templates, so anything else can be easily fixed after primary implementation.

@colleenXu
Collaborator

colleenXu commented Jun 6, 2022

Note that we may be discussing more implementation details during this relay. I have a note from current attendance of the "creative mode" session: "grouping answers by drug". I'm not sure exactly what that would look like right now.


EDIT: I'm going to adjust my templates to keep the qNodeID naming given in the "initial query" thingy.

@tokebe
Member

tokebe commented Jun 6, 2022

I also noted some question as to predicate/direction -- something along the lines of "does this creative query result specify something that actually helps, or is it the opposite?"

I'm not entirely clear on whether that reached an answer. Do we want to be incredibly careful with our predicate selection (both matching a creative qEdge to appropriate templates, and possibly controlling additional predicates in our templates, either statically or dynamically)? Or, for this initial implementation, should the focus be more on getting a result graph that is definitely related, whether or not it strictly fits the creative qEdge's predicate?

@tokebe
Member

tokebe commented Jun 7, 2022

My previous comment may be safely ignored for now - it has been reasonably answered by discussion in the relay, and my implementation will be trivial to fix if I misunderstood.

I'm cataloguing items which need to be fixed/combined when combining/de-duplicating results, which has led me to certain node attributes, namely:

  • num_source_node
  • num_target_node
  • source_qg_node
  • target_qg_node

I'm not totally familiar with their purpose -- how should these behave when combining multiple results? Remove source/target qg_nodes which aren't present in the original query_graph and adjust counts accordingly? Should counts be added between responses for overlapping nodes?

@andrewsu
Member Author

andrewsu commented Jun 8, 2022

Regarding the node attributes listed above, check to see if they have any internal usage. If not, I think they can be removed from the output for now. I believe these were added as some sort of attempt to look at node degree (so that paths using very common nodes, or "hub nodes", can eventually be down-weighted in scoring). But, I don't think we actually got around to implementing anything using those attributes.

@tokebe
Member

tokebe commented Jun 9, 2022

My current implementation searches for folders matching the subjectCategory-predicate-objectCategory of the original query (making multiple combinations if there are multiple categories/predicates), and assumes that any query templates in matching folders will be appropriate for the original query. I'm of course planning on adding functionality to ensure it's possible to search for reverse cases so we don't have to rewrite templates backwards.

This has, however, led to another question: should BTE attempt to expand the search to possible additional appropriate template groups based on the biolink hierarchy?

For instance, if the user queries something to the effect of ChemicalEntity-treats-DiseaseOrPhenotypicFeature, should BTE also try to find templates for ChemicalEntity-treats-Disease and ChemicalEntity-treats-PhenotypicFeature? Similarly, should the reverse case be checked -- expanding Disease to DiseaseOrPhenotypicFeature?

@andrewsu
Member Author

andrewsu commented Jun 9, 2022

I suppose expanding the query classes to more specific subclasses would be useful, i.e., ChemicalEntity-treats-DiseaseOrPhenotypicFeature -> ChemicalEntity-treats-Disease and ChemicalEntity-treats-PhenotypicFeature. I think we shouldn't do the reverse.

But since we only have one creative mode template right now, I think it's also fine to punt on this class-traversal functionality until later. Up to you to decide based on the effort involved in just tackling it now...

@tokebe
Member

tokebe commented Jun 20, 2022

@andrewsu @colleenXu As a note for the creation of query templates, I'm setting up the handling such that a template is expected to have one node named subject and one named object -- for instance, subject would be a ChemicalEntity, while object would be a DiseaseOrPhenotypicFeature, with the rest of the template adding the inferred nodes/edges (which don't need to conform to any naming convention) between these two. Templates are formatted just as normal TRAPI queries, for example:

{
  "message": {
    "query_graph": {
      "nodes": {
        "subject": {
          "categories": ["biolink:ChemicalEntity"]
        },
        "n1": {
          "categories": ["biolink:Gene"]
        },
        "object": {
          "categories": ["biolink:DiseaseOrPhenotypicFeature"]
        }
      },
      "edges": {
        "e01": {
          "subject": "subject",
          "object": "n1",
          "predicates": ["biolink:affects"]
        },
        "e03": {
          "subject": "object",
          "object": "n1",
          "predicates": ["biolink:caused_by"]
        }
      }
    }
  }
}

@colleenXu
Collaborator

@tokebe I was using "disease" and "drug" as the node names in my query templates. I imagine that the final node names/qNodeIDs depend on what the UI needs, but I don't know what they need (....asking them as a next step).

@colleenXu
Collaborator

@tokebe I've read up on this issue, and:

  • My assumption is that the queries are always asked as "biolink:Drug" -(biolink:treats)-> "biolink:Disease X". "X" might actually be a PhenotypicFeature or DiseaseOrPheno, but that would be handled by existing code (Node Normalizer adding semantic type if it finds a different one)
  • AKA no need to handle superclass/subclass stuff: like what if "biolink:ChemicalEntity" (an ancestor) or "biolink:SmallMolecule" (a related thing but not a parent/ancestor) was used instead of "Drug". Or "PhenotypicFeature / DiseaseOrPheno" was used instead of "Disease"
  • AKA no need to handle "reversals". That sounds like a Drug Y would be given and we'd have to find Diseases, by "reversing" predicates in the templates (or subject/object assignments in qEdges??)

@tokebe
Member

tokebe commented Jun 20, 2022

  • I should clarify -- the subject/object naming convention is specifically limited to the query templates. Whatever node names the user uses are what are returned; subject/object is just used so my code can know specifically to which template nodes it should merge the user query nodes. I could have used "Drug" and "Disease", however subject/object felt like it would be much clearer in-code, and would be distinct enough in-template for cases where multiple disease/drug nodes might exist in a pathway.
  • What I stated with those specific categories was more just an example, however I understand I was operating with incorrect categories compared to what is expected for this initial milestone
  • I may be able to leverage existing code to handle subclasses depending on some factors, however I wanted to make sure I explicitly understood the expected behavior in case I might have to add additional code for biolink subclass handling.

Also, with reversing I think you've misunderstood what I mean? I have to handle reverse cases by nature of how my code searches for templates, something like:

"Did the user ask for Disease X -treated_by-> Drug instead of Drug -treats-> Disease X? Ok, check if this direction or the other matches one of the groups of templates we have."

The reason for this is that template groups are searched for by folder name, where the actual name of the folder is something like Drug_treats_Disease. Actual reversing for template execution/etc. is handled by query execution and is not what I'm concerned about here.
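
To make the folder lookup concrete, here is a small sketch under stated assumptions: `find_template_folder` and the hard-coded `INVERSE_PREDICATE` table are hypothetical, and a real implementation would presumably take inverse predicates from the Biolink model rather than a literal dict:

```python
# Hypothetical, hard-coded inverse table for illustration only.
INVERSE_PREDICATE = {"treats": "treated_by", "treated_by": "treats"}


def find_template_folder(subject, predicate, obj, available_folders):
    """Return (folder_name, is_reversed) for a matching template group.

    Folder names follow the Subject_predicate_Object pattern, for example
    "Drug_treats_Disease". Returns None when neither direction matches.
    """
    forward = f"{subject}_{predicate}_{obj}"
    if forward in available_folders:
        return forward, False
    inverse = INVERSE_PREDICATE.get(predicate)
    if inverse is not None:
        # Same question asked from the other end of the edge.
        backward = f"{obj}_{inverse}_{subject}"
        if backward in available_folders:
            return backward, True
    return None
```

With only a `Drug_treats_Disease` folder present, both `Drug -treats-> Disease` and `Disease -treated_by-> Drug` resolve to the same template group, so templates never need to be written backwards.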

@colleenXu
Collaborator

colleenXu commented Jun 20, 2022

From discussion (first Andrew-I, then Jackson-I)

Big picture

  • two definitions of "query-template":
    • our current group of templates for "Drug-treats-DiseaseX" (inside a "library"), that'll be stored in folders in BTE
    • what the UI will send to the ARS, which will be sent to us. We assume they're working off a template
  • For now, we will assume the UI will follow "templates" to make the "creative-mode" queries that are sent to us.
    • 1 template right now: a one-hop with subject qNode category "Drug" (no ID), qEdge predicate "treats" with "knowledge_type:inferred" (tells us to do "creative-mode"), object qNode category "Disease" with an ID.
    • AKA don't need to worry about variations (qNode category inheritance, qEdge predicate inheritance, multi-hops, inverse "Disease X treated-by Drug", flipped "Drug X treats Disease", etc.)
  • Currently, we plan not to accept multi-hops that use "knowledge_type:inferred" on >= 1 edge. These queries could have LONG run-times
    • we plan to return quickly with some kind of error.

Template handling details

  • inside our templates:
    • prefer not to use "subject"/"object" as qNodeIDs for the qNodes that will be modified using the query from the UI
      • decided: "creativeQuerySubject", "creativeQueryObject"
  • Handling the IDs in our templates vs the query from the UI
    • we'll take the qNodeIDs from the UI query, and replace the "creativeQuerySubject"/"creativeQueryObject" with them.
    • if we know what qNodeIDs / qEdgeID the UI is using, we could adjust our templates accordingly....but since we don't know them right now...
      • we don't want the qEdgeID the UI is using == qEdgeID in a template. If this happens, we'll modify the template's qEdgeID (an incrementing method like "-1", "-2" if "-1" is taken, "-3" if "-1" and "-2" are taken, etc.).
      • we don't want the qNodeIDs the UI is using == qNodeID for an extra node in a multi-hop template. If this happens, we'll modify the template's qNodeIDs (an incrementing method, explained above)
  • Handling result node/edge-bindings from different query-templates, during the "stitch-together" step
    • maybe we don't want to use the same qNodeIDs/qEdgeIDs when results come from different query-templates? So we use an incrementing method (explained above) to give different IDs....I'm not sure right now if this is going to look confusing, how it'll be used, and how helpful this will be...
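
The ID handling above might look roughly like the following. This is a sketch, not the implemented behavior: `fill_template` and `unique_id` are hypothetical names, the incoming query is assumed to have exactly one edge, and copying the user's `ids` onto the placeholder nodes is an assumption about how the two graphs are joined.

```python
import copy

SUBJECT_PLACEHOLDER = "creativeQuerySubject"
OBJECT_PLACEHOLDER = "creativeQueryObject"


def unique_id(candidate, taken):
    """Append "-1", "-2", ... until the ID no longer collides."""
    if candidate not in taken:
        return candidate
    suffix = 1
    while f"{candidate}-{suffix}" in taken:
        suffix += 1
    return f"{candidate}-{suffix}"


def fill_template(template, user_qg):
    """Rename template IDs so they fit the user's one-edge query graph."""
    user_edge_id, user_edge = next(iter(user_qg["edges"].items()))
    # Placeholders take over the qNodeIDs used in the incoming query.
    rename = {
        SUBJECT_PLACEHOLDER: user_edge["subject"],
        OBJECT_PLACEHOLDER: user_edge["object"],
    }
    taken_nodes = set(user_qg["nodes"])
    for node_id in template["nodes"]:
        if node_id not in rename:
            # Extra template nodes must not reuse a qNodeID from the user.
            rename[node_id] = unique_id(node_id, taken_nodes)
            taken_nodes.add(rename[node_id])

    filled = {"nodes": {}, "edges": {}}
    for node_id, node in template["nodes"].items():
        new_node = copy.deepcopy(node)
        if node_id in (SUBJECT_PLACEHOLDER, OBJECT_PLACEHOLDER):
            user_node = user_qg["nodes"][rename[node_id]]
            if "ids" in user_node:
                new_node["ids"] = list(user_node["ids"])  # assumption: copy IDs only
        filled["nodes"][rename[node_id]] = new_node

    taken_edges = {user_edge_id}
    for edge_id, edge in template["edges"].items():
        new_edge = copy.deepcopy(edge)
        new_edge["subject"] = rename[edge["subject"]]
        new_edge["object"] = rename[edge["object"]]
        new_id = unique_id(edge_id, taken_edges)
        taken_edges.add(new_id)
        filled["edges"][new_id] = new_edge
    return filled
```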

@tokebe
Member

tokebe commented Jun 21, 2022

Some of my notes related to this discussion:

For clarity in discussion of templates:

  • I'm using "user query" or "creative query" to refer to the query coming into BTE. While we're not expecting random users to use inferred mode, the UI is operating on behalf of the user. While the UI will likely be using some form of template, that's more internal to their operation -- as far as BTE knows/cares, this is an incoming query from some user like any other, hence my not using the term "template" in reference to the query BTE receives.
    • Additionally, while we can technically expect to only deal with specific templates for this type of query, since we don't have the specifics, it's best/easiest to plan around the "general case" (of course, with some considerations for getting out a Minimum Viable Product, so for now we can ignore features such as reversing/etc. as Colleen outlined above).
  • When I say template, I exclusively mean our own templates, not whatever is sent by the user/UI/ARS

Regarding different qNodeIDs/qEdgeIDs for responses from different templates:

  • We don't have a prescribed way to represent query topology from query graphs that are inferred, which leads to this rather ad-hoc solution.
  • It makes sense that we wouldn't want template query graphs to have namespace overlap in a final response to the user/UI as this would just be more confusing on top of necessarily having qNode/qEdge references to qNodes/qEdges that don't exist in the original query graph.
  • In the end, I doubt that having all these unique qNodeIDs/qEdgeIDs will be "helpful" per se, rather that it would simply be even more confusing to see overlapping query graphs if one were to extract all these qNodes/qEdges from the response.

@tokebe
Member

tokebe commented Jun 22, 2022

(Moving from Slack discussion to here for ease-of-record)

It's been proposed that a result cap should be imposed to prevent oversized responses. What should this cap be, and are there any special considerations regarding its behavior/implementation?

I imagine the simplest form is something like "If the cumulative results reach or surpass the cap, do not proceed with additional template sub-queries; assemble final results and return."

Tagging @colleenXu @andrewsu for discussion.

@andrewsu
Member Author

> I imagine the simplest form is something like "If the cumulative results reach or surpass the cap, do not proceed with additional template sub-queries; assemble final results and return."

I agree that this is the simplest form of adding a result cap, and I think it's where we should start...

@tokebe
Member

tokebe commented Jun 23, 2022

Further testing: with @colleenXu's template revision and now template ordering, it's possible now to stop at a reasonable number of results a fair amount of the time. However, it's still possible at times to get very large responses for one template, which causes BTE to stop after that one, but still maintain possibly very large numbers of results. It might be worth figuring out how to pass a result cap in sub-queries, or even to implement sub-query time-outs.

Additionally, I'm still concerned that we'll end up stopping after just the first template, which is going to be the basic one-hop, so I think further discussion is still warranted about the presence (or perhaps just the ordering) of the basic one-hop?

@tokebe
Member

tokebe commented Jun 24, 2022

Additional behavioral question: currently, the user-query subject and object categories and IDs are merged with those of any matching templates, meaning if the user-query specifies ChemicalEntity, then sub-queries specifying SmallMolecule will actually query for both.

Given current template revision efforts, it seems like category merging may not be the best idea, so I've removed it in the latest commit. Should I re-enable this?

@colleenXu
Collaborator

colleenXu commented Jun 24, 2022

Replying to @tokebe's first post #449 (comment), first paragraph on "too many results / takes too long":

  • I'm not sure what "sub-query time-outs" means. Perhaps stopping the entire sub-querying process after a certain amount of time? That sounds somewhat interesting (an arbitrary limit, but also sounds relatively simple to do). We already have "sub-query timeouts" for each individual sub-query (stopping after 500s or so?).
  • The "result cap in sub-queries" reminds me of this issue, which has a related PR/branch that previously did something like this. However, I noted several issues:
    • a flexible limit dependent on the number of input IDs for the edge may be helpful
    • limiting records / sub-queries leads to an unpredictable number of final results
    • For an Explain-type query / more complicated topology where intersections of nodes will happen, it is difficult to predict how high to set the limit to get the nodes that will survive the intersecting process
  • I suggested other approaches: setting a lower cap than we currently have or pruning at the end of each hop.

@colleenXu
Collaborator

Replying to @tokebe's first post #449 (comment), second paragraph on "is it okay to do only the 1-hop and finish":

  • My view has been that it's fine.
    • One-hops seem to give good answers
    • I've been under the impression that the UI team thinks these "creative mode" queries will run quickly and return relatively easy result sub-graphs to render / explain.....and doing the one-hop can achieve those goals...
    • It's a good indicator that plenty of stuff is linked directly to the disease, so any multi-hop templates will probably explode (have tons of results / take a long time).
  • However, I don't have a good indicator of how much "creative" is being prioritized vs "has good answers" or "is fast" or "is easy to understand".

@colleenXu
Collaborator

I'm a little confused by @tokebe 's second comment #449 (comment):

  • I saw issues with merging the UI's incoming query (which may have biolink:Drug, ChemicalEntity, whatever) with what's in the templates, because I'm not accounting for this when I'm designing / testing the templates for their categories / predicates. For example, I'm trying out SmallMolecule rather than ChemicalEntity (so far it gives only a small decrease, not really worth it).....and I wouldn't want BTE to add in ChemicalEntity / Drug / whatever just because the UI's incoming query has it...

@tokebe
Member

tokebe commented Jun 27, 2022

In response to @colleenXu's first response:

I've introduced undue confusion here by using the term "sub-query" instead of something like "template-query" (I should change sub-query to template-query in the new code's logging as well...). Each template is handled as its own independent query, just bypassing our endpoints and with some special handling. What I was proposing was either passing a result limit to each template's execution, or limiting each template's execution time.

That said, there are definitely some complications with either of those solutions. Given your recent extended testing in Slack, I think it might be fine to leave the behavior as-is, and instead focus on just how many results is an acceptable amount before stopping?

In response to the second:

The impression I've gotten is that "creative" is more important than "fast", however this is partially assumption on my part based on the nature of expecting BTE to sort of "expand" the initial query (as well as other teams' approaches seeming to lean more toward "creative").

Additionally, I thought @andrewsu had commented more along the lines of concern that the straightforward one-hop may not be "creative enough" for Translator's taste. In the end, I'm just implementing, but I'd like to double-check with @andrewsu with regards to this.

In response to the third:

I'm not sure what's confusing (except for my incorrect use of sub-query instead of template-query), especially as your response perfectly addresses why I shouldn't re-enable category merging between the user query and our templates. I'll leave it disabled and remove the commented-out code as it won't be used.

@tokebe
Member

tokebe commented Jun 27, 2022

Regarding how many results to limit creative mode to, I propose a two-number method.

The first number, X, dictates when to stop. If the queries added together have X or more results, don't execute any more templates.

The second number, Y, can be a sort of hard-coded buffer. If the results accumulated so far number fewer than X, we only add the newest query's results if the accumulated count plus the new count is less than X + Y. (Otherwise, we might reasonably assume that all remaining queries will exceed X + Y, and stop executing templates.)

This should prevent cases where, say, X = 1000, and we reach 980 results, and then the next query has another 1000 results, meaning our response would have some number less than or equal to 1980, way over our intended limit.

I would say that Y could be somewhere between 100 and 500. I don't have a solid idea of what X should be, perhaps something between 1000-2000?

@colleenXu
Collaborator

@tokebe

vocab:

  • Yeah, I've been calling "template-queries" (individual queries within creative mode) vs "sub-queries" (queries to the APIs for each hop within an individual query).

Regarding limits, I've been assuming:

  • templates are run one-at-a-time in order
  • Hmmm, maybe limit of 1000 total results (unique drugs) == X the first number. This means stopping execution / merging results sets once 1000 total results is reached.
    • All templates are designed so that individual results within them == unique drugs (using is_set parameter to do this).
    • why 1000? Well...
      • ARAX limits to the "first 1000 results" in the array
      • based on snippets of what the UI team has said, I think they don't expect to handle much more than 1000 themselves...
      • glancing at my analysis, there seems to be an uneven distribution of result-set sizes (many are <500 or >1000).
  • Y makes sense to me... How about 200 (aka a small query is okay, but >20% more is too much)?
  • another limit of 5 min per template for execution (includes edge-managing / results assembly). Maybe even less? I'm thinking that creative-mode ideally should return a response in < 30 min...

creativity:

  • I've been relying on @andrewsu for the priorities ("what is creative enough", "how to balance creativity vs other important requirements like not crashing the server or taking 20 min to answer 1 thing")
  • However, on a practical note, I think speed/small result sets (aka 1000 total results) is key....I imagine how this will be used very soon, when multiple users are hitting the UI at once and sending "simple" query-graphs...I think users would just give up if they had to wait > 30 min for each thing they try...

"merging UI incoming query with templates":

  • sounds like we're on the same page there. "Don't do it for categories/predicates, just grab the IDs and qNodeIDs / qEdgeID"

@colleenXu
Collaborator

@tokebe also adding a note on what another team is doing to "recognize creative-mode and its qNodes" (Translator Slack) https://ncatstranslator.slack.com/archives/C013Q5TVC87/p1656113934859089?thread_ts=1656022814.576499&cid=C013Q5TVC87

@tokebe
Member

tokebe commented Jun 27, 2022

This brings up an interesting point -- currently, I have no special behavior for multiple IDs. If the user/UI supplies multiple IDs, they're all merged in and used in template-queries. Should we require that the user/UI query has only 1 ID?

@tokebe
Member

tokebe commented Jun 27, 2022

Slight caveat to the X + Y method -- I can only have that take effect after the first template-query, or else the first template could return just a little over the proposed 1000 + 200 limit and then nothing would get returned. So there's still no guarantee we're returning 1200 or fewer results in the event the first template gets more, just a decent likelihood...
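
A small simulation of the X + Y rule with the first-template exception, as a sketch only. `run_with_cap` is a hypothetical name, the defaults are the 1000 and 200 figures floated in this thread, and the list of counts stands in for actually executing each template-query in order (it ignores result merging between templates):

```python
def run_with_cap(template_result_counts, stop_at=1000, buffer=200):
    """Decide which templates' results are kept under the X + Y rule.

    template_result_counts holds the number of results each template-query
    returned, in execution order. Returns (kept_indices, total_results).
    """
    kept, total = [], 0
    for index, count in enumerate(template_result_counts):
        if total >= stop_at:
            break  # X reached: do not execute any more templates
        # The buffer check is skipped for the first template, so a large
        # first result set is still returned rather than nothing at all.
        if index > 0 and total + count >= stop_at + buffer:
            break  # would overshoot X + Y: discard this set and stop
        kept.append(index)
        total += count
    return kept, total
```

For example, counts of 980 then 1000 keep only the first template (980 results), while a first template returning 3000 results is kept whole, which is the "no guarantee" case described above.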

@colleenXu
Collaborator

Noting some old UI discussion of creative-mode in internal Translator Slack: https://ncatstranslator.slack.com/archives/C02PG1W7HD0/p1651610305586929

@colleenXu
Collaborator

recording the "Chris Bizon" provided incoming/creative-mode query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "disease": {
                    "ids": ["MONDO:0005147"]
                },
                "chemical": {
                    "categories": ["biolink:ChemicalEntity"]
                }
            },
            "edges": {
                "t_edge": {
                    "object": "disease",
                    "subject": "chemical",
                    "predicates": ["biolink:treats"],
                    "knowledge_type": "inferred"
                }
            }
        }
    }
}

@colleenXu
Collaborator

colleenXu commented Jun 29, 2022

Also looks like we decided not to run the literal edge. We previously chose to run the literal edge, but I couldn't find a documented reason why we decided not to.

I think this is fine. We basically run the "literal edge" with the first template.


And we decided not to do a time-limit on each template-run at the moment (only results capping)

@colleenXu
Collaborator

Recording decisions on what the creative-mode incoming query looks like (from UI / ARS):

1 Disease ID, 1 "treats" predicate

What another team was checking:

  • count of qnodes = 2
  • count of qedges = 1
  • qedge knowledge type = inferred
  • qedge subject qnode category in (drug, small molecule, chemical entity)
  • qedge object qnode category = biolink:Disease
  • count of qedge object ids = 1
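
Those checks translate fairly directly into code. The following is a sketch of the other team's checklist as listed here, not of what BTE does; `is_supported_creative_query` is a hypothetical name:

```python
ALLOWED_SUBJECT_CATEGORIES = {
    "biolink:Drug",
    "biolink:SmallMolecule",
    "biolink:ChemicalEntity",
}


def is_supported_creative_query(query_graph):
    """Apply the listed checks to a TRAPI-style query graph dict."""
    nodes = query_graph.get("nodes", {})
    edges = query_graph.get("edges", {})
    # Exactly two qnodes joined by exactly one qedge.
    if len(nodes) != 2 or len(edges) != 1:
        return False
    edge = next(iter(edges.values()))
    if edge.get("knowledge_type") != "inferred":
        return False
    subject = nodes.get(edge.get("subject"), {})
    obj = nodes.get(edge.get("object"), {})
    # Subject category must be one of drug / small molecule / chemical entity.
    if not ALLOWED_SUBJECT_CATEGORIES & set(subject.get("categories", [])):
        return False
    if obj.get("categories") != ["biolink:Disease"]:
        return False
    # Exactly one ID on the disease node.
    return len(obj.get("ids", [])) == 1
```

Note that the incoming query recorded earlier in this thread gives the disease node an ID but no category, so a check this strict would reject it unless categories are filled in beforehand.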

@colleenXu
Collaborator

From the 8/10 lab meeting on Translator stuff (added feature):

(Jackson's plate of work) "should be technically feasible to "top off" results and trim resulting KG"

  • Once the cap is reached (>1500 if we added the incoming-template results to the current set of results)...add the incoming template-results and allow result merging
  • Then cut off the bottom of the results to reach the cap (1500) and trim the KG to match
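
A sketch of the trimming half of that idea, assuming the results have already been merged and ordered best-first and that each result uses TRAPI-style `node_bindings` / `edge_bindings` of the form `{qid: [{"id": ...}]}`. `top_off_and_trim` is a hypothetical name:

```python
def top_off_and_trim(results, knowledge_graph, cap=1500):
    """Keep the first `cap` results and drop KG items nothing refers to."""
    kept = results[:cap]
    used_nodes, used_edges = set(), set()
    for result in kept:
        for bindings in result.get("node_bindings", {}).values():
            used_nodes.update(binding["id"] for binding in bindings)
        for bindings in result.get("edge_bindings", {}).values():
            used_edges.update(binding["id"] for binding in bindings)
    # Only nodes/edges still bound by a surviving result stay in the KG.
    trimmed_kg = {
        "nodes": {k: v for k, v in knowledge_graph["nodes"].items() if k in used_nodes},
        "edges": {k: v for k, v in knowledge_graph["edges"].items() if k in used_edges},
    }
    return kept, trimmed_kg
```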

Also edit the TRAPI logs to make the cap clearer: "the cap of 1500 is reached because the template just run has 3000 results"

@andrewsu
Member Author

MVP is complete
