Practical challenges with ARA and KP identification in the Translator Resources SmartAPI Registry #18

Closed
RichardBruskiewich opened this issue Jul 22, 2022 · 8 comments

Comments

@RichardBruskiewich
Contributor

Full SRI Testing relies on an unambiguous mapping between ARA resources and the KP resources they call, resolved through the Translator SmartAPI Registry. Unfortunately, although KP (and ARA) SmartAPI Registry entries may be distinguished from one another in multiple ways, no single option yet seems free of limitations for the fully automated, programmatic identification of KP entries by ARAs. The following metadata are available to meet this objective:

  • The (KP/ARA) entry's SmartAPI Registration Identifier: this identifier is a true "primary key" within the Registry, hence it fully discriminates entries from one another; however, the Registration Identifier is semantically opaque to human readers. Curators would have to exercise extreme care in looking up and transcribing such identifiers where they are needed (e.g. KP registry IDs in ARA test data templates). A Registration Identifier used in the ARA test data template may also be neither informative nor stable with respect to KP version and maturity changes.
  • KP-endpoint-as-URI: each KP has a resolvable URL which, again, globally identifies a resource (mechanistically) and is slightly more transparent than the SmartAPI Registration ID, simplifying human curation in the ARA test data template. The URL could be used to search the SmartAPI Registry for an associated SmartAPI entry. In principle, there should be a one-to-one mapping of this endpoint URL to SmartAPI entry (but is this guaranteed?).
  • info.title: this property is totally human-readable, but there is no guarantee for uniqueness of a title among Registry entries; the title's format is currently totally unconstrained. Resource versioning and roles can be reflected in the title (e.g. Some KP - TRAPI 1.2 - Dev) but this possibility is not formally applied across all Translator resources at present. Also, a practical issue is that the title string could generally be rather verbose, with the possibility of typos in transcription (if not carefully copied and pasted, verbatim).
  • info.x-translator.infores: the moderately human-readable value of this schema extension property has a moderate community guarantee of global uniqueness to discriminate between knowledge resources; however, it is not guaranteed to be unique within the Translator SmartAPI Registry across resource versions (TRAPI 1.0/2/3, etc.) and roles (dev/prod, etc.).
  • info.x-translator.team: the moderately human-readable value of this schema extension property has a moderate community guarantee of global uniqueness to discriminate between the teams producing unique knowledge resources, but given that some teams create/manage multiple (KP/ARA) knowledge resources, it is not globally unique across Translator SmartAPI Registry entries (however, perhaps the team property could help to index SRI Testing test data templates, as it did under the original repository in situ test_triples folder).
  • aggregate x-translator/x-trapi metadata: it is conceivable that unambiguous discernment of KP resources may be achieved by constructing a kind of "composite key" from SmartAPI core and/or Translator extension metadata. Such a key promises to be largely human-interpretable and close to unique across entries within the Translator SmartAPI Registry, but implies the curation of verbose KP metadata specifications under the KPs list of the ARA test data template.
  • The BioThings Explorer (BTE) Challenge: this last bullet point here is not so much a possible solution/option for identification of KPs, as an additional complication. Quoting @colleenXu from a Slack conversation on the topic:

I recall one issue regarding the server urls /KPs. There are plenty of KPs that are accessible only through BTE. aka they don't have their own registration. That's one reason why a purely registration or infores based approach wouldn't work.

So the way to access the KPs “as TRAPI” is to go through BTE and use its special endpoints.
...
It looks like previously, Chris Bizon and I handled it by providing the BTE endpoints to access each KP in a TRAPI way. https://github.com/NCATSTranslator/testing/pull/26/files
However, this method may not work well in the stuff you've proposed because these endpoints don't have a separate registration from BTE.
it sounds like you're proposing to just get all the BTE triples (perhaps the ones that are from the KPs it brings in special), without separating those triples by what named KP they come from (a level below BTE)
...
The Service Provider KP registration is a special case of a separate registration, to allow other ARAs like Aragorn to access the entire set of BTE-specific KPs easily. We do not want to register every BTE-specific KP this way, because it's unmanageable (26 different ones right now, and still growing regularly). Instead, we direct people to the config list and tell them to ask questions if they don't know how to access the TRAPI interface for the KPs they want.
Yes, BTE is an ARA but it also acts as a "TRAPI interface" for KPs that aren't TRAPI. At the moment, these two aspects are very intertwined and there aren't plans to make them more separate.
For the previous onehop testing, we were under the impression that ideally all KPs under Service Provider would have examples for all meta-triples. Which would be a lot of work to curate and maintain....and it previously wasn't finished.

It is unclear how easy SRI Testing would be for BTE/Service Provider wrapped knowledge resources. Perhaps it is a case of formulating a set of more shallow testing objectives for BTE/Service Provider wrapped knowledge resources (or excluding them from testing?)

Conceptually - the BTE/Service Provider issue aside - the practical challenge of mapping ARA test data onto KP resources involves two orthogonal considerations:

  1. Uniqueness and stability of identification of distinct knowledge providers (including issues of versioning and maturity)
  2. Programmatic versus human readability of the ARA test data specification

Both challenges (again, excepting BTE/Service Provider) may be mitigated by appropriate software tools to manage the task. In this spirit, the ARAX SmartAPI metadata display is a good example of how Translator SmartAPI Registry metadata could be displayed to human users, allowing human curators to pick and choose specific KP instances (endpoints) for automatic curation into an ARA test data specification (using whichever of the KP identification options above we choose to implement):
[Image: ARAX SmartAPI metadata display]
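
To make the "composite key" option listed above more concrete, here is a minimal sketch of one possible shape for such a key. Everything in it is an assumption for illustration: the `ResourceKey` class, the `composite_keys` helper, the key's string format, and the example entry are hypothetical, and the code assumes that an entry carries its infores under `info.x-translator`, its TRAPI version under `info.x-trapi`, and a maturity tag on each `servers` item.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ResourceKey:
    """Hypothetical composite key for one deployed instance of a KP or ARA."""
    infores: str
    trapi_version: str
    maturity: str

    def __str__(self) -> str:
        # Illustrative, human-readable rendering; not an agreed format.
        return f"{self.infores}|trapi-{self.trapi_version}|{self.maturity}"


def composite_keys(entry: dict) -> Iterator[ResourceKey]:
    """Yield one key per server listed in a Registry entry (assumed layout)."""
    info = entry.get("info", {})
    infores = info.get("x-translator", {}).get("infores", "")
    trapi_version = info.get("x-trapi", {}).get("version", "")
    for server in entry.get("servers", []):
        yield ResourceKey(infores, trapi_version, server.get("x-maturity", "unknown"))


# Hypothetical entry; the infores and URLs are placeholders.
example_entry = {
    "info": {
        "title": "Example KP - TRAPI 1.2 - Dev",
        "x-translator": {"infores": "infores:example-kp"},
        "x-trapi": {"version": "1.2.0"},
    },
    "servers": [
        {"url": "https://kp.example.org/prod", "x-maturity": "production"},
        {"url": "https://kp.example.org/dev", "x-maturity": "development"},
    ],
}

keys = [str(key) for key in composite_keys(example_entry)]
print(keys)
```

A frozen dataclass is hashable, so such keys could serve directly as dictionary keys when indexing test data, while the string form stays short enough for a curator to read and copy.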

@colleenXu
Collaborator

  • Note that TRAPI registrations will likely have multiple server urls for different maturity levels (ITRB prod aka production, ITRB test aka testing, ITRB CI aka staging, non-ITRB dev stuff aka development)
  • a server URL may therefore be the easiest (no need to worry about processing TRAPI registrations to pick a URL)
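
As a rough illustration of the point about several server URLs per registration, the following sketch groups the `servers` of a single entry by maturity level. The `server_urls_by_maturity` helper and the example entry are hypothetical, and the sketch assumes each server item carries `url` and `x-maturity` fields.

```python
from collections import defaultdict


def server_urls_by_maturity(entry: dict) -> dict:
    """Map each x-maturity level of a Registry entry to its server URLs."""
    urls = defaultdict(list)
    for server in entry.get("servers", []):
        url = server.get("url")
        if url:
            # Servers lacking an x-maturity tag are kept under "unspecified".
            urls[server.get("x-maturity", "unspecified")].append(url)
    return dict(urls)


# Hypothetical entry; the URLs are placeholders, not real endpoints.
entry = {
    "servers": [
        {"url": "https://kp.example.org/prod", "x-maturity": "production"},
        {"url": "https://kp.example.org/test", "x-maturity": "testing"},
        {"url": "https://kp.example.org/ci", "x-maturity": "staging"},
        {"url": "https://kp.example.org/dev", "x-maturity": "development"},
    ]
}

urls = server_urls_by_maturity(entry)
print(urls["production"])
```

Returning a list per maturity level avoids assuming that an entry has exactly one server for each level.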

@RichardBruskiewich
Contributor Author

Internet URLs are, in effect, globally unique (hence their use as URIs). SRI Testing still needs to look up the Translator SmartAPI Registry entries corresponding to the KPs in order to access other metadata, such as info.title, info.version, x-trapi.version, x-translator.biolink-version and x-trapi.test_data_location.

@RichardBruskiewich
Contributor Author

RichardBruskiewich commented Jul 22, 2022

BTW, I've heard that the x-maturity / servers annotation is still under some discussion. I guess this also complicates URL lookup, and also means that the SmartAPI Registration ID is not globally unique across server instances of knowledge resources? At least, one imagines it is feasible to resolve URLs to a given TRAPI entry (to get the metadata, as noted above).

@colleenXu
Collaborator

I'm not sure what you're referring to. 1 SmartAPI registration may have multiple server urls / instances, corresponding to different maturity levels.

@RichardBruskiewich
Contributor Author

...SmartAPI Registration ID is not globally unique across server instances of knowledge resources...

where 'server instances' are exactly the ones you mentioned as corresponding to different maturity levels. We're saying the same thing.

@RichardBruskiewich
Contributor Author

The SRI Testing facet of the above issue seems substantially (but not completely... see below) resolved by PR#19 and #20. For now, it also appears that KPs and ARAs are (almost) uniquely indexed by the 2-tuple of their infores and trapi-version, in that the community standard is that only one Translator SmartAPI Registry ("Registry") entry should exist for a given infores and a given info.x-trapi.version. Thus, ARAs can simply refer by infores to the KPs they query.

That said, within an entry, it is now apparent that the various servers.x-maturity endpoints may diverge with respect to the supported Biolink Model version. A separate issue is being opened to discuss and resolve this.

I'm not closing this issue yet, in case there are other unresolved concerns, such as the KP use case for plurality of BTE wrapped KPs (sitting behind one BTE Registry entry).
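
The (infores, trapi-version) indexing described above, including the "almost" caveat, can be sketched as a small check over entries held in memory. This is not the code from the PRs mentioned; `index_entries`, `ambiguous_keys` and the sample entries are hypothetical, and the location of infores under `info.x-translator` is an assumption.

```python
from collections import defaultdict


def index_entries(entries: list) -> dict:
    """Group Registry entries by the 2-tuple (infores, TRAPI version)."""
    index = defaultdict(list)
    for entry in entries:
        info = entry.get("info", {})
        infores = info.get("x-translator", {}).get("infores")
        trapi_version = info.get("x-trapi", {}).get("version")
        if infores and trapi_version:
            index[(infores, trapi_version)].append(entry)
    return dict(index)


def ambiguous_keys(index: dict) -> dict:
    """Report keys shared by more than one entry, with the offending titles."""
    return {
        key: [entry["info"].get("title") for entry in entries]
        for key, entries in index.items()
        if len(entries) > 1
    }


def make_entry(title: str, infores: str, trapi_version: str) -> dict:
    """Build a placeholder entry for the demonstration below."""
    return {
        "info": {
            "title": title,
            "x-translator": {"infores": infores},
            "x-trapi": {"version": trapi_version},
        }
    }


# Hypothetical entries: the first two deliberately collide on the same key.
entries = [
    make_entry("Example KP", "infores:example-kp", "1.2.0"),
    make_entry("Example KP (duplicate)", "infores:example-kp", "1.2.0"),
    make_entry("Example KP", "infores:example-kp", "1.3.0"),
]

index = index_entries(entries)
print(ambiguous_keys(index))
```

An ARA's KP list could then be resolved by looking up each infores under the TRAPI version being tested, with any non-empty `ambiguous_keys` result flagged for curator attention.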

@newgene
Collaborator

newgene commented Jan 31, 2023

@RichardBruskiewich given issue #21, let's close this one unless you have other specific concerns to address (in which case, probably better to create a new issue too)?

@RichardBruskiewich
Contributor Author

@newgene, I guess we may be ok closing this one.

The above concern "..I'm not closing this issue yet, in case there are other unresolved concerns, such as the KP use case for plurality of BTE wrapped KPs (sitting behind one BTE Registry entry)..." might already be resolved, in that BTE would use a list of URLs in its applicable info.x-trapi.test_data_location, pointing to KP test data files which each contain the infores corresponding to the back end third party knowledge source. I believe that we are able to propagate this upwards in the SRI Testing harness.

I'm not entirely sure yet whether we'll get completely correct behaviour during testing, simply because I think that the test data is merged into one set of test edges and all of the edges are sent to BTE. I'm not sure what BTE does to filter queries for back end knowledge sources, to ensure that test data is not sent to knowledge sources that wouldn't be able to use it. Maybe it's a moot point: it depends on how BTE resolves queries, and I guess that the subject and object categories (plus predicates) are used to limit queries in this manner.
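
One way to keep the provenance of each test edge visible, rather than merging everything into one set, is sketched below. It is not the SRI Testing harness code: `as_location_list`, `edges_by_infores` and the sample documents are hypothetical, and the sketch assumes that `test_data_location` may hold either a single URL or a list of URLs and that each loaded test data file carries `infores` and `edges` fields.

```python
def as_location_list(test_data_location) -> list:
    """Accept a single URL string, a list of URLs, or None (assumed shapes)."""
    if test_data_location is None:
        return []
    if isinstance(test_data_location, str):
        return [test_data_location]
    return list(test_data_location)


def edges_by_infores(test_data_files: list) -> dict:
    """Group test edges by the infores declared in each test data file."""
    grouped = {}
    for data in test_data_files:
        grouped.setdefault(data["infores"], []).extend(data.get("edges", []))
    return grouped


# Hypothetical, already-downloaded test data files for two wrapped sources.
loaded_files = [
    {
        "infores": "infores:source-a",
        "edges": [
            {"subject_category": "biolink:Gene", "predicate": "biolink:related_to",
             "object_category": "biolink:Disease"},
        ],
    },
    {
        "infores": "infores:source-b",
        "edges": [
            {"subject_category": "biolink:ChemicalEntity", "predicate": "biolink:treats",
             "object_category": "biolink:Disease"},
        ],
    },
]

locations = as_location_list(["https://example.org/a.json", "https://example.org/b.json"])
grouped = edges_by_infores(loaded_files)
print(sorted(grouped))
```

With edges grouped this way, a test report could at least attribute each result to the back end source that supplied the edge, whatever routing BTE applies internally.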
