
Reporting heterogeneous Biolink Model versions across x-maturity environments. #21

Open
RichardBruskiewich opened this issue Jan 25, 2023 · 11 comments


@RichardBruskiewich
Contributor

RichardBruskiewich commented Jan 25, 2023

VOTING ON THIS ISSUE

The 31 January 2023 Translator Architecture Committee call briefly discussed this issue and converged on two candidate options, as summarized by Chunlei below.

Please vote using the indicated emojis 🎉 (option 3) or 🚀 (option 4b), or 👎 (thumbs down) if you don't like either option (please explain in an attached comment).

  • 🎉 (Party Popper) (@newgene's modified version of) Option 3: Record the Biolink Model version of an x-maturity environment as an x-translator.biolink-version property value, alongside the x-maturity property, within the OpenAPI servers block.
servers:
  - description: ARAX TRAPI 1.3 endpoint - production
    url: https://arax.transltr.io/api/arax/v1.3
    x-maturity: production
    x-translator:
      - biolink-version: 2.2.11
  - description: ARAX TRAPI 1.3 endpoint - development #1
    url: https://arax.ncats.io/test/api/arax/v1.3/dev_1
    x-maturity: development
    x-translator:
      - biolink-version: 3.0.3
  - description: ARAX TRAPI 1.3 endpoint - development server #2
    url: https://arax.ncats.io/test/api/arax/v1.3/dev_2
    x-maturity: development
    x-translator:
      - biolink-version: 3.1.1
  • 🚀 (Rocket Ship) Option 4b: record the applicable Biolink version in an x-maturity-indexed JSON 'object' specification as a value directly under the existing info.x-translator.biolink-version property. There could also be a default value here, or the legacy simple string could still be used (which would continue to apply to all x-maturity endpoints).
info:
 x-translator: 
   biolink-version:
       default: "2.4.8"
       staging: "3.1.0"
       development: "3.1.1"

Note that, for this option, the original simple string property value will still work (applying to all x-maturity environments in the entry), namely:

info:
 x-translator: 
   biolink-version: "3.1.1"
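For illustration, here is a minimal Python sketch of how a consumer such as a test harness might resolve the applicable release under option 4b while still accepting the legacy string. It assumes the YAML has already been parsed into plain dicts; `resolve_biolink_version` is a hypothetical helper, and falling back to `default` for x-maturity values that are not listed is an assumed reading of the proposal.

```python
from typing import Optional, Union

# Value of info.x-translator.biolink-version: either the legacy single string
# or (under option 4b) a mapping keyed by x-maturity plus an optional "default".
BiolinkVersionSpec = Union[str, dict]


def resolve_biolink_version(spec: BiolinkVersionSpec, maturity: str) -> Optional[str]:
    """Return the Biolink Model release that applies to one x-maturity environment."""
    if isinstance(spec, str):
        # Legacy form: one release for every x-maturity environment.
        return spec
    if isinstance(spec, dict):
        # Option 4b form: a maturity-specific entry wins, else fall back to "default".
        return spec.get(maturity, spec.get("default"))
    return None


option_4b = {"default": "2.4.8", "staging": "3.1.0", "development": "3.1.1"}
print(resolve_biolink_version(option_4b, "development"))  # 3.1.1
print(resolve_biolink_version(option_4b, "production"))   # 2.4.8
print(resolve_biolink_version("3.1.1", "production"))     # 3.1.1
```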

Note that (modified) option 3 allows finer-grained tagging of individual server endpoints with a Biolink Model release, whereas option 4b, although simpler, enforces only one Biolink Model release per x-maturity environment. That said, general Translator policy seems to be that all servers of a given x-maturity environment are meant to be interchangeable, so perhaps this granularity is NOT needed?
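The granularity trade-off can be made concrete. In this minimal sketch, assuming the (modified) option 3 servers block has been parsed into dicts, per-server versions collapse into an option 4b style mapping only when every server of an x-maturity environment agrees. The helper names are hypothetical; the server URLs and versions are those of the option 3 example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Servers block in the (modified) option 3 form, already parsed from YAML:
# x-translator holds a list of single-key mappings.
servers = [
    {"url": "https://arax.transltr.io/api/arax/v1.3",
     "x-maturity": "production",
     "x-translator": [{"biolink-version": "2.2.11"}]},
    {"url": "https://arax.ncats.io/test/api/arax/v1.3/dev_1",
     "x-maturity": "development",
     "x-translator": [{"biolink-version": "3.0.3"}]},
    {"url": "https://arax.ncats.io/test/api/arax/v1.3/dev_2",
     "x-maturity": "development",
     "x-translator": [{"biolink-version": "3.1.1"}]},
]


def server_biolink_version(server: dict) -> Optional[str]:
    """Return the biolink-version found in a server's x-translator list, if any."""
    for item in server.get("x-translator", []):
        if "biolink-version" in item:
            return item["biolink-version"]
    return None


def versions_by_maturity(servers: List[dict]) -> Dict[str, Set[str]]:
    """Collect the distinct Biolink releases declared in each x-maturity environment."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for server in servers:
        version = server_biolink_version(server)
        if version is not None:
            grouped[server["x-maturity"]].add(version)
    return dict(grouped)


def to_option_4b(servers: List[dict]) -> Dict[str, str]:
    """Collapse per-server releases into one release per x-maturity (option 4b).

    Raises ValueError when servers sharing an x-maturity disagree, which is
    exactly the case option 4b cannot express.
    """
    mapping: Dict[str, str] = {}
    for maturity, versions in versions_by_maturity(servers).items():
        if len(versions) > 1:
            raise ValueError(f"{maturity} servers disagree: {sorted(versions)}")
        mapping[maturity] = next(iter(versions))
    return mapping
```

With all three servers, `to_option_4b(servers)` raises `ValueError` because the two development servers declare 3.0.3 and 3.1.1; with only the first two, it returns `{"production": "2.2.11", "development": "3.0.3"}`.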

Once again, please vote using the indicated emojis 🎉 (option 3) or 🚀 (option 4b), or 👎 (thumbs down) if you don't like either option (please explain in an attached comment).

Original Issue Overview

As Translator evolves, the various x-maturity environments (server endpoints) of most KPs and ARAs appear to be updated asynchronously with respect to Biolink Model versions.

For example, as of January 2023, production environments may still implement the stable fall 2022 Biolink Model release 2.4.8, whereas emerging development and testing (CI) endpoints may already be compliant with Biolink Model 3.1.1. Even if this is only a periodic, transient issue, it introduces ambiguity into the use of Translator components, for example, when globally testing the compliance of those resources with standards (e.g. with SRI Testing).

How can Translator SmartAPI Registry ("Registry") entries (or rather, the Translator community) manage this Biolink Model release heterogeneity across x-maturity environments?

@brettasmi

The OpenAPI server object supports additional variables that could be used to specify the Biolink version of that x-maturity environment.

@RichardBruskiewich
Contributor Author

RichardBruskiewich commented Jan 25, 2023

Here, I will briefly brainstorm on this issue and the options that come to mind for its resolution.

  1. We could simply start by asking whether multiple Biolink Model releases should be allowed (by Translator architecture policy) within a given Registry or, rather, whether every x-maturity environment within a given (KP or ARA) Registry entry should be expected to be homogeneous with respect to Biolink Model release. It is already community policy (right @edeutsch, @cbizon?) that Registry entries (hence, all of their deployed x-maturity environments?) are expected to be globally unique when indexed by the 2-tuple of infores and info.x-trapi.version. One solution to Biolink Model release heterogeneity across x-maturity environments may be to extend this global uniqueness policy to a 3-tuple index that also includes the info.x-translator.biolink-version property value.

    • Pro: doesn't change the Registry schema design, only constrains the way it is applied, in that a new Biolink Model release of a given (KP or ARA) resource would necessitate creation of a new Registry entry, with the same infores and info.x-trapi.version as the previous entry but with the new info.x-translator.biolink-version property value.
    • Con: Biolink Model releases evolve somewhat faster than TRAPI resource updates and most of the other Registry metadata is likely unchanged, so creating a new entry for each Biolink version release seems tedious and redundant, perhaps resulting in a confusing proliferation of the entries.
  2. The info.x-translator.biolink-version property value is currently a static SmartAPI OpenAPI 'extension' value. Maybe it should not be statically reported? Perhaps reporting it dynamically through the /meta_knowledge_graph TRAPI endpoint is a better semantic home for this metadata?

    • Pro: it is arguable that the Biolink Model version is not really a property of 'Translator' (for publication under info.x-translator) but rather intrinsic metadata about the knowledge itself (whose other Biolink Model characteristics we already report in the /meta_knowledge_graph). The dynamic nature of this option would better support rapid evolution of the versioning of underlying data sources.
    • Con: the TRAPI specification would need a quick revision (and community implementation) to accommodate this proposed change (albeit a one-time pain?). The Biolink Model version would also conceivably be less visible at the Registry level (if this is a concern).
  3. Another option is to record the Biolink Model version of an x-maturity environment as a new x-biolink-version property value alongside the x-maturity property in the OpenAPI servers block.

    • Pro: the Biolink Model version is directly associated with the endpoint URL and its x-maturity tag.
    • Con: the servers block is an OpenAPI standard block; the x-maturity tag there is a SmartAPI standard. A new x-biolink-version tag is too application-specific (i.e. out of scope for both the OpenAPI and SmartAPI standards).
  4. The info.x-translator.biolink-version property value could be retained as a global default for the Registry entry, with a mechanism to override it elsewhere in the Registry entry being specified and implemented. There are a few possible locations for this:

    4 a) the current (recently updated) design of the info.x-trapi.test_data_location property value includes x-maturity-indexed JSON object entries, so perhaps a way forward on this issue is to allow overriding of the top-level Biolink release there (alongside the 'url' tag of the test_data_location URL entry?)

    • Pro: the test_data_location schema is easily extended to include a biolink-version key for this purpose, and people could just add their overriding Biolink Model release values under the appropriate x-maturity tag.
    • Con: the Biolink Model is not strictly speaking just a testing concern. Maybe it is not the best place to record this metadata?

    4 b) instead of the info.x-trapi.test_data_location, we could create an analogous x-maturity indexed JSON 'object' specification as a value directly under the info.x-translator.biolink-version property value. There could also be a 'default' value here too.

    • Pro: this option has reasonably clear semantic scoping of the metadata and simply extends the existing info.x-translator.biolink-version property (the legacy simple single SemVer string could still be accepted).
    • Con: duplication of x-maturity environment contexts between info.x-translator.biolink-version and info.x-trapi.test_data_location?
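Option 1's indexing policy can be written as a uniqueness check over Registry entries. In this sketch (hypothetical helpers and a hypothetical `infores:example-kp` resource, with infores assumed to sit under info.x-translator), two entries that differ only in Biolink Model release collide under the current 2-tuple index but are distinct under the proposed 3-tuple.

```python
from collections import Counter
from typing import Callable, Hashable, List

# Two hypothetical Registry entries for the same resource and TRAPI version,
# differing only in the Biolink Model release they declare.
entries = [
    {"info": {"x-trapi": {"version": "1.3.0"},
              "x-translator": {"infores": "infores:example-kp", "biolink-version": "2.4.8"}}},
    {"info": {"x-trapi": {"version": "1.3.0"},
              "x-translator": {"infores": "infores:example-kp", "biolink-version": "3.1.1"}}},
]


def key2(entry: dict) -> tuple:
    """Current index: (infores, info.x-trapi.version)."""
    info = entry["info"]
    return (info["x-translator"]["infores"], info["x-trapi"]["version"])


def key3(entry: dict) -> tuple:
    """Proposed index: additionally includes info.x-translator.biolink-version."""
    return key2(entry) + (entry["info"]["x-translator"]["biolink-version"],)


def duplicate_keys(entries: List[dict], key: Callable[[dict], Hashable]) -> list:
    """Return the index keys shared by more than one Registry entry."""
    counts = Counter(key(entry) for entry in entries)
    return [k for k, n in counts.items() if n > 1]
```

Here `duplicate_keys(entries, key2)` reports the shared `("infores:example-kp", "1.3.0")` key, while `duplicate_keys(entries, key3)` is empty, which is the "new entry per Biolink release" behaviour (and the proliferation concern) noted in the Con above.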

If adopted, an instance of the latter 4b) specification would look something like the following:

  info:
    x-translator: 
      biolink-version:
          default: "2.4.8"
          staging: "3.1.0"
          development: "3.1.1"
  5. For completeness, it could perhaps be remarked that Biolink itself is not really Translator-project-specific. Promoting the Biolink version to a first-class citizen of the info block might also be worth pondering. That is:
  info:
    x-translator:
      infores: "infores:biolink-api"
      component: KP
      team:
        - "Service Provider"
    x-biolink-version:
      default: "2.4.8"
      staging: "3.1.0"
      development: "3.1.1"
    x-trapi:
      etc...

@edeutsch

It seems infeasible to me to have everyone on the same Biolink release. First, thus far we have only stipulated versions to the second digit (3.0, 3.1), whereas Biolink versions have 3 digits and different agents have implemented different ones; the goals also seem to change pretty frequently. We deployed 3.0.3 only to find out later that at some point we had decided on 3.1 but didn't update the tracking sheet. Getting everyone on the same version with 99% reliability will be a big lift and will require a lot more planning and communication to make it happen.

@cbizon

cbizon commented Jan 26, 2023

Yeah I agree with @edeutsch . @RichardBruskiewich what actual problem is it that we're trying to solve?

@colleenXu
Collaborator

Note that staging = ITRB CI (testing is ITRB test)

@RichardBruskiewich
Contributor Author

@edeutsch @cbizon I have to laugh a little bit here, but the problem is exactly what @edeutsch says, that "...it seems infeasible to me to have everyone on the same Biolink release...".

I just realized that my wording of option 1 above was a bit confusing, though: I am not advocating that all KPs and ARAs implement only one Biolink Model version at all costs. Rather, I'm just asking how to consistently report which release they are implementing, across all the endpoints they expose (I tried to clarify the option 1 text accordingly).

That is, I'm actually stating that we face a more stringent challenge: within a given KP or ARA, it seems infeasible to expect all x-maturity environments to be using the same Biolink Model release at a given time. The current reality is that resources don't always implement the same Biolink Model release across all their x-maturity environments (endpoints), let alone the release indicated by the info.x-translator.biolink-version value. Put another way, a given KP may have its production environment implemented using the info.x-translator.biolink-version release (e.g. 2.4.8) but may already be publishing Biolink Model release 3.1.1 in its 'development' or 'testing' x-maturity environments.

In terms of SRI Testing of KP and ARA components for compliance with the Biolink Model, it is important to know which Biolink Model release is being tested, in order to give non-spurious validation results; otherwise, testing with test data that assumes another release of the Biolink Model may report irrelevant errors.

Maybe this issue is based on a false assumption for testing: that SRI Testing needs to know the Biolink Model release of the target KP or ARA. At the moment, if the Biolink Model version is not stated as a test run parameter, then the code takes the Translator SmartAPI Registry info.x-translator.biolink-version property value as the testing default.
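The fallback described above amounts to something like the following. This is a hypothetical sketch of the described behaviour, not SRI Testing's actual code; it also shows the failure mode: a development endpoint already on 3.1.1 would be validated against whatever release its Registry entry declares.

```python
from typing import Optional


def target_biolink_version(cli_value: Optional[str], registry_entry: dict) -> Optional[str]:
    """Pick the Biolink Model release to validate against (hypothetical helper).

    An explicit test run parameter wins; otherwise fall back to the entry's
    info.x-translator.biolink-version, as the current behaviour is described.
    """
    if cli_value:
        return cli_value
    return registry_entry.get("info", {}).get("x-translator", {}).get("biolink-version")
```

With an entry declaring `"2.4.8"`, `target_biolink_version(None, entry)` yields `"2.4.8"` even when the endpoint under test is a 3.1.1 development server.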

Perhaps the target Biolink Model release for testing ought simply to be a mandatory (CLI) test run parameter, such that the actual Biolink Model version implemented by the target KP or ARA, or reported by info.x-translator.biolink-version, is simply ignored; rather, we decide that "we want to test 3.1.1 compliance" and say so on the CLI. If we don't say it, then the SRI Testing CLI complains and stops!
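A mandatory parameter is straightforward to enforce with Python's standard `argparse`: with `required=True`, parsing fails with a usage error and exit status 2 when the option is absent. The `--biolink-version` flag name and `parse_test_args` helper are assumptions for illustration, not existing SRI Testing options.

```python
import argparse
from typing import List, Optional


def parse_test_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical test run command line that requires --biolink-version."""
    parser = argparse.ArgumentParser(description="Hypothetical Biolink compliance test run")
    # required=True: if the option is missing, argparse prints a usage error
    # and exits with status 2 instead of silently falling back to a default.
    parser.add_argument(
        "--biolink-version",
        required=True,
        help="Biolink Model release to test compliance against, e.g. 3.1.1",
    )
    return parser.parse_args(argv)


args = parse_test_args(["--biolink-version", "3.1.1"])
print(f"Testing compliance with Biolink Model {args.biolink_version}")
```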

So, @edeutsch @cbizon, this is not about tracking sheet project-level compliance but mainly about accurately publicizing the actual Biolink Model release of every endpoint. For this reason, the various alternatives for refining the reporting of Biolink Model version of endpoints are brainstormed above.

I'm currently suspecting that option 2 (publicizing the implemented Biolink Model release as another reported parameter of the TRAPI /meta_knowledge_graph API endpoint) might be the most logical location for this metadata given its nature: it is a (meta)property of the underlying knowledge graph, n'est-ce pas?
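Under option 2, a client would read the release straight from the endpoint and could compare it with what the Registry declares. The response below is illustrative: `nodes` and `edges` follow the TRAPI MetaKnowledgeGraph layout, but the top-level `biolink_version` field is hypothetical (it is precisely what option 2 would need TRAPI to add), as are the helper names.

```python
import json
from typing import Optional

# Illustrative /meta_knowledge_graph response body; "biolink_version" is a
# HYPOTHETICAL field standing in for what option 2 would add.
response_body = json.dumps({
    "nodes": {"biolink:Gene": {"id_prefixes": ["NCBIGene"]},
              "biolink:Disease": {"id_prefixes": ["MONDO"]}},
    "edges": [{"subject": "biolink:Gene",
               "predicate": "biolink:related_to",
               "object": "biolink:Disease"}],
    "biolink_version": "3.1.1",
})


def reported_biolink_version(body: str) -> Optional[str]:
    """Read the (hypothetical) biolink_version field from a response body."""
    return json.loads(body).get("biolink_version")


def version_drift(declared: Optional[str], reported: Optional[str]) -> Optional[str]:
    """Describe a mismatch between the Registry declaration and the endpoint report."""
    if declared and reported and declared != reported:
        return f"Registry declares Biolink {declared}, endpoint reports {reported}"
    return None
```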

However, @edeutsch, implementing option 2 would require a quick revision of TRAPI and earnest collaboration on implementation ASAP (i.e. in TRAPI 1.4). But would this not be a super easy fix to TRAPI, easily curated by everyone?

@edeutsch

edeutsch commented Jan 28, 2023

Maybe. I'm thinking that it would be much easier for everyone to put it in their servers block. This area is not even strictly part of TRAPI; it is managed by the SmartAPI folks. So if we can convince them to do this:

servers:
- description: ARAX TRAPI 1.3 endpoint - production
  url: https://arax.transltr.io/api/arax/v1.3
  x-maturity: production
  x-biolink-version: 2.2.11
- description: ARAX TRAPI 1.3 endpoint - development
  url: https://arax.ncats.io/test/api/arax/v1.3
  x-maturity: development
  x-biolink-version: 3.0.3

then everyone can be prodded to switch to this completely independently of TRAPI 1.4 (TRAPI 1.4 will take months to roll out)
So if you want it fast, then the above solution is fast (IMO).
Implementing it in an endpoint will be slow and more work.

So I think option 3 is by far the most expedient.

@RichardBruskiewich
Contributor Author

RichardBruskiewich commented Jan 28, 2023

Voting for option 3 above, I see... ;-))

There seems to be a trade-off between visibility of the metadata and its maintainability.

Pushing the Biolink Model version metadata into /meta_knowledge_graph seems to bring it closer to the knowledge graph with which it is associated.

Putting it in the servers block does make it more immediately visible (akin to the current info.x-translator.biolink-version), and one supposes that TRAPI schema validation could ensure that it is properly set to some sensible SemVer value... although keeping that value consistent with the internal knowledge graph version is still a matter of metadata curation discipline.
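A rough sketch of the kind of check such validation might perform on @edeutsch's flat `x-biolink-version` form. The pattern is a simplified SemVer 2.0.0 match, looser than the full grammar at semver.org, and `invalid_biolink_versions` is a hypothetical helper. The second server's value is deliberately changed to an unquoted `3.1` to show a YAML pitfall: a typical YAML loader reads it as a float, not a string, so requiring a string also catches two-digit versions.

```python
import re
from typing import List

# Simplified SemVer: MAJOR.MINOR.PATCH without leading zeros, plus optional
# pre-release (-...) and build (+...) suffixes.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)

servers = [
    {"url": "https://arax.transltr.io/api/arax/v1.3",
     "x-maturity": "production", "x-biolink-version": "2.2.11"},
    # Illustrative: an unquoted `x-biolink-version: 3.1` in YAML loads as this float.
    {"url": "https://arax.ncats.io/test/api/arax/v1.3",
     "x-maturity": "development", "x-biolink-version": 3.1},
]


def invalid_biolink_versions(servers: List[dict]) -> List[str]:
    """Return URLs of servers whose x-biolink-version is missing or not a SemVer string."""
    bad = []
    for server in servers:
        value = server.get("x-biolink-version")
        if not isinstance(value, str) or SEMVER.match(value) is None:
            bad.append(server.get("url", "<no url>"))
    return bad
```

Such a check can only confirm that a plausible release string is present; as noted above, whether it matches what the endpoint actually serves remains a curation question.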

@edeutsch

Do we want to bring this up at next week's Architecture call? If we do option 2, this would be a TRAPI issue. But if we do option 3, then it is more of a SmartAPI issue for Chunlei and his team. @cbizon, decide at Arch where to send it?

@newgene
Collaborator

newgene commented Jan 31, 2023

Either option 3 (with some modification) or 4 would be fine with me, and my notes are below:

  • I still prefer option 4, like this:

    info:
     x-translator: 
       biolink-version:
           default: "2.4.8"
           staging: "3.1.0"
           development: "3.1.1"
    

    with

    info:
     x-translator: 
       biolink-version: "2.4.8"
    

    still as a backward-compatible option

  • But if we decide to go for option 3, I suggest:

    • nested under server.x-translator for translator-specific extension fields
    • move other "x-maturity" specific values under this level as well, so we don't mix two styles
    servers:
      - description: ARAX TRAPI 1.3 endpoint - production
        url: https://arax.transltr.io/api/arax/v1.3
        x-maturity: production
        x-translator:
          - biolink-version: 2.2.11
      - description: ARAX TRAPI 1.3 endpoint - development
        url: https://arax.ncats.io/test/api/arax/v1.3
        x-maturity: development
        x-translator:
          - biolink-version: 3.0.3
    

    With this option, we will always set biolink-version at the "server" level, and won't be able to go back to setting it at the API level when all servers are on the same version.

@RichardBruskiewich
Contributor Author

Option 4 would be totally fine with the caveat that all endpoints of a given x-maturity environment will have to comply with one Biolink Model version.

Option 3 might, however, provide greater granularity, down to individual server endpoints, if that is desirable, since some KPs or ARAs may have more than one server in a given x-maturity environment (e.g. multiple development servers, each implementing a distinct Biolink Model version?). @newgene's proposal to nest tags under an x-translator block in the servers block sounds sensible too.


6 participants