Discussion: Should we support vendor-specific cloud API libraries in BinderHub? #1623

Open
manics opened this issue Jan 14, 2023 · 5 comments

Comments

@manics
Member

manics commented Jan 14, 2023

There are a few use cases that benefit from using a cloud-specific library to make API calls, e.g. using AWS boto3 to create an ECR container repository and to obtain a temporary read/write token (#1055).

Other public cloud registries may benefit from something similar, e.g. Oracle Cloud Infrastructure Registry, which requires the oci library. (There's an autocreate option when pushing new images, but creating the repository in advance allows more control of things like auto-deletion.)

There are probably others, either related to registries or for other things like hooking into cloud notifications.

It's easy to add an extras_require entry in setup.py, or to put the new Registry implementation (for example) in a separate Python package, since the class is configurable with Traitlets. But what should we include in the container image? Just the ones used by mybinder.org, encouraging everyone else to rebuild their BinderHub container? Or all of them? Or do we take a completely different route and make those vendor-specific API calls via a separate container (going down the microservices route)?
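
For illustration, here is a minimal sketch of the optional-dependency route. The extra names ("ecr", "oci") and the helper `require_optional` are hypothetical; BinderHub does not define them. The idea is that a registry class imports its vendor library lazily and tells the admin which extra to install if it is missing.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or explain which extra provides it.

    `extra` is a hypothetical extras_require key, e.g. "ecr" for
    `pip install 'binderhub[ecr]'`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this registry; "
            f"install it with: pip install 'binderhub[{extra}]'"
        ) from e


# The matching (hypothetical) declaration in setup.py would be:
#   extras_require={"ecr": ["boto3"], "oci": ["oci"]}
```

A vendor-specific registry class could then call `require_optional("boto3", "ecr")` inside its methods, so the default image only fails when that class is actually selected.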

@betatim
Member

betatim commented Jan 16, 2023

The philosophy of BinderHub so far has been to be "vendor agnostic". I think most often this leads to (or is interpreted as) "lowest common denominator": use the stuff that works equally well (or badly) everywhere.

I'm not familiar with "ECR container repository". I quickly googled it and it suggested "container registry" to me. Setting up a container registry sounds like a one time/setup task, not an ongoing thing that BinderHub does while it is running. Could you explain a bit what you had in mind? For one time setup stuff I think we should describe it in the guide(s). The vendor specific guides are a good example of how they are valuable but also often out of date (which I think is the relevant thing for deciding about "vendor specific" code as well).

Over the last year or so I've become more and more convinced of (and attracted to) the idea that having a plugin system is a great idea. In this case BinderHub would allow plugins to change/augment/extend parts of its behaviour. The advantage of having a plugin system is that anyone (including core maintainers) can extend BinderHub without needing to consider all the things/permissions/consensus of including it in the core. I think it also allows for a lot of creativity, and some kind of "combinatorial explosion" of things your software can do (think iPhone w/o app store (no plugin system) vs iPhone with app store (plugin system)). Maybe something like you have in mind would be a good use case of a plugin system?

Of course creating the "host side" of a plugin system is work and the quality of plugins rises and falls with how well it is done. JupyterHub already has a kinda plugin system for spawners and authenticators, so there is precedent for this working well.
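
As a rough sketch of what the "host side" could look like: JupyterHub discovers authenticators and spawners through Python entry points, and the same mechanism could be used here. The group name `binderhub.registries` below is an assumption made for illustration; BinderHub does not define such a group.

```python
from importlib.metadata import entry_points

# Hypothetical group name; BinderHub does not define this entry point group.
REGISTRY_GROUP = "binderhub.registries"


def discover_plugins(group=REGISTRY_GROUP):
    """Return {name: loaded object} for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

A plugin package would advertise its class under that group in its own packaging metadata, and an admin would only need to install the package into the image for it to show up.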

I think a plugin system would imply that you need to make your own binderhub image?!

@minrk
Member

minrk commented Jan 16, 2023

I think you're right that a lot of the great interface-defining work @manics and others have done is getting BinderHub to a level of maturity where it defines the interfaces, and implementations of non-default providers start moving to their own packages. But once you start breaking things up like that, it also starts to make sense to do more versioned releases to better communicate changes and compatibility at the API level.

> I think a plugin system would imply that you need to make your own binderhub image?!

Yes and no - we see this in z2jh: z2jh's default image ships with a common set of plugins (then are they really plugins?), but you can always add more / select versions in a custom image. We still have to decide what's in this default set and what's not, which is a pretty difficult line to draw as everyone asks for their Authenticator to be added so they don't need a custom image.

I know a lot of supply chain folks bristle at the idea of install-at-runtime as a pattern, but I honestly think for plugin purposes that pip install at runtime is a hugely practical way to make small changes to an image without needing to build, host, and maintain a mostly duplicate image.

@manics
Member Author

manics commented Jan 27, 2023

> I'm not familiar with "ECR container repository". I quickly googled it and it suggested "container registry" to me. Setting up a container registry sounds like a one time/setup task, not an ongoing thing that BinderHub does while it is running.

@betatim ECR (and some other container registries) don't support pushing to registry.example.org/account/new-repository-that-doesnt-exist. Instead you need to create the repository using a vendor-specific API call; then you can push to registry.example.org/account/new-repository-that-doesnt-exist:<any-image-tag>. ECR has an additional complication: the registry login token is temporary and should be renewed at regular intervals, which requires another AWS API call.
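
To make the two ECR calls concrete, here is a minimal sketch, not the implementation from #1055. It assumes `ecr` is a boto3 ECR client (e.g. `boto3.client("ecr")`); the function names `ensure_repository` and `registry_login` are made up for this example.

```python
import base64


def ensure_repository(ecr, name):
    """Create the ECR repository `name` if it does not exist yet.

    Returns True if a repository was created, False if it already existed.
    """
    try:
        ecr.describe_repositories(repositoryNames=[name])
        return False
    except ecr.exceptions.RepositoryNotFoundException:
        ecr.create_repository(repositoryName=name)
        return True


def registry_login(ecr):
    """Fetch a temporary login and return (username, password, expires_at)."""
    data = ecr.get_authorization_token()["authorizationData"][0]
    # The token is the base64 encoding of "AWS:<password>".
    decoded = base64.b64decode(data["authorizationToken"]).decode("utf-8")
    username, password = decoded.split(":", 1)
    # Callers should request a new token before `expires_at` is reached.
    return username, password, data["expiresAt"]
```

Both calls need AWS credentials and the boto3 dependency inside the BinderHub process, which is the packaging question this issue is about.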

I've had a go at implementing the microservice model with Oracle Cloud's registry:
https://github.com/manics/oracle-container-repositories-svc

Example binderhub config stanza:

import json
from tornado import httpclient
from traitlets import Unicode
from binderhub.registry import DockerRegistry


class ExternalRegistryHelper(DockerRegistry):

    service_url = Unicode(
        "http://oracle-container-repositories-svc:8080",
        allow_none=False,
        help="The URL of the registry helper micro-service.",
        config=True,
    )

    auth_token = Unicode(
        "secret-token",
        help="The auth token to use when accessing the registry helper micro-service.",
        config=True,
    )

    async def get_image_manifest(self, image, tag):
        """
        If the container repository exists use the standard Docker Registry API
        to check for the image tag.
        Otherwise create the container repository.

        The full registry image URL has the form:
        CONTAINER_REGISTRY/OCIR_NAMESPACE/OCIR_IMAGE_NAME:TAG
        but the BinderHub image is OCIR_NAMESPACE/OCIR_IMAGE_NAME
        so we need to remove the OCIR_NAMESPACE component
        """
        client = httpclient.AsyncHTTPClient()
        image = image.split("/", 1)[1]
        repo_url = f"{self.service_url}/repo/{image}"
        headers = {"Authorization": f"Bearer {self.auth_token}"}

        self.log.debug(f"Checking whether repository exists: {repo_url}")
        try:
            repo = await client.fetch(repo_url, headers=headers)
            repo_exists = True
        except httpclient.HTTPError as e:
            if e.code == 404:
                repo_exists = False
            else:
                raise

        if repo_exists:
            repo_json = json.loads(repo.body.decode("utf-8"))
            self.log.debug(f"Repository exists: {repo_json}")
            return await super().get_image_manifest(image, tag)
        else:
            self.log.debug(f"Creating repository: {repo_url}")
            await client.fetch(repo_url, headers=headers, method="POST", body="")
            return None


c.BinderHub.registry_class = ExternalRegistryHelper

This only requires standard HTTP GET/POST calls and headers; the complex Oracle Cloud auth and API calls are hidden in the microservice.
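
For readers wondering what the other side of that contract looks like, here is a stdlib-only sketch of a helper service that the class above could talk to. It is not the code from oracle-container-repositories-svc: the status codes, the JSON bodies, and the in-memory `REPOS` set (standing in for the vendor API) are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

AUTH_TOKEN = "secret-token"  # placeholder, same default as the config example
REPOS = set()  # stand-in for the vendor API; a real service would call it


def handle(method, path, authorization):
    """Return (status, body) for a request to /repo/<name>."""
    if authorization != f"Bearer {AUTH_TOKEN}":
        return 401, {"error": "unauthorized"}
    prefix = "/repo/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return 404, {"error": "not found"}
    name = path[len(prefix):]
    if method == "GET":
        if name in REPOS:
            return 200, {"name": name}
        return 404, {"error": "repository does not exist"}
    if method == "POST":
        REPOS.add(name)  # replace with the vendor-specific create call
        return 201, {"name": name}
    return 405, {"error": "method not allowed"}


class Handler(BaseHTTPRequestHandler):
    def _respond(self):
        status, body = handle(
            self.command, self.path, self.headers.get("Authorization")
        )
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = _respond


def serve(port=8080):
    """Run the sketch service until interrupted."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling `serve()` starts it on port 8080. The 404 on GET is what triggers the create branch in `get_image_manifest` above.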

@betatim
Member

betatim commented Feb 3, 2023

That looks nice. All the vendor-specific stuff is in one place, and the way to extend BinderHub is also not too ugly. A downside is that building something like this requires quite a bit of knowledge of how BinderHub works, so it is probably beyond the average user's skills. Can/should we bundle the microservice in BinderHub's repo to make it (and others like it) more discoverable? Or have a repo tag that is used to create a list in the docs?

@manics
Member Author

manics commented Feb 5, 2023

I'm going to see if I can get ECR actually working. If it works, I think incorporating some of the work into BinderHub will be helpful to admins:

  • At a minimum, adding a Configurable class ExternalRegistryHelper(DockerRegistry) to BinderHub means no in-line extraConfig is needed
  • If the Helm chart for deploying the microservice is also added that makes it even easier to deploy. The only configuration needed will be the name of the image to use, and some configuration parameters/environment variables.
