document architectural options for Invoker deployment #110
DockerContainerFactory has the additional weakness that it complicates the security configuration. For example, consider how you would implement a Calico policy to prevent lambda containers from accessing the control plane.
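When action containers are pods (the KubernetesContainerFactory case), a standard Kubernetes NetworkPolicy, which Calico enforces, can select them by label. With DockerContainerFactory they are plain docker containers that no `podSelector` can match, which is why the policy is hard to express. The Python sketch below emits such a policy as JSON. It assumes a hypothetical `openwhisk/action=true` pod label and uses `10.0.0.0/8` only as an example cluster CIDR. The policy allows egress to anything outside that range.

```python
import json

# Hypothetical label carried by user action pods. This only works when
# action containers are real Kubernetes pods.
ACTION_POD_LABELS = {"openwhisk/action": "true"}


def action_egress_policy(namespace: str, cluster_cidrs: list[str]) -> dict:
    """NetworkPolicy letting action pods reach external addresses but
    nothing inside the given (assumed) cluster CIDRs, control plane included."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "action-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": ACTION_POD_LABELS},
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": cluster_cidrs}}]}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(action_egress_policy("openwhisk", ["10.0.0.0/8"]), indent=2))
```

A real policy would also need an explicit egress rule for cluster DNS, which normally lives inside the excluded range.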
As for suspend/resume, it seems unlikely that it will ever be implemented in K8S or Mesos, for reasonably good reasons. Perhaps there is a better way? For example, an expensive container like Java could be pre-warmed in a generic state. Or, if there are artifacts to compile, they could be compiled beforehand and the intermediate results saved. Or, crazy idea, maybe support process hibernation like https://criu.org/. (I guess suspend/resume needs its own topic.)
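The "pre-warmed in a generic state" idea can be sketched as a pool that keeps a target number of already-started, not-yet-specialized containers per runtime kind. A cold start then only pays for loading the action's code, not for starting the runtime. This is a hypothetical Python sketch: container handles are plain strings, the runtime kinds are examples, and refilling happens synchronously where a real system would refill in the background.

```python
from collections import defaultdict, deque
from itertools import count


class PrewarmPool:
    """Keeps a target number of generic, already-started containers per
    runtime kind (hypothetical sketch; containers are string handles)."""

    def __init__(self, targets: dict[str, int]):
        self._targets = targets                  # e.g. {"java:8": 2}
        self._ids = count()
        self._ready: dict[str, deque] = defaultdict(deque)
        for kind in targets:
            self._replenish(kind)

    def _start_generic(self, kind: str) -> str:
        # Stand-in for the expensive step (starting a JVM, loading the runtime).
        return f"{kind}-{next(self._ids)}"

    def _replenish(self, kind: str) -> None:
        # Top the pool back up to its target size for this kind.
        while len(self._ready[kind]) < self._targets.get(kind, 0):
            self._ready[kind].append(self._start_generic(kind))

    def acquire(self, kind: str) -> tuple[str, bool]:
        """Return (container, was_prewarmed). The caller would then inject
        the action's code; with no pre-warmed container, start one cold."""
        if self._ready[kind]:
            container = self._ready[kind].popleft()
            self._replenish(kind)
            return container, True
        return self._start_generic(kind), False
```

For example, `PrewarmPool({"java:8": 2}).acquire("java:8")` hands out an already-started Java container, while a kind with no target falls back to a cold start.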
@timboldt It may be possible to use a custom executor with the Mesos framework, so that the framework can send a pause/resume message to the executor for a particular task. We haven't really considered this yet, so it may also be in the crazy idea category. Another aspect of ContainerFactory that we anticipate in the future is heterogeneous clusters, where actions that require particular types of resources (e.g. GPUs) will only be scheduled to hosts that meet those requirements.
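For heterogeneous clusters, the core of resource-aware placement is a filter: a host qualifies only if it has enough free memory and offers every resource the action requires. A minimal Python sketch follows. The `Host` fields and resource tags such as `"gpu"` are hypothetical, and a real scheduler would rank the eligible hosts afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    free_memory_mb: int
    resources: frozenset[str] = frozenset()     # hypothetical tags, e.g. {"gpu"}


def eligible_hosts(hosts: list[Host], memory_mb: int,
                   required: frozenset[str] = frozenset()) -> list[Host]:
    """Hosts with enough free memory that also offer every required resource."""
    return [h for h in hosts
            if h.free_memory_mb >= memory_mb and required <= h.resources]
```

An action without requirements can still land on a GPU host. Keeping GPU hosts free for GPU actions would need an extra preference or taint-like rule on top of this filter.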
This issue is to document options for deploying the invoker subsystem for OpenWhisk. The topic has been discussed in various venues before, most recently in a review of #107 by @stigsb.
The key choice when deploying invokers is which implementation of the ContainerFactoryProvider SPI to use. There are currently two approaches being used by downstream consumers of this project:
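The actual SPI is a Scala trait whose implementation is chosen through the invoker's configuration, so the rest of the invoker stays the same whichever factory is deployed. The Python sketch below mimics that pluggable pattern only to illustrate the choice. Its class names borrow the real names but are stand-ins. The `create_container` method, the `PROVIDERS` registry, and the config key handling are all hypothetical.

```python
from abc import ABC, abstractmethod


class ContainerFactory(ABC):
    """Python stand-in for the invoker's container factory abstraction."""

    @abstractmethod
    def create_container(self, image: str, memory_mb: int) -> str:
        """Create a container for an action and return a handle to it."""


class DockerContainerFactory(ContainerFactory):
    def create_container(self, image: str, memory_mb: int) -> str:
        # A real implementation would call the node-local docker daemon.
        return f"docker:{image}:{memory_mb}m"


class KubernetesContainerFactory(ContainerFactory):
    def create_container(self, image: str, memory_mb: int) -> str:
        # A real implementation would ask the Kubernetes API server for a pod.
        return f"pod:{image}:{memory_mb}m"


# Hypothetical registry; a deployment selects an entry by name from its config.
PROVIDERS = {
    "docker": DockerContainerFactory,
    "kubernetes": KubernetesContainerFactory,
}


def load_factory(config: dict) -> ContainerFactory:
    name = config.get("ContainerFactoryProvider", "docker")
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown ContainerFactoryProvider: {name!r}") from None
```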
DockerContainerFactory
In this approach, the Kubernetes scheduler is only used to deploy the OpenWhisk "control plane". All of the user action containers are created, managed, and destroyed by the invoker using docker on the Kubernetes worker node. For this approach to work well, it is essential that there be exactly one invoker pod on each worker node intended for user function execution. A DaemonSet is a natural fit for the invokers, since the set of nodes intended for invokers will be fairly static and can be labeled accordingly. Capacity is added to or removed from the system by adding/removing worker nodes in the cluster and/or adding/removing the invoker label on worker nodes.
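A DaemonSet with a `nodeSelector` gives exactly this behavior. It runs one invoker pod on every node carrying the label and none elsewhere. Labeling or unlabeling a node adds or removes its invoker. The Python sketch below builds such a manifest as a dict. The `openwhisk-role=invoker` label, the namespace, and the image name are assumptions. Mounting the host's `/var/run/docker.sock` reflects that DockerContainerFactory drives the node's docker daemon directly.

```python
import json

# Assumed node label, applied with e.g.:
#   kubectl label nodes <node-name> openwhisk-role=invoker
INVOKER_NODE_LABEL = {"openwhisk-role": "invoker"}


def invoker_daemonset(namespace: str, image: str) -> dict:
    """DaemonSet placing one invoker pod on each labeled worker node."""
    pod_labels = {"name": "invoker"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "invoker", "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": pod_labels},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    # Only nodes with the invoker label get an invoker pod.
                    "nodeSelector": INVOKER_NODE_LABEL,
                    "containers": [{
                        "name": "invoker",
                        "image": image,
                        # The invoker talks to the node's docker daemon via its socket.
                        "volumeMounts": [{"name": "docker-sock",
                                          "mountPath": "/var/run/docker.sock"}],
                    }],
                    "volumes": [{"name": "docker-sock",
                                 "hostPath": {"path": "/var/run/docker.sock"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(invoker_daemonset("openwhisk", "openwhisk/invoker:latest"), indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`.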
This approach has the advantage of supporting low-latency suspend/resume operations, but it gives up some of the advantages of running on Kubernetes: the Kubernetes scheduler is kept unaware of the user containers, and worker nodes must be allocated to OpenWhisk invokers in a relatively static way.
KubernetesContainerFactory
In this approach, the Kubernetes scheduler is used for all container operations: both control plane and user containers are created, managed, and destroyed by Kubernetes. Because invokers no longer need to be tied one-to-one to worker nodes, the number of invoker pods will most likely be much smaller than the number of worker nodes in the cluster. Furthermore, some form of autoscaling could likely be applied to vary the number of invokers dynamically with system load (although #84 is needed to make autoscaling work well).
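If invokers were scaled with the Kubernetes Horizontal Pod Autoscaler, the replica count would follow its documented rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). No change is made while the ratio stays within a tolerance, which defaults to 0.1. The Python sketch below applies that rule to invokers. The per-invoker metric (for example queued activations) and the replica bounds are hypothetical, and such a metric is exactly what #84 would need to provide.

```python
import math


def desired_invokers(current: int, metric_per_invoker: float,
                     target_per_invoker: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """HPA-style replica calculation applied to a hypothetical invoker metric."""
    ratio = metric_per_invoker / target_per_invoker
    if abs(ratio - 1.0) <= tolerance:
        return current                      # close enough to target: no change
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 invokers each seeing 30 queued activations against a target of 10 would scale to 6.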
This approach allows better sharing of compute resources between OpenWhisk and other uses of the Kubernetes cluster. However, the current KubernetesContainer (https://github.com/projectodd/incubator-openwhisk/blob/d2eb77aac212fb9970f3c9f914bf5863dcbefe50/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainer.scala#L105 and https://github.com/projectodd/incubator-openwhisk/blob/d2eb77aac212fb9970f3c9f914bf5863dcbefe50/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainer.scala#L108) does not actually implement the suspend/resume actions, so it cannot be used if suspension of warm containers is a deployment requirement.
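Why suspension matters can be seen from what a container pool can do with a warm container once an activation finishes. If suspend works (for Docker-backed containers, `docker pause` freezes the processes), the container is frozen until the next activation. Otherwise the pool must either leave it running or remove it and pay for a cold start later. The simplified Python model below is hypothetical. `supports_suspend` stands in for whether the factory's suspend/resume actually does anything, and `keep_warm` for a deployment policy choice.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    REMOVED = "removed"


@dataclass
class WarmContainer:
    name: str
    supports_suspend: bool          # hypothetical: does suspend/resume do anything?
    state: State = State.RUNNING


def on_idle(c: WarmContainer, keep_warm: bool) -> WarmContainer:
    """What a pool could do with a container once its activation finishes."""
    if c.supports_suspend:
        c.state = State.PAUSED      # frozen until the next activation resumes it
    elif keep_warm:
        c.state = State.RUNNING     # stays warm, but its processes keep running
    else:
        c.state = State.REMOVED     # the next activation pays for a cold start
    return c


def on_activation(c: WarmContainer) -> WarmContainer:
    """Prepare a warm container to run the next activation."""
    if c.state is State.REMOVED:
        raise RuntimeError(f"{c.name} was removed; a cold start is required")
    c.state = State.RUNNING         # resumes a paused container; no-op if running
    return c
```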