Extend documentation (#303)
kate-goldenring authored Apr 16, 2021
1 parent 65d6297 commit c75f865
Showing 13 changed files with 1,193 additions and 839 deletions.
19 changes: 10 additions & 9 deletions README.md
Expand Up @@ -2,7 +2,7 @@

[![Slack channel #akri](https://img.shields.io/badge/slack-akri-blueviolet.svg?logo=slack)](https://kubernetes.slack.com/messages/akri)
[![Rust Version](https://img.shields.io/badge/rustc-1.49.0-blue.svg)](https://blog.rust-lang.org/2020/12/31/Rust-1.49.0.html)
[![Kubernetes Version](https://img.shields.io/badge/kubernetes-≥%201.16-blue.svg)](https://v1-16.docs.kubernetes.io/)
[![Kubernetes Version](https://img.shields.io/badge/kubernetes-≥%201.16-blue.svg)](https://kubernetes.io/)
[![codecov](https://codecov.io/gh/deislabs/akri/branch/main/graph/badge.svg?token=V468HO7CDE)](https://codecov.io/gh/deislabs/akri)

[![Check Rust](https://github.com/deislabs/akri/workflows/Check%20Rust/badge.svg?branch=main&event=push)](https://github.com/deislabs/akri/actions?query=workflow%3A%22Check+Rust%22)
Expand All @@ -22,28 +22,29 @@ Simply put: you name it, Akri finds it, you use it.
## Why Akri
At the edge, there are a variety of sensors, controllers, and MCU class devices that are producing data and performing actions. For Kubernetes to be a viable edge computing solution, these heterogeneous “leaf devices” need to be easily utilized by Kubernetes clusters. However, many of these leaf devices are too small to run Kubernetes themselves. Akri is an open source project that exposes these leaf devices as resources in a Kubernetes cluster. It leverages and extends the Kubernetes [device plugin framework](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/), which was created with the cloud in mind and focuses on advertising static resources such as GPUs and other system hardware. Akri took this framework and applied it to the edge, where there is a diverse set of leaf devices with unique communication protocols and intermittent availability.

Akri is made for the edge, **handling the dynamic appearance and disappearance of leaf devices**. Akri provides an abstraction layer similar to [CNI](https://github.com/containernetworking/cni), but instead of abstracting the underlying network details, it removes the work of finding, utilizing, and monitoring the availability of the leaf device. An operator simply has to apply an Akri Configuration to a cluster, specifying the discovery protocol (say ONVIF) and the pod that should be deployed upon discovery (say a video frame server). Then, Akri does the rest. An operator can also allow multiple nodes to utilize a leaf device, thereby **providing high availability** in the case where a node goes offline. Furthermore, Akri will automatically create a Kubernetes service for each type of leaf device (or Akri Configuration), removing the need for an application to track the state of pods or nodes.
Akri is made for the edge, **handling the dynamic appearance and disappearance of leaf devices**. Akri provides an abstraction layer similar to [CNI](https://github.com/containernetworking/cni), but instead of abstracting the underlying network details, it removes the work of finding, utilizing, and monitoring the availability of the leaf device. An operator simply has to apply an Akri Configuration to a cluster, specifying the Discovery Handler (say ONVIF) that should be used to discover the devices and the Pod that should be deployed upon discovery (say a video frame server). Then, Akri does the rest. An operator can also allow multiple nodes to utilize a leaf device, thereby **providing high availability** in the case where a node goes offline. Furthermore, Akri will automatically create a Kubernetes service for each type of leaf device (or Akri Configuration), removing the need for an application to track the state of pods or nodes.

Most importantly, Akri **was built to be extensible**. We currently have ONVIF, udev, and OPC UA discovery handlers, but more can be easily added by community members like you. The more protocols Akri can support, the wider an array of leaf devices Akri can discover. We are excited to work with you to build a more connected edge.
Most importantly, Akri **was built to be extensible**. Akri currently supports ONVIF, udev, and OPC UA Discovery Handlers, but more can be easily added by community members like you. The more protocols Akri can support, the wider an array of leaf devices Akri can discover. We are excited to work with you to build a more connected edge.

## How Akri Works
Akri’s architecture is made up of four key components: two custom resources, a device plugin implementation, and a custom controller. The first custom resource, the Akri Configuration, is where **you name it**. This tells Akri what kind of device it should look for. At this point, **Akri finds it**! Akri's device plugin implementation looks for the device and tracks its availability using Akri's second custom resource, the Akri Instance. Having found your device, the Akri Controller helps **you use it**. It sees each Akri Instance (which represents a leaf device) and deploys a ("broker") pod that knows how to connect to the resource and utilize it.
Akri’s architecture is made up of five key components: two custom resources, Discovery Handlers, an Agent (device plugin implementation), and a custom Controller. The first custom resource, the Akri Configuration, is where **you name it**. This tells Akri what kind of device it should look for. At this point, **Akri finds it**! Akri's Discovery Handlers look for the device and inform the Agent of discovered devices. The Agent then creates Akri's second custom resource, the Akri Instance, to track the availability and usage of the device. Having found your device, the Akri Controller helps **you use it**. It sees each Akri Instance (which represents a leaf device) and deploys a ("broker") Pod that knows how to connect to the resource and utilize it.

<img src="./docs/media/akri-architecture.svg" alt="Akri ONVIF Flow" style="padding-bottom: 10px; padding-top: 10px;
<img src="./docs/media/akri-architecture.svg" alt="Akri Architecture" style="padding-bottom: 10px; padding-top: 10px;
margin-right: auto; display: block; margin-left: auto;"/>

## Quick Start with a Demo
Try the [end to end demo](./docs/end-to-end-demo.md) of Akri to see Akri discover mock video cameras and a streaming app display the footage from those cameras. It includes instructions on K8s cluster setup. If you would like to perform the demo on a cluster of Raspberry Pi 4's, see the [Raspberry Pi 4 demo](./docs/end-to-end-demo-rpi4.md).

## Documentation
- [Running Akri using our currently supported protocols](./docs/user-guide.md)
- [User guide for deploying Akri using Helm](./docs/user-guide.md)
- [Akri architecture in depth](./docs/architecture.md)
- [How to build Akri](./docs/development.md)
- [How to extend Akri for protocols that haven't been supported yet](./docs/extensibility.md).
- Proposals for enhancements such as new protocol implementations can be found in the [proposals folder](./docs/proposals)
- [How to extend Akri for protocols that haven't been supported yet](./docs/discovery-handler-development.md).
- [How to create a broker to leverage discovered devices](./docs/broker-development.md).
- Proposals for enhancements such as new Discovery Handler implementations can be found in the [proposals folder](./docs/proposals)

## Roadmap
Akri was built to be extensible. We currently have ONVIF, udev, and OPC UA discovery handlers, but as a community, we hope to continuously support more protocols. We have created a [discovery handler implementation roadmap](./docs/roadmap.md#implement-additional-discovery-handlers) in order to prioritize development of discovery handlers. If there is a protocol you feel we should prioritize, please [create an issue](https://github.com/deislabs/akri/issues/new/choose), or better yet, contribute the implementation! We are excited to work with you to build a more connected edge.
Akri was built to be extensible. We currently have ONVIF, udev, and OPC UA Discovery Handlers, but as a community, we hope to continuously support more protocols. We have created a [Discovery Handler implementation roadmap](./docs/roadmap.md#implement-additional-discovery-handlers) in order to prioritize development of Discovery Handlers. If there is a protocol you feel we should prioritize, please [create an issue](https://github.com/deislabs/akri/issues/new/choose), or better yet, contribute the implementation!

## Contributing
This project welcomes contributions, whether by [creating new issues](https://github.com/deislabs/akri/issues/new/choose) or pull requests. See our [contributing document](./docs/contributing.md) on how to get started.
Expand Down
27 changes: 24 additions & 3 deletions docs/agent-in-depth.md
Expand Up @@ -26,6 +26,27 @@ To enable resource sharing, the Akri Agent creates and updates the `Instance.dev
For more detailed information, see the [in-depth resource sharing doc](./resource-sharing-in-depth.md).

## Resource discovery
The Agent discovers resources via Discovery Handlers (DHs). A Discovery Handler is anything that implements the `DiscoveryHandler` service defined in [`discovery.proto`](../discovery-utils/proto/discovery.proto). In order to be utilized, a DH must register with the Agent, which hosts the `Registration` service defined in [`discovery.proto`](../discovery-utils/proto/discovery.proto). The Agent maintains a list of registered DHs and their connectivity status, which is either `Waiting`, `Active`, or `Offline(Instant)`. When registered, a DH's status is `Waiting`. Once the Agent has successfully created a connection with a DH, due to a Configuration requesting resources discovered by that DH, its status is set to `Active`. If the Agent is unable to connect or loses a connection with a DH, its status is set to `Offline(Instant)`. The `Instant` marks the time at which the DH became unresponsive. If the DH has been offline for more than 5 minutes, it is removed from the Agent's list of registered discovery handlers. If a Configuration is deleted, the Agent drops the connection it made with all DHs for that Configuration and marks the DHs' statuses as `Waiting`. Note that, while probably not commonplace, the Agent allows multiple DHs to be registered for the same protocol. For example, you could have two udev DHs running on a node on different sockets.

Supported DHs each have a [library](../discovery-handlers) and a [binary implementation](../discovery-handler-modules). This allows them to either be run within the Agent binary or in their own Pod.
The Agent discovers resources via Discovery Handlers (DHs). A Discovery Handler is anything that implements the
`DiscoveryHandler` service defined in [`discovery.proto`](../discovery-utils/proto/discovery.proto). In order to be
utilized, a DH must register with the Agent, which hosts the `Registration` service defined in
[`discovery.proto`](../discovery-utils/proto/discovery.proto). The Agent maintains a list of registered DHs and the
connectivity status of each, which is one of `Waiting`, `Active`, or `Offline(Instant)`. When registered, a DH's status is
`Waiting`. Once a Configuration requesting resources discovered by a DH is applied to the Akri-enabled cluster, the
Agent will create a connection with the DH requested in the Configuration and set the status of the DH to `Active`. If
the Agent is unable to connect or loses a connection with a DH, its status is set to `Offline(Instant)`. The `Instant`
marks the time at which the DH became unresponsive. If the DH has been offline for more than 5 minutes, it is removed
from the Agent's list of registered Discovery Handlers. If a Configuration is deleted, the Agent drops the connection it
made with all DHs for that Configuration and marks the DHs' statuses as `Waiting`. Note that, while probably not
commonplace, the Agent allows multiple DHs to be registered for the same protocol. For example, you could have two
udev DHs running on a node on different sockets.
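
The status transitions described above can be read as a small state machine. The following is a minimal Rust sketch of that reading, not the Agent's actual implementation. The type name `DiscoveryHandlerStatus`, the constant `OFFLINE_GRACE_PERIOD`, and all method names are hypothetical and chosen for illustration. It also assumes that deleting a Configuration only moves DHs that were `Active` back to `Waiting`, which the text does not state explicitly.

```rust
use std::time::{Duration, Instant};

/// Hypothetical constant name; the 5 minute window comes from the paragraph above.
const OFFLINE_GRACE_PERIOD: Duration = Duration::from_secs(5 * 60);

/// Connectivity status the Agent keeps for each registered Discovery Handler.
/// The variant names mirror the text; the type name is an assumption.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DiscoveryHandlerStatus {
    /// Registered, but no Configuration currently requests this DH.
    Waiting,
    /// The Agent holds a connection to this DH on behalf of a Configuration.
    Active,
    /// The DH stopped responding at the recorded time.
    Offline(Instant),
}

#[allow(dead_code)]
impl DiscoveryHandlerStatus {
    /// A Configuration requested this DH and the Agent connected to it.
    fn on_connected(self) -> Self {
        Self::Active
    }

    /// The Agent could not connect, or lost its connection.
    /// An already-offline DH keeps its original timestamp.
    fn on_connection_lost(self, now: Instant) -> Self {
        match self {
            Self::Offline(since) => Self::Offline(since),
            _ => Self::Offline(now),
        }
    }

    /// The Configuration using this DH was deleted (assumption: only
    /// `Active` DHs are affected; other statuses are left unchanged).
    fn on_configuration_deleted(self) -> Self {
        match self {
            Self::Active => Self::Waiting,
            other => other,
        }
    }

    /// True once the DH has been offline for more than the grace period,
    /// at which point it would be dropped from the list of registered DHs.
    fn should_deregister(self, now: Instant) -> bool {
        match self {
            Self::Offline(since) => {
                now.saturating_duration_since(since) > OFFLINE_GRACE_PERIOD
            }
            _ => false,
        }
    }
}
```

Storing the `Instant` inside the `Offline` variant means the removal check needs no separate bookkeeping: a periodic sweep over the registered DHs can call `should_deregister(Instant::now())` on each entry.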

The Agent's registration service defaults to running on the socket `/var/lib/akri/agent-registration.sock` but can be
configured with Helm. While Discovery Handlers must register with this service over a Unix domain socket (UDS), the Discovery Handler's service
can run over UDS or an IP based endpoint.

Supported Rust DHs each have a [library](../discovery-handlers) and a [binary
implementation](../discovery-handler-modules). This allows them to either be run within the Agent binary or in their own
Pod.

Reference the [Discovery Handler development document](./discovery-handler-development.md) to learn how to implement a
Discovery Handler.
4 changes: 2 additions & 2 deletions docs/architecture.md
Expand Up @@ -64,7 +64,7 @@ For a more in-depth understanding, see [Controller In-depth](./controller-in-dep
# ...
capacity: 3
```
1. The Akri Agent sees the Configuration and discovers a leaf device using the protocol specified in the Configuration. It creates a device plugin for that leaf device and registers it with the kubelet. The Agent then creates an Instance for the discovered leaf device, listing itself as a node that can access it under `nodes`. The Akri Agent puts all the information that the broker pods will need in order to connect to the specific device under the `brokerProperties` section of the Instance. Later, the controller will mount these as environment variables in the broker pods. Note how the Instance has 3 available `deviceUsage` slots, since capacity was set to 3 and no brokers have been scheduled to the leaf device yet.
1. The Akri Agent sees the Configuration and discovers a leaf device using the protocol specified in the Configuration. It creates a device plugin for that leaf device and registers it with the kubelet. When creating the device plugin, it tells the kubelet to set connection information for that specific device and additional metadata from a Configuration's `brokerProperties` as environment variables in all Pods that request this device's resource. This information is also set in the `brokerProperties` section of the Instance the Agent creates to represent the discovered leaf device. In the Instance, the Agent also lists itself as a node that can access the device under `nodes`. Note how the Instance has 3 available `deviceUsage` slots, since capacity was set to 3 and no brokers have been scheduled to the leaf device yet.
```yaml
kind: Instance
metadata:
Expand Down Expand Up @@ -115,7 +115,7 @@ For a more in-depth understanding, see [Controller In-depth](./controller-in-dep
# ...
phase: Pending
```
1. The kubelet on the selected node sees the scheduled pod and resource limit. It checks to see if the resource is available by calling `allocate` on the device plugin running in the Agent for the requested leaf device. When calling `allocate`, the kubelet requests a specific `deviceUsage` slot. Let's say the kubelet requested `akri-<protocolA>-<hash>-1`. The leaf device's device plugin checks to see that the requested `deviceUsage` slot has not been taken by another node. If it is available, it reserves that `deviceUsage` slot for this node (as shown below) and returns true.
1. The kubelet on the selected node sees the scheduled pod and resource limit. It checks to see if the resource is available by calling `allocate` on the device plugin running in the Agent for the requested leaf device. When calling `allocate`, the kubelet requests a specific `deviceUsage` slot. Let's say the kubelet requested `akri-<protocolA>-<hash>-1`. The leaf device's device plugin checks to see that the requested `deviceUsage` slot has not been taken by another node. If it is available, it reserves that `deviceUsage` slot for this node (as shown below) and returns true. In the `allocate` response, the Agent also tells the kubelet to mount the `Instance.brokerProperties` as environment variables in the broker Pod.
```yaml
kind: Instance
metadata:
Expand Down