
Enable mounting connectivity information for multiple devices/instances in a Pod #492

Closed
kate-goldenring opened this issue Jul 7, 2022 · 9 comments · Fixed by #560 or #561
Labels: bug (Something isn't working), good first issue (Good for newcomers)

@kate-goldenring
Contributor

Background on how Akri adds connectivity information to brokers

The Akri Agent creates a device plugin for each device it discovers, creating a node-level resource for that device. Then, a Pod can request that IoT device just as it would a compute resource (CPU and memory) in the resourceRequests section of the PodSpec. This is explained further in Akri's requesting Akri resources documentation. Akri's Controller does this on your behalf when a brokerSpec is specified in a Configuration. It will deploy that Pod or Job, adding the discovered device as a resourceRequest. Then, before the Pod is run on a node, the kubelet calls Allocate on the device plugin in the Akri Agent to confirm that it can use that resource (device). In its response, the device plugin can specify any environment variables or mounts that should be added to the container using that device. This is where Akri sets connectivity information (and adds devnode mounts for udev). The environment variables are delineated in Akri's broker documentation.
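
For concreteness, here is a minimal sketch of that Allocate step, assuming a simplified stand-in for the kubelet device plugin's ContainerAllocateResponse; the types, the allocate function, and the URL are illustrative, not Akri's actual code:

use std::collections::HashMap;

// Illustrative stand-in for the kubelet device plugin API's
// ContainerAllocateResponse; only the two fields discussed here are shown.
struct ContainerAllocateResponse {
    envs: HashMap<String, String>,  // env vars injected into the container
    mounts: Vec<(String, String)>,  // (host path, container path) pairs
}

// Sketch of an Allocate handler: the device's properties map becomes the
// container's environment variables; udev devices would also add mounts.
fn allocate(device_properties: &HashMap<String, String>) -> ContainerAllocateResponse {
    ContainerAllocateResponse {
        envs: device_properties.clone(),
        mounts: Vec::new(),
    }
}

fn main() {
    let mut properties = HashMap::new();
    properties.insert(
        "ONVIF_DEVICE_SERVICE_URL".to_string(),
        "rtsp://10.0.0.1:554".to_string(), // illustrative URL
    );
    let response = allocate(&properties);
    assert_eq!(response.envs.len(), 1);
}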

Problem

The issue is that a single environment variable name is used for all devices of a given protocol. This has been fine in Akri's prevailing scenario of one device per broker; however, when a Pod requests N devices, connectivity information is missing for the remaining N-1 of them.

For example, for ONVIF, the RTSP url for the camera is set under an environment variable named ONVIF_DEVICE_SERVICE_URL, so say a Pod has the following resourceRequests:

  resources:
    limits:
      akri.sh/onvif-camera-id1: "1"
      akri.sh/onvif-camera-id2: "1"
    requests:
      akri.sh/onvif-camera-id1: "1"
      akri.sh/onvif-camera-id2: "1"

When the broker Pod boots up, only one of the two cameras' RTSP URLs is set in the environment variable ONVIF_DEVICE_SERVICE_URL. Instead, a list of URLs should be specified, or there should be N environment variables for N devices, each with an expected prefix, so the broker could do the equivalent of printenv | grep ONVIF_DEVICE_SERVICE_URL* to know which devices it can connect to.
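
A minimal sketch of why the information is lost (keys and URLs are illustrative): both devices report their connectivity under the same protocol-wide key, so merging their properties into one environment map silently keeps only the last value:

use std::collections::HashMap;

fn main() {
    // Each device reports its connectivity under the same protocol-wide
    // key, as the Discovery Handlers do today (URLs are illustrative).
    let camera1 = [("ONVIF_DEVICE_SERVICE_URL", "rtsp://10.0.0.1:554")];
    let camera2 = [("ONVIF_DEVICE_SERVICE_URL", "rtsp://10.0.0.2:554")];

    // The env map built for the container merges both devices' properties.
    let mut envs: HashMap<&str, &str> = HashMap::new();
    envs.extend(camera1);
    envs.extend(camera2); // silently overwrites camera1's URL: last write wins

    // Only one URL survives, matching the behavior seen in the demo below.
    assert_eq!(envs.len(), 1);
}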

Demo of the problem

The following demonstrates the problem using Akri's debugEcho Discovery Handler, which "discovers" a set of devices declared in an Akri Configuration. The name of each discovered device is then mounted in the brokers. This walkthrough shows how only one device name is mounted when a Pod requests both discovered debugEcho resources.

  1. Install Akri, telling the debugEcho Discovery Handler to discover two fake devices called device1 and device2. Note that we are not specifying a broker Pod, as we will manually deploy a Pod that will use the discovered devices.
helm repo add akri-helm-charts https://project-akri.github.io/akri/
helm install akri akri-helm-charts/akri \
    $AKRI_HELM_CRICTL_CONFIGURATION \
    --set agent.allowDebugEcho=true \
    --set debugEcho.discovery.enabled=true \
    --set debugEcho.configuration.enabled=true \
    --set debugEcho.configuration.shared=false \
    --set debugEcho.configuration.discoveryDetails.descriptions[0]="device1" \
    --set debugEcho.configuration.discoveryDetails.descriptions[1]="device2"
  2. Watch as the two devices are discovered and two Akri instances are created: kubectl get pods,akrii,akric. Output would look similar to the following on a single-node cluster:
NAME                                              READY   STATUS    RESTARTS   AGE
pod/akri-debug-echo-discovery-daemonset-9rxsz     1/1     Running   0          3m2s
pod/akri-agent-daemonset-6tt76                    1/1     Running   0          3m2s
pod/akri-controller-deployment-79c857f4d8-6lf6p   1/1     Running   0          3m2s

NAME                                      CONFIG            SHARED   NODES             AGE
instance.akri.sh/akri-debug-echo-b0a64b   akri-debug-echo   false    ["nuc-ubuntu2"]   56s
instance.akri.sh/akri-debug-echo-df7e4c   akri-debug-echo   false    ["nuc-ubuntu2"]   55s

NAME                                    CAPACITY   AGE
configuration.akri.sh/akri-debug-echo   2          3m2s
  3. Now, let's deploy a Pod that will use these Akri resources. First, get the Instance names (which are the names of the resources created by Akri): kubectl get akrii | tail -2 | awk '{print $1}'. Substitute them into the following Pod's resourceRequests, prefixed by akri.sh/:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-broker
spec:
  containers:
  - name: nginx-broker
    image: nginx:1.14.2
    ports:
    - containerPort: 80
    resources:
      limits:
        akri.sh/akri-debug-echo-b0a64b: "1"
        akri.sh/akri-debug-echo-df7e4c: "1"
      requests:
        akri.sh/akri-debug-echo-b0a64b: "1"
        akri.sh/akri-debug-echo-df7e4c: "1"
  4. Apply the Pod: kubectl apply -f sample-broker.yaml
  5. Exec into the broker Pod to examine the environment variables:
kubectl exec -it nginx-broker -- /bin/bash
  6. In the broker, print the expected DEBUG_ECHO_DESCRIPTION environment variable that was added by the Agent's device plugin, and note that only one device name (device1) is listed rather than both devices, despite both being reserved by the Pod:
root@nginx-broker:/# printenv | grep DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION=device1

Potential solution

Instead of one environment variable, there should be N environment variables for N devices, each with an expected prefix, so the broker could do the equivalent of printenv | grep ONVIF_DEVICE_SERVICE_URL* to know which devices it can connect to.
This would be a breaking change, requiring sample brokers to be updated to do the new env var querying. It would also require changes to the Discovery Handlers, which would specify a unique label for each device here instead of the same one, i.e. the following:

properties.insert(
    super::DEBUG_ECHO_DESCRIPTION_LABEL.to_string(),
    description.clone(),
);

could change to:

properties.insert(
    format!("{}_{}", super::DEBUG_ECHO_DESCRIPTION_LABEL, hash(instance_name)),
    description.clone(),
);
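
On the broker side, consuming the suffixed variables could then look like the following sketch; the prefix constant is illustrative:

use std::env;

fn main() {
    // Equivalent of `printenv | grep ONVIF_DEVICE_SERVICE_URL`: collect
    // every device-specific variable that shares the expected prefix.
    const PREFIX: &str = "ONVIF_DEVICE_SERVICE_URL";
    for (key, url) in env::vars().filter(|(key, _)| key.starts_with(PREFIX)) {
        println!("{key} -> {url}"); // one entry per allocated device
    }
}
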
@kate-goldenring added the good first issue and bug labels on Jul 7, 2022
@michaelzhang114

I'm interested in taking a crack at this!

@kate-goldenring
Contributor Author

Awesome! Just assigned you to it. @bfjelds and @romoh, what are your thoughts on naming the mounted environment variables as ENV_VAR_GENERIC_NAME_DEVICE_SPECIFIC_HASH?

@bfjelds
Collaborator

bfjelds commented Jul 8, 2022

I'm not sure I have strong feelings. If there are no limits on env var lengths, we might just add a normalized instance_name? It might be easier to debug if there are issues (tracing which env var maps to which instance, etc.).

@kate-goldenring
Contributor Author

@michaelzhang114 any updates or questions on this? Would be great to prioritize this for the next release.

@harrison-tin
Collaborator

Any opinions on naming the mounted env vars? I know @kate-goldenring suggested ENV_VAR_GENERIC_NAME_DEVICE_SPECIFIC_HASH, but the restriction is that the Discovery Handlers don't have any information about the instance or its hash when they populate the properties.

Without changing the flow, I'm wondering if it would be easier to just name them <DESCRIPTION-LABEL>-<ID>. For example:
Debug-echo:
DEBUG_ECHO_DESCRIPTION-device1 = device1
DEBUG_ECHO_DESCRIPTION-device2 = device2

OPCUA:
OPCUA_DISCOVERY_URL-opc.tcp://<host>:<port> = opc.tcp://<host>:<port>

The broker can still do the equivalent of printenv | grep DEBUG_ECHO_DESCRIPTION* or printenv | grep OPCUA_DISCOVERY_URL* to get all the environment variables. I believe the device ID is the only information the Discovery Handlers have about the devices when discovering, unless we want to populate the device properties somewhere else.
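
A sketch of what this could look like in a Discovery Handler, assuming a hypothetical insert_suffixed helper and using the device id it already has at discovery time:

use std::collections::HashMap;

// Hypothetical helper: key each property by the device's own id, the only
// per-device identifier a Discovery Handler has at discovery time.
fn insert_suffixed(
    properties: &mut HashMap<String, String>,
    label: &str,
    device_id: &str,
    value: String,
) {
    properties.insert(format!("{label}-{device_id}"), value);
}

fn main() {
    let mut properties = HashMap::new();
    insert_suffixed(&mut properties, "DEBUG_ECHO_DESCRIPTION", "device1", "device1".to_string());
    insert_suffixed(&mut properties, "DEBUG_ECHO_DESCRIPTION", "device2", "device2".to_string());
    assert_eq!(properties.len(), 2); // no collision between the two devices
}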

@kate-goldenring
Contributor Author

One option would be to create some assumption that all device-specific environment variables that are prefixed with AKRI should later be suffixed by the Agent during allocate with the instance id here. This seems a bit messy.

I realized that we don't need the suffix to be globally unique (like a hashed instance id); rather, it just needs to be unique among the devices mounted into the broker. Because of this, we can have the Discovery Handlers suffix each of the Device's property keys (which get converted to env vars) with that device's (unhashed) id. This puts control in the DH but means that collisions could still happen if DHs don't add unique suffixes as expected.

If the Agent does the suffixing, it could be done by the DiscoveryOperator right after discovery rather than during allocate. The DO would transform the properties to all contain the Device.id or instance_id (as we originally discussed) as a suffix. The pro of this is that it guarantees there are no collisions. The con is that all properties are suffixed. This is probably fine but should be well documented.
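
A sketch of the DiscoveryOperator variant, with a simplified stand-in for Akri's Device type; the suffix_properties transform is hypothetical:

use std::collections::HashMap;

// Simplified stand-in for Akri's Device type.
struct Device {
    id: String,
    properties: HashMap<String, String>,
}

// Hypothetical post-discovery transform in the DiscoveryOperator: suffix
// every property key with the device id, guaranteeing uniqueness at the
// cost of suffixing even properties that would never collide.
fn suffix_properties(device: &mut Device) {
    let id = device.id.clone();
    let suffixed = device
        .properties
        .drain()
        .map(|(key, value)| (format!("{key}_{id}"), value))
        .collect();
    device.properties = suffixed;
}

fn main() {
    let mut device = Device {
        id: "b0a64b".to_string(),
        properties: HashMap::from([(
            "DEBUG_ECHO_DESCRIPTION".to_string(),
            "device1".to_string(),
        )]),
    };
    suffix_properties(&mut device);
    assert!(device.properties.contains_key("DEBUG_ECHO_DESCRIPTION_b0a64b"));
}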

@johnsonshih
Contributor

The annotation that the Akri Agent sets in ContainerAllocateResponse for each virtual device has the same problem: all virtual devices are set to the same annotation field, "akri.agent.slot".

@johnsonshih
Contributor

Another proposal for device properties is to define some tags for the DH to control whether to append a suffix and what the suffix is.
There are three possibilities for the suffix:
no suffix, or
a suffix with the hash of the instance id, or
a suffix with the hash of the usage slot id.

We could append the usage-slot-id hash if no tag is specified and behave differently if a tag is specified. For example, a device property would map to an env var as follows:

DEBUG_ECHO_DESCRIPTION => DEBUG_ECHO_DESCRIPTION_<usage-slot-hash>
DEBUG_ECHO_DESCRIPTION{no-suffix} => DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION{instance-suffix} => DEBUG_ECHO_DESCRIPTION_<instance-hash>

or

DEBUG_ECHO_DESCRIPTION => DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION{usage-suffix} => DEBUG_ECHO_DESCRIPTION_<usage-slot-hash>
DEBUG_ECHO_DESCRIPTION{instance-suffix} => DEBUG_ECHO_DESCRIPTION_<instance-hash>
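
A sketch of how the Agent could interpret such tags when converting a property key into a final env var name; the tag syntax and hash values are hypothetical:

// Hypothetical tag handling in the Agent: a property key may end in
// "{no-suffix}" or "{instance-suffix}"; otherwise the usage-slot hash is
// appended by default, following the first scheme above.
fn env_var_name(key: &str, instance_hash: &str, slot_hash: &str) -> String {
    if let Some(base) = key.strip_suffix("{no-suffix}") {
        base.to_string()
    } else if let Some(base) = key.strip_suffix("{instance-suffix}") {
        format!("{base}_{instance_hash}")
    } else {
        format!("{key}_{slot_hash}")
    }
}

fn main() {
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION_9f8e7d"
    );
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION{no-suffix}", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION"
    );
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION{instance-suffix}", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION_1a2b3c"
    );
}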

@rpieczon

rpieczon commented Apr 4, 2023

Is there any chance of having this as part of the upcoming release?
