
Enable mounting connectivity information for multiple devices/instances in a Pod #492

Closed
kate-goldenring opened this issue Jul 7, 2022 · 9 comments · Fixed by #560 or #561
Labels: bug (Something isn't working), good first issue (Good for newcomers)

@kate-goldenring
Contributor

Background on how Akri adds connectivity information to brokers

The Akri Agent creates a device plugin for each device it discovers, creating a node-level resource for that device. Then, a Pod can request that IoT device just as it would a compute resource (CPU and memory) in the resourceRequests section of the PodSpec. This is explained further in Akri's requesting Akri resources documentation. Akri's Controller does this on your behalf when a brokerSpec is specified in a Configuration. It will deploy that Pod or Job, adding the discovered device as a resourceRequest. Then, before the Pod is run on a node, the kubelet calls Allocate on the device plugin in the Akri Agent to confirm that it can use that resource (device). In its response, the device plugin can specify any environment variables or mounts that should be added to the container using that device. This is where Akri sets connectivity information (and adds devnode mounts for udev). The environment variables are delineated in Akri's broker documentation.
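
For concreteness, here is a minimal sketch of that Allocate step, assuming a simplified stand-in for the kubelet device plugin's ContainerAllocateResponse; the types, the allocate function, and the URL are illustrative, not Akri's actual code:

use std::collections::HashMap;

// Illustrative stand-in for the kubelet device plugin API's
// ContainerAllocateResponse; only the two fields discussed here are shown.
struct ContainerAllocateResponse {
    envs: HashMap<String, String>,  // env vars injected into the container
    mounts: Vec<(String, String)>,  // (host path, container path) pairs
}

// Sketch of an Allocate handler: the device's properties map becomes the
// container's environment variables; udev devices would also add mounts.
fn allocate(device_properties: &HashMap<String, String>) -> ContainerAllocateResponse {
    ContainerAllocateResponse {
        envs: device_properties.clone(),
        mounts: Vec::new(),
    }
}

fn main() {
    let mut properties = HashMap::new();
    properties.insert(
        "ONVIF_DEVICE_SERVICE_URL".to_string(),
        "rtsp://10.0.0.1:554".to_string(), // illustrative URL
    );
    let response = allocate(&properties);
    assert_eq!(response.envs.len(), 1);
}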

Problem

The issue is that a single environment variable name is used for all devices of a given protocol. This has been fine in Akri's prevailing scenario of one device per broker; however, when a Pod requests N devices, connectivity information is missing for the remaining N-1 of them.

For example, for ONVIF, the RTSP url for the camera is set under an environment variable named ONVIF_DEVICE_SERVICE_URL, so say a Pod has the following resourceRequests:

  resources:
    limits:
      akri.sh/onvif-camera-id1: "1"
      akri.sh/onvif-camera-id2: "1"
    requests:
      akri.sh/onvif-camera-id1: "1"
      akri.sh/onvif-camera-id2: "1"

When the broker Pod boots up, only one of the two cameras' RTSP URLs is set in the environment variable ONVIF_DEVICE_SERVICE_URL. Instead, a list of URLs should be specified, or there should be N environment variables for N devices, each with an expected prefix, so the broker could do the equivalent of printenv | grep ONVIF_DEVICE_SERVICE_URL* to know which devices it can connect to.
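
A minimal sketch of why the information is lost (keys and URLs are illustrative): both devices report their connectivity under the same protocol-wide key, so merging their properties into one environment map silently keeps only the last value:

use std::collections::HashMap;

fn main() {
    // Each device reports its connectivity under the same protocol-wide
    // key, as the Discovery Handlers do today (URLs are illustrative).
    let camera1 = [("ONVIF_DEVICE_SERVICE_URL", "rtsp://10.0.0.1:554")];
    let camera2 = [("ONVIF_DEVICE_SERVICE_URL", "rtsp://10.0.0.2:554")];

    // The env map built for the container merges both devices' properties.
    let mut envs: HashMap<&str, &str> = HashMap::new();
    envs.extend(camera1);
    envs.extend(camera2); // silently overwrites camera1's URL: last write wins

    // Only one URL survives, matching the behavior seen in the demo below.
    assert_eq!(envs.len(), 1);
}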

Demo of the problem

The following demonstrates the problem using Akri's debugEcho Discovery Handler, which "discovers" a set of devices declared in an Akri Configuration. The name of each discovered device is then mounted in the brokers. This walkthrough shows how only one device name is mounted when a Pod requests both discovered debugEcho resources.

  1. Install Akri, telling the debugEcho Discovery Handler to discover two fake devices called device1 and device2. Note that we are not specifying a broker Pod, as we will manually deploy a Pod that will use the discovered devices.
helm repo add akri-helm-charts https://project-akri.github.io/akri/
helm install akri akri-helm-charts/akri \
    $AKRI_HELM_CRICTL_CONFIGURATION \
    --set agent.allowDebugEcho=true \
    --set debugEcho.discovery.enabled=true \
    --set debugEcho.configuration.enabled=true \
    --set debugEcho.configuration.shared=false \
    --set debugEcho.configuration.discoveryDetails.descriptions[0]="device1" \
    --set debugEcho.configuration.discoveryDetails.descriptions[1]="device2"
  2. Watch as the two devices are discovered and two Akri instances are created: kubectl get pods,akrii,akric. Output would look similar to the following on a single-node cluster:
NAME                                              READY   STATUS    RESTARTS   AGE
pod/akri-debug-echo-discovery-daemonset-9rxsz     1/1     Running   0          3m2s
pod/akri-agent-daemonset-6tt76                    1/1     Running   0          3m2s
pod/akri-controller-deployment-79c857f4d8-6lf6p   1/1     Running   0          3m2s

NAME                                      CONFIG            SHARED   NODES             AGE
instance.akri.sh/akri-debug-echo-b0a64b   akri-debug-echo   false    ["nuc-ubuntu2"]   56s
instance.akri.sh/akri-debug-echo-df7e4c   akri-debug-echo   false    ["nuc-ubuntu2"]   55s

NAME                                    CAPACITY   AGE
configuration.akri.sh/akri-debug-echo   2          3m2s
  3. Now, let's deploy a Pod that will use these Akri resources. First, get the Instance names (which are the names of the resources created by Akri): kubectl get akrii | tail -2 | awk '{print $1}'. Substitute them into the following Pod's resourceRequests, prefixed by akri.sh/:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-broker
spec:
  containers:
  - name: nginx-broker
    image: nginx:1.14.2
    ports:
    - containerPort: 80
    resources:
      limits:
        akri.sh/akri-debug-echo-b0a64b: "1"
        akri.sh/akri-debug-echo-df7e4c: "1"
      requests:
        akri.sh/akri-debug-echo-b0a64b: "1"
        akri.sh/akri-debug-echo-df7e4c: "1"
  4. Apply the Pod: kubectl apply -f sample-broker.yaml
  5. Exec into the broker Pod to examine the environment variables:
kubectl exec -it nginx-broker -- /bin/bash
  6. In the broker, print the expected DEBUG_ECHO_DESCRIPTION environment variable that was added by the Agent's device plugin, and note that only one device name (device1) is listed rather than both devices, despite both being reserved by the Pod:
root@nginx-broker:/# printenv | grep DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION=device1

Potential solution

Instead of one environment variable, there should be N environment variables for N devices, each with an expected prefix, so the broker could do the equivalent of printenv | grep ONVIF_DEVICE_SERVICE_URL* to know which devices it can connect to.
This would be a breaking change, requiring sample brokers to be updated to do the new env var querying. It would also require changes to the Discovery Handlers, which would specify a unique label for each device here instead of the same one, i.e. the following:

properties.insert(
    super::DEBUG_ECHO_DESCRIPTION_LABEL.to_string(),
    description.clone(),
);

could change to:

properties.insert(
    format!("{}_{}", super::DEBUG_ECHO_DESCRIPTION_LABEL, hash(instance_name)),
    description.clone(),
);
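
On the broker side, consuming the suffixed variables could then look like the following sketch; the prefix constant is illustrative:

use std::env;

fn main() {
    // Equivalent of `printenv | grep ONVIF_DEVICE_SERVICE_URL`: collect
    // every device-specific variable that shares the expected prefix.
    const PREFIX: &str = "ONVIF_DEVICE_SERVICE_URL";
    for (key, url) in env::vars().filter(|(key, _)| key.starts_with(PREFIX)) {
        println!("{key} -> {url}"); // one entry per allocated device
    }
}
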
@kate-goldenring added the good first issue and bug labels on Jul 7, 2022
@michaelzhang114

I'm interested in taking a crack at this!

@kate-goldenring
Contributor Author

Awesome! Just assigned you to it. @bfjelds and @romoh, what are your thoughts on naming the mounted environment variables as ENV_VAR_GENERIC_NAME_DEVICE_SPECIFIC_HASH?

@bfjelds
Collaborator

bfjelds commented Jul 8, 2022

I'm not sure I have strong feelings. If there are no limits on env var lengths, we might just add a normalized instance_name? It might be easier to debug if there are issues (tracing which env var maps to which instance, etc.).

@kate-goldenring
Contributor Author

@michaelzhang114 any updates or questions on this? Would be great to prioritize this for the next release.

@harrison-tin
Collaborator

Any opinions on naming the mounted env vars? I know @kate-goldenring suggested ENV_VAR_GENERIC_NAME_DEVICE_SPECIFIC_HASH, but the restriction is that the Discovery Handlers don't have any information about the instance or its hash when they populate the properties.

Without changing the flow, I'm wondering if it would be easier to just name them <DESCRIPTION-LABEL>-<ID>. For example:
Debug-echo:
DEBUG_ECHO_DESCRIPTION-device1 = device1
DEBUG_ECHO_DESCRIPTION-device2 = device2

OPCUA:
OPCUA_DISCOVERY_URL-opc.tcp://<host>:<port> = opc.tcp://<host>:<port>

The broker can still do the equivalent of printenv | grep DEBUG_ECHO_DESCRIPTION* or printenv | grep OPCUA_DISCOVERY_URL* to get all the environment variables. I believe the device ID is the only information the Discovery Handlers have about the devices when discovering, unless we want to populate the device properties somewhere else.
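
A sketch of what this could look like in a Discovery Handler, assuming a hypothetical insert_suffixed helper and using the device id it already has at discovery time:

use std::collections::HashMap;

// Hypothetical helper: key each property by the device's own id, the only
// per-device identifier a Discovery Handler has at discovery time.
fn insert_suffixed(
    properties: &mut HashMap<String, String>,
    label: &str,
    device_id: &str,
    value: String,
) {
    properties.insert(format!("{label}-{device_id}"), value);
}

fn main() {
    let mut properties = HashMap::new();
    insert_suffixed(&mut properties, "DEBUG_ECHO_DESCRIPTION", "device1", "device1".to_string());
    insert_suffixed(&mut properties, "DEBUG_ECHO_DESCRIPTION", "device2", "device2".to_string());
    assert_eq!(properties.len(), 2); // no collision between the two devices
}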

@kate-goldenring
Contributor Author

One option would be to create some assumption that all device-specific environment variables that are prefixed with AKRI should later be suffixed by the Agent during allocate with the instance id here. This seems a bit messy.

I realized that we don't need the suffix to be globally unique (like a hashed instance id); rather, it just needs to be unique among the devices mounted into the broker. Because of this, we can have the Discovery Handlers suffix each of the Device's property keys (which get converted to env vars) with that device's (unhashed) id. This puts control in the DH but means that collisions could still happen if DHs don't add unique suffixes as expected.

If the Agent does the suffixing, it could be done by the DiscoveryOperator right after discovery rather than during allocate. The DO would transform the properties to all contain the Device.id or instance_id (as we originally discussed) as a suffix. The pro of this is that it guarantees there are no collisions. The con is that all properties are suffixed. This is probably fine but should be well documented.
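
A sketch of the DiscoveryOperator variant, with a simplified stand-in for Akri's Device type; the suffix_properties transform is hypothetical:

use std::collections::HashMap;

// Simplified stand-in for Akri's Device type.
struct Device {
    id: String,
    properties: HashMap<String, String>,
}

// Hypothetical post-discovery transform in the DiscoveryOperator: suffix
// every property key with the device id, guaranteeing uniqueness at the
// cost of suffixing even properties that would never collide.
fn suffix_properties(device: &mut Device) {
    let id = device.id.clone();
    let suffixed = device
        .properties
        .drain()
        .map(|(key, value)| (format!("{key}_{id}"), value))
        .collect();
    device.properties = suffixed;
}

fn main() {
    let mut device = Device {
        id: "b0a64b".to_string(),
        properties: HashMap::from([(
            "DEBUG_ECHO_DESCRIPTION".to_string(),
            "device1".to_string(),
        )]),
    };
    suffix_properties(&mut device);
    assert!(device.properties.contains_key("DEBUG_ECHO_DESCRIPTION_b0a64b"));
}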

@johnsonshih
Contributor

The annotation that the Akri Agent sets in ContainerAllocateResponse for each virtual device has the same problem: all virtual devices are set to the same annotation field, "akri.agent.slot".

@johnsonshih
Contributor

Another proposal for device properties is to define some tags for the DH to control whether to append a suffix and what the suffix is.
There are three possibilities for the suffix:
no suffix, or
a suffix with the hash of the instance id, or
a suffix with the hash of the usage slot id.

We could append the usage-slot-id hash if no tag is specified and behave differently if a tag is specified. For example, a device property would map to an env var as follows:

DEBUG_ECHO_DESCRIPTION => DEBUG_ECHO_DESCRIPTION_<usage-slot-hash>
DEBUG_ECHO_DESCRIPTION{no-suffix} => DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION{instance-suffix} => DEBUG_ECHO_DESCRIPTION_<instance-hash>

or

DEBUG_ECHO_DESCRIPTION => DEBUG_ECHO_DESCRIPTION
DEBUG_ECHO_DESCRIPTION{usage-suffix} => DEBUG_ECHO_DESCRIPTION_<usage-slot-hash>
DEBUG_ECHO_DESCRIPTION{instance-suffix} => DEBUG_ECHO_DESCRIPTION_<instance-hash>
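
A sketch of how the Agent could interpret such tags when converting a property key into a final env var name; the tag syntax and hash values are hypothetical:

// Hypothetical tag handling in the Agent: a property key may end in
// "{no-suffix}" or "{instance-suffix}"; otherwise the usage-slot hash is
// appended by default, following the first scheme above.
fn env_var_name(key: &str, instance_hash: &str, slot_hash: &str) -> String {
    if let Some(base) = key.strip_suffix("{no-suffix}") {
        base.to_string()
    } else if let Some(base) = key.strip_suffix("{instance-suffix}") {
        format!("{base}_{instance_hash}")
    } else {
        format!("{key}_{slot_hash}")
    }
}

fn main() {
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION_9f8e7d"
    );
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION{no-suffix}", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION"
    );
    assert_eq!(
        env_var_name("DEBUG_ECHO_DESCRIPTION{instance-suffix}", "1a2b3c", "9f8e7d"),
        "DEBUG_ECHO_DESCRIPTION_1a2b3c"
    );
}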

@rpieczon

rpieczon commented Apr 4, 2023

Is there any chance of having this as part of the upcoming release?
