Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Infiniband] Fix unbinding of physical functions when configuring Infiniband virtual functions #1

Draft
wants to merge 58 commits into
base: master
Choose a base branch
from

Conversation

clarkzinzow
Copy link
Collaborator

This PR fixes a bug in the config daemon that unbinds physical functions instead of virtual functions when configuring Infiniband virtual functions.

evgenLevin and others added 30 commits September 4, 2024 08:42
PrometheusRules allow recording pre-defined queries. Use
`sriov_kubepoddevice` metric to add `pod|namespace` pair
to the sriov metrics.

Feature is enabled via the `METRICS_EXPORTER_PROMETHEUS_DEPLOY_RULE`
environment variable.

Signed-off-by: Andrea Panattoni <[email protected]>
When the `metricsExporter` feature is turned off, deployed resources
should be removed. These changes fix the error:

```
│ 2024-08-28T14:07:57.699760017Z    ERROR    controller/controller.go:266    Reconciler error    {"controller": "sriovoperatorconfig", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovOperatorConfig", "SriovOperatorConfig": {"name":"default","namespace":"openshift-sriov-network-operator"},  │
│ "namespace": "openshift-sriov-network-operator", "name": "default", "reconcileID": "fa841c50-dbb8-4c4c-9ddd-b98624fd2a24", "error": "failed to delete object &{map[apiVersion:monitoring.coreos.com/v1 kind:ServiceMonitor metadata:map[name:sriov-network-metrics-exporter namespace:openshift-sriov-network-operator]  │
│ spec:map[endpoints:[map[bearerTokenFile:/var/run/secrets/kubernetes.io/serviceaccount/token honorLabels:true interval:30s port:sriov-network-metrics scheme:https tlsConfig:map[caFile:/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt insecureSkipVerify:false serverName:sriov-network-metrics-expor │
│ ter-service.openshift-sriov-network-operator.svc]]] namespaceSelector:map[matchNames:[openshift-sriov-network-operator]] selector:map[matchLabels:map[name:sriov-network-metrics-exporter-service]]]]} with err: could not delete object (monitoring.coreos.com/v1, Kind=ServiceMonitor) openshift-sriov-network-operato │
│ r/sriov-network-metrics-exporter: servicemonitors.monitoring.coreos.com \"sriov-network-metrics-exporter\" is forbidden: User \"system:serviceaccount:openshift-sriov-network-operator:sriov-network-operator\" cannot delete resource \"servicemonitors\" in API group \"monitoring.coreos.com\" in the namespace \"ope │
│ nshift-sriov-network-operator\""}
```

Signed-off-by: Andrea Panattoni <[email protected]>
…er-rules

[metrics 4/x] Metrics exporter rules
…devices

Refactor some conformance tests to use `SRIOV_NODE_AND_DEVICE_NAME_FILTER`
if the current obj as annotation and the updated doesn't we still want
to add the ones from the current object

Signed-off-by: Sebastian Sch <[email protected]>
When a user deletes the default SriovOperatorConfig resource and
tries to recreate it afterwards, the operator webhooks returns the error:
```
Error from server (InternalError): error when creating "/tmp/opconfig.yml": Internal error occurred: failed calling webhook "operator-webhook.sriovnetwork.openshift.io": failed to call webhook: Post "https://operator-webhook-service.openshift-sriov-network-operator.svc:443/validating-custom-resource?timeout=10s": service "operator-webhook-service" not found
```

as the webhook configuration is still present, while the Service and the DaemonSet has been deleted.

Delete all the webhook configurations when the user deletes
the default SriovOperatorConfig

Signed-off-by: Andrea Panattoni <[email protected]>
The bash syntax was incorrect and yielded:

    hack/env.sh: line 35: ${$RDMA_CNI_IMAGE:-}: bad substitution
…IMAGE

Fix syntax for RDMA_CNI_IMAGE var substitution
It might happen that two SR-IOV pods, deployed on different node, are using devices
with the same PCI address. In such cases, the query suggested [1] by the sriov-network-metrics-exporter produces the error:

```

Error loading values found duplicate series for the match group {pciAddr="0000:3b:02.4"} on the right hand-side of the operation:
    [
        {
            __name__="sriov_kubepoddevice",
            container="test",
            dev_type="openshift.io/intelnetdevice",
            endpoint="sriov-network-metrics",
            instance="10.1.98.60:9110",
            job="sriov-network-metrics-exporter-service",
            namespace="cnf-4916",
            pciAddr="0000:3b:02.4",
            pod="pod-cnfdr22.telco5g.eng.rdu2.redhat.com",
            prometheus="openshift-monitoring/k8s",
            service="sriov-network-metrics-exporter-service"
        }, {
            __name__="sriov_kubepoddevice",
            container="test",
            dev_type="openshift.io/intelnetdevice",
            endpoint="sriov-network-metrics",
            instance="10.1.98.230:9110",
            job="sriov-network-metrics-exporter-service",
            namespace="cnf-4916",
            pciAddr="0000:3b:02.4",
            pod="pod-dhcp-98-230.telco5g.eng.rdu2.redhat.com",
            prometheus="openshift-monitoring/k8s",
            service="sriov-network-metrics-exporter-service"
        }
    ];many-to-many matching not allowed: matching labels must be unique on one side
```

Configure the ServiceMonitor resource to add a `node` label to all metrics.
The right query to get metrics, as updated in the PrometheusRule, will be:

```
sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice
```

Also remove `pod`,  `namespace` and `container` label from the `sriov_vf_*` metrics, as they were
wrongly set to `sriov-network-metrics-exporter-zj2n9`, `openshift-sriov-network-operator`, `kube-rbac-proxy`

[1] https://github.com/k8snetworkplumbingwg/sriov-network-metrics-exporter/blob/0f6a784f377ede87b95f31e569116ceb9775b5b9/README.md?plain=1#L38

Signed-off-by: Andrea Panattoni <[email protected]>
When we want to use config-drive in immutable systems, very often the
config-drive is only used at boot and then umounted (e.g. ignition does
this).

Later when we want to fetch Metadata from the config drive, we actually
have to mount it.

In this PR, I'm adding similar code than coreos/ignition where we
dynamically mount the config-drive is the device was found with the
right label (config-2 or CONFIG-2 as documented in OpenStack). If the
device is found, we mount it, fetch the data and umount it.
Fixes the following shellcheck error:

    SC2068 (error): Double quote array expansions to avoid re-splitting elements.

https://www.shellcheck.net/wiki/SC2068
Fixes the following shellcheck error:

    SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

https://www.shellcheck.net/wiki/SC2148
Fixes the following shellcheck errors:

    SC2145 (error): Argument mixes string and array. Use * or separate argument.
    SC2199 (error): Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).

https://www.shellcheck.net/wiki/SC2145
https://www.shellcheck.net/wiki/SC2199

Also fixes a typo in SUPPORTED_INTERFACE_SWITCHER_MODES.
Fixes the following shellcheck error:

    SC2045 (error): Iterating over ls output is fragile. Use globs.

https://www.shellcheck.net/wiki/SC2045
On some kernels GetDevlinkDeviceParam may return empty values
for some kernel parameters. The netlink library is able to handle this, but
the code in GetDevlinkDeviceParam function may panic if unexpected value
received. Add extra checks to avoid panics
Delete webhooks when SriovOperatorConfig is deleted
…getdevlinkdeviceparam

Fix: GetDevlinkDeviceParam to handle edge-cases correctly
…er-drop-labels

[metrics 5/x] Add node label to sriov_* metrics
`sriov_kubepoddevice` metric might end up in the Prometheus
database after a while, as the default scrape interval is 30s. This leads
to failures in the end-to-end lane like:

```
[sriov] Metrics Exporter When Prometheus operator is available [It] Metrics should have the correct labels
/root/opr-ocp2-1/data/sriov-network-operator/sriov-network-operator/test/conformance/tests/test_exporter_metrics.go:132

  [FAILED] no value for metric sriov_kubepoddevice
```

Put the metric assertion in an `Eventually` statement

Signed-off-by: Andrea Panattoni <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
Fixes the following shellcheck error:

    SC2081 (error): [ .. ] can't match globs. Use a case statement.

https://www.shellcheck.net/wiki/SC2081
Warns about shellcheck issues with severity `error`.
…-fix

metrics: Fix `Metrics should have the correct labels` test
openstack: dynamically mount the config-drive
When the operator changes the device-plugin Spec (e.g. .Spec.NodeSelector), it may happen
that there are two device plugin pods for a given node, one that is terminating, the other that is
initializing.
If the config-daemon executes `restartDevicePluginPod()` at the same time, it may kill the terminating
pod, while the initializing one will run with the old dp configuration. This may cause one or more resources
to not being advertised, until a manual device plugin restart occurs.

Make the config-daemon restart all the device-plugin instances it founds for its own node.

Signed-off-by: Andrea Panattoni <[email protected]>
zeeke and others added 28 commits October 9, 2024 17:20
…vice-plugins

config-daemon: Restart all instances of device-plugin
…3-L for SFP, E823-L for backplane for NetSec Accelerator Cards

Fixes Issue k8snetworkplumbingwg#789

Signed-off-by: William Zhao <[email protected]>
…sec_ethernet_controllers

Add Intel Corporation Ethernet Controller E810-XXV for backplane, E823-L for SFP, E823-L for backplane for NetSec Accelerator Cards
The `DiscoverSriovDevices` routine produces a huge amount of log entries,
making debugging problems hard.

Remove log entries that can produce a log line for each configured VF and which
does not produce any change in the environment.

Signed-off-by: Andrea Panattoni <[email protected]>
…on-reduce-logging

logging: Reduce device discovering verbosity
…ad of VF address, in case of using IB link type

Signed-off-by: Ido Heyvi <[email protected]>
have a service that will load the br_netfilter driver after reboot

Signed-off-by: Sebastian Sch <[email protected]>
…ystemd

Add a note in documentation regarding systemd mode
…n-unbind-pf

SRIOV PF got unbind instead of VF in case of IB link type
…k8s-objects-deletion

Adding webhook 'delete' command to remove non-namespaced objects
Kernel arguments like `intel_iommu=on` does not have sense
on AMD or ARM systems and some user might complain about
their presence, though they are likely to be harmless.

Also, on ARM systems the `iommu.passthrough` parameter is the
one to use [1].

Improve `GHWLib` to bridge CPU information from the library.
Add `CpuInfoProviderInterface` and inject it into the GenericPlugin
to implement the per CPU vendor logic.

[1] https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt#L2343

Signed-off-by: Andrea Panattoni <[email protected]>
To include
- jaypipes/ghw#387

Signed-off-by: Andrea Panattoni <[email protected]>
kernel: Set arguments based on CPU architecture
always deploy sriov network device plugin and use a label to enable or
disable it on the nodes

Signed-off-by: Sebastian Sch <[email protected]>
In the context of Hypershift (Hosted Clusters with OpenShift), where a Nodepool
(terminology for a worker Node in HCP) is not a control-plane or
a master Node but a worker, we can't force the Operator to be deployed
on a master node that doesn't exist. Instead, we want to deploy it on a
worker.

The proposal here is to relax the rule and use
`preferredDuringSchedulingIgnoredDuringExecution` instead so the scheduler
will try to find a master node or fallback on other nodes if not found.
@clarkzinzow clarkzinzow force-pushed the clark/config-daemon-ib-unbind-fix branch from 74dc52e to 1d92064 Compare November 24, 2024 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.