A Kubernetes / OpenShift operator for network observability. It deploys a flow monitoring pipeline based on an eBPF agent or IPFIX exports. It provides dashboards, metrics, and keeps flows accessible in a queryable log store: Grafana Loki. When used in OpenShift, new dashboards are available in the Console.
You can install NetObserv Operator using OLM if it is available in your cluster, or directly from its repository.
NetObserv Operator is available in OperatorHub: just follow the guided steps. It is also available directly in the OperatorHub catalog of the OpenShift Console.
After the operator is installed, create a `FlowCollector` resource:
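For example, a minimal resource could look like the following sketch (the exact schema depends on the operator version; refer to the sample file in this repository for all options):

```yaml
apiVersion: flows.netobserv.io/v1alpha1   # assumed from the v1alpha1 sample shipped in this repository
kind: FlowCollector
metadata:
  name: cluster          # a single, cluster-wide resource named "cluster" is expected
spec:
  agent: ebpf            # or "ipfix" when running OVN-Kubernetes (see the note below)
```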
Note: if you are not using the OVN-Kubernetes CNI, we recommend using `ebpf` as the `agent`, rather than `ipfix` (unless you know what you are doing). NetObserv automatically configures OVN-Kubernetes for IPFIX exports, but cannot do so for other CNIs.
A couple of `make` targets are provided in this repository to allow installing without OLM:
git clone https://github.com/netobserv/network-observability-operator.git && cd network-observability-operator
make deploy deploy-loki deploy-grafana
This deploys the latest version of the operator, along with port-forwarded Loki and Grafana instances.
Note: the `loki-deploy` script is provided as a quick install path and is not suitable for production. Please refer to the official documentation for a clean install.
To deploy the monitoring pipeline, this `make` target installs a `FlowCollector` with default values:
make deploy-sample-cr
Alternatively, you can grab and edit this config before installing it.
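For instance (a sketch, using the versioned sample referenced later in this README):

```sh
# Copy the sample from this repository, edit it, then apply it
cp config/samples/flows_v1alpha1_flowcollector_versioned.yaml my-flowcollector.yaml
# ... edit my-flowcollector.yaml (agent, sampling, loki, etc.) ...
kubectl apply -f my-flowcollector.yaml
```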
Note: if you are not using the OVN-Kubernetes CNI, we recommend using `ebpf` as `spec.agent`, rather than `ipfix` (unless you know what you are doing). NetObserv automatically configures OVN-Kubernetes for IPFIX exports, but cannot do so for other CNIs.
You can still edit the `FlowCollector` after it is installed: the operator will take care of reconciling everything with the updated configuration:
kubectl edit flowcollector cluster
To deploy a specific version of the operator, you need to switch to the related git branch, then add a `VERSION` env variable to the above make command, e.g.:
git checkout 0.1.2
VERSION=0.1.2 make deploy deploy-loki deploy-grafana
kubectl apply -f ./config/samples/flows_v1alpha1_flowcollector_versioned.yaml
Beware that the versions of the underlying components, such as flowlogs-pipeline, may be tied to the version of the operator (this is why we recommend switching the git branch). Breaking this correlation may result in crashes. The versions of the underlying components are defined in the `FlowCollector` resource as image tags.
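For instance, you can inspect which image references are currently set in the resource (a sketch):

```sh
# List the image fields defined in the FlowCollector resource
kubectl get flowcollector cluster -o yaml | grep "image:"
```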
Prerequisite: OpenShift 4.10 or above
If the OpenShift Console is detected in the cluster, a console plugin is deployed when a `FlowCollector` is installed. It adds new dashboards to the console:
- A flow table, with powerful filtering and display options
- A topology view, with the same filtering options and several levels of aggregations (nodes, namespaces, owner controllers, pods). A side panel provides contextual insight and metrics.
These dashboards are accessible directly from the main menu, and also as contextual tabs for any Pod, Deployment, Service (etc.) in their details page.
Coming soon
Grafana can be used to retrieve and show the collected flows from Loki. If you used the `make` commands provided above to install NetObserv from the repository, you should already have Grafana installed and configured.
Otherwise, you can find here some help to install Grafana if needed.
Then import this dashboard in Grafana. It includes a table of the flows and some graphs showing the volume per source or destination namespace or workload:
The `FlowCollector` resource is used to configure the operator and its managed components. Comprehensive documentation is available here, and a full sample file there.
To edit configuration in cluster, run:
kubectl edit flowcollector cluster
As it operates cluster-wide, only a single `FlowCollector` is allowed, and it must be named `cluster`.
A couple of settings deserve special attention (a combined example is sketched after this list):

- Agent (`spec.agent`) can be `ipfix` or `ebpf`. As mentioned above, the IPFIX option is fully functional when using the OVN-Kubernetes CNI (other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports), whereas eBPF is expected to work regardless of the running CNI.
- Sampling (`spec.ipfix.sampling` and `spec.ebpf.sampling`): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate by setting a sampling ratio. A value of `100` means one flow out of every 100 is sampled. `1` means no sampling. The lower the value, the more accurate the flows and derived metrics.
- Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned in the Getting Started section, but you may have to configure differently if you used another installation method.
- Kafka (`spec.kafka`): experimental. When enabled, it integrates the flow collection pipeline with Kafka by splitting ingestion from transformation (kube enrichment, derived metrics, ...). Kafka is assumed to be already deployed, with a topic created. For convenience, we provide a quick deployment using Strimzi: run `make deploy-kafka` from the repository.
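For illustration only, here is a minimal sketch combining these settings, assuming the v1alpha1 schema from the sample file in this repository (field names not mentioned above, such as the Kafka toggle, are assumptions):

```yaml
apiVersion: flows.netobserv.io/v1alpha1   # assumed from the v1alpha1 sample in this repository
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent: ebpf                 # or "ipfix" with OVN-Kubernetes
  ebpf:
    sampling: 100             # 1 flow out of 100; set to 1 to disable sampling
  loki:
    url: http://loki:3100/    # assumed default; must match how Loki was installed
  kafka:
    enable: false             # experimental; field name assumed, see the documentation
```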
TODO
Please refer to this documentation for everything related to building, deploying or bundling from sources.
If you can't find help here, don't hesitate to open an issue or start a Q&A discussion. There are several repositories under the netobserv GitHub org, but it is fine to centralize these in network-observability-operator.
No! While some features are developed primarily for OpenShift, we want to keep it working with other / "vanilla" Kubernetes distributions. For instance, there has been some work to make the console plugin run as a standalone, and to have the operator manage upstream (non-OpenShift) ovn-kubernetes.
And if something is not working as hoped with your setup, you are welcome to contribute to the project ;-)
It depends on which `agent` you want to use: `ebpf` or `ipfix`, and whether you want to get the OpenShift Console plugin.
What matters is the version of the Linux kernel (TODO: precisions). Other than that, there are no known restrictions (yet?) on the Kubernetes version.
OpenShift 4.10 or above, or upstream OVN-Kubernetes [TODO: upstream version?], is recommended, as the operator will configure OVS for you. Otherwise, you need to configure it manually.
For OpenShift 4.8 or 4.9:
- Configure `spec.flowlogsPipeline.kind` to be `Deployment`
- Run the following:
FLP_IP=`kubectl get svc flowlogs-pipeline -n network-observability -ojsonpath='{.spec.clusterIP}'` && echo $FLP_IP
kubectl patch networks.operator.openshift.io cluster --type='json' -p "[{'op': 'add', 'path': '/spec', 'value': {'exportNetworkFlows': {'ipfix': { 'collectors': ['$FLP_IP:2055']}}}}]"
OpenShift versions older than 4.8 don't support IPFIX exports.
For other CNIs, you need to find out if they can export IPFIX, and configure them accordingly.
OpenShift 4.10 or above is required.
Make sure all pods are up and running:
# Assuming configured namespace is network-observability (default)
kubectl get pods -n network-observability
Should provide results similar to this:
NAME READY STATUS RESTARTS AGE
flowlogs-pipeline-5rrg2 1/1 Running 0 43m
flowlogs-pipeline-cp2lb 1/1 Running 0 43m
flowlogs-pipeline-hmwxd 1/1 Running 0 43m
flowlogs-pipeline-wmx4z 1/1 Running 0 43m
grafana-6dbddc9869-sxn62 1/1 Running 0 31m
loki 1/1 Running 0 43m
netobserv-controller-manager-7487d87dc-2ltq2 2/2 Running 0 43m
network-observability-plugin-7fb8c5477b-drg2z 1/1 Running 0 43m
Results may slightly differ depending on the installation method and the `FlowCollector` configuration. At least, you should see `flowlogs-pipeline` pods in a `Running` state.
If you use the eBPF agent in privileged mode (`spec.ebpf.privileged=true`), also check for pods in the privileged namespace:
# Assuming configured namespace is network-observability (default)
kubectl get pods -n network-observability-privileged
NAME READY STATUS RESTARTS AGE
netobserv-ebpf-agent-7rwtk 1/1 Running 0 7s
netobserv-ebpf-agent-c7nkv 1/1 Running 0 7s
netobserv-ebpf-agent-hbjz8 1/1 Running 0 7s
netobserv-ebpf-agent-ldj66 1/1 Running 0 7s
Finally, make sure Loki is correctly deployed and reachable from pods via the URL defined in `spec.loki.url`.
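As a quick sanity check (a sketch, assuming the default namespace and the default URL `http://loki:3100/`; adjust to your `spec.loki.url`), you can query Loki's `/ready` endpoint from inside the cluster:

```sh
# Spawn a throwaway curl pod in the NetObserv namespace and hit Loki's readiness endpoint
kubectl run -n network-observability loki-check --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://loki:3100/ready
```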
Wait 10 minutes and check again. When `spec.agent` is `ipfix`, there is sometimes a delay of up to 10 minutes before the flows appear. This is due to the IPFIX protocol requiring the exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent does not have such a delay.
Make sure there are no errors in the `flowlogs-pipeline` pods' logs.
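For example (a sketch; the `app=flowlogs-pipeline` label selector is an assumption, adjust it to your deployment):

```sh
# Assuming the default namespace; the label selector is an assumption
kubectl logs -n network-observability -l app=flowlogs-pipeline
```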
(TODO / TO CONTINUE)
Make sure your cluster version is at least OpenShift 4.10: prior versions have no (or incompatible) console plugin SDK.
Make sure that `spec.consolePlugin.register` is set to `true` (the default).
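You can check the current value with (a sketch):

```sh
kubectl get flowcollector cluster -o jsonpath='{.spec.consolePlugin.register}'
```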
If not, or if for any reason the registration seems to have failed, you can still do it manually by editing the Console Operator config:
kubectl edit console.operator.openshift.io cluster
If it's not already there, add the plugin reference:
spec:
plugins:
- network-observability-plugin
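Once added, you can verify that the plugin is listed (a sketch):

```sh
kubectl get console.operator.openshift.io cluster -o jsonpath='{.spec.plugins}'
```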
If the new dashboards still don't show up, try clearing your browser cache and refreshing. Also check the `netobserv-console-plugin-...` pod status and logs:
kubectl get pods -n network-observability -l app=network-observability-plugin
kubectl logs -n network-observability -l app=network-observability-plugin
This project is licensed under Apache 2.0 and accepts contributions via GitHub pull requests. Other related `netobserv` projects follow the same rules:
External contributions are welcome and can take various forms:
- Providing feedback, by starting discussions or opening issues.
- Code / doc contributions. You will find here some help on how to build, run and test your code changes. Don't hesitate to ask for help.