NetObserv Operator

A Kubernetes / OpenShift operator for network observability. It deploys a flow monitoring pipeline based on an eBPF agent or IPFIX exports. It provides dashboards and metrics, and keeps flows accessible in a queryable log store, Grafana Loki. When used in OpenShift, new dashboards are available in the Console.

Getting Started

You can install NetObserv Operator using OLM if it is available in your cluster, or directly from its repository.

Install with OLM

NetObserv Operator is available in OperatorHub; just follow the guided steps. It is also available in the OperatorHub catalog directly in the OpenShift Console.

[Screenshot: OpenShift OperatorHub search]

After the operator is installed, create a FlowCollector resource:

[Screenshot: OpenShift OperatorHub FlowCollector]

Note: if you are not using the OVN-Kubernetes CNI, we recommend using ebpf as the agent rather than ipfix (unless you know what you are doing). NetObserv automatically configures OVN-Kubernetes for IPFIX exports, but cannot do so with other CNIs.

Install from repository

A couple of make targets are provided in this repository to allow installing without OLM:

git clone https://github.com/netobserv/network-observability-operator.git && cd network-observability-operator
make deploy deploy-loki deploy-grafana

This deploys the latest version of the operator, along with port-forwarded Loki and Grafana.

Note: the deploy-loki target is provided as a quick install path and is not suitable for production. Please refer to the official documentation for a clean install.

To deploy the monitoring pipeline, this make target installs a FlowCollector with default values:

make deploy-sample-cr

Alternatively, you can grab and edit this config before installing it.

Note: if you are not using the OVN-Kubernetes CNI, we recommend using ebpf as spec.agent rather than ipfix (unless you know what you are doing). NetObserv automatically configures OVN-Kubernetes for IPFIX exports, but cannot do so with other CNIs.
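
For reference, a FlowCollector that only selects the agent can be as small as the manifest printed by the hypothetical helper below. It is a sketch, assuming the flows.netobserv.io/v1alpha1 API version implied by the sample file names and the string-valued spec.agent described in this README. All other fields are left out and rely on the CRD defaults, which may not hold for every version; if applying it fails, start from the full sample config instead.

```shell
# Hypothetical helper: print a minimal FlowCollector manifest.
# Assumes apiVersion flows.netobserv.io/v1alpha1 and a string-valued spec.agent,
# as described in this README; other fields are left to the CRD defaults.
flowcollector_manifest() {
  local agent="${1:-ebpf}"   # "ebpf" (recommended without OVN-Kubernetes) or "ipfix"
  case "$agent" in
    ebpf|ipfix) ;;
    *) echo "unsupported agent: $agent" >&2; return 1 ;;
  esac
  # The resource must be named "cluster": only one FlowCollector is allowed.
  cat <<EOF
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent: ${agent}
EOF
}
```

Usage (requires cluster access):

flowcollector_manifest ebpf | kubectl apply -f -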

You can still edit the FlowCollector after it's installed: the operator takes care of reconciling everything with the updated configuration:

kubectl edit flowcollector cluster

Install older versions

To deploy a specific version of the operator, switch to the related git branch, then add a VERSION environment variable to the make command above, e.g.:

git checkout 0.1.2
VERSION=0.1.2 make deploy deploy-loki deploy-grafana
kubectl apply -f ./config/samples/flows_v1alpha1_flowcollector_versioned.yaml

Beware that the version of the underlying components, such as flowlogs-pipeline, may be tied to the version of the operator (this is why we recommend switching the git branch). Breaking this correlation may result in crashes. The versions of the underlying components are defined in the FlowCollector resource as image tags.

OpenShift Console

Prerequisite: OpenShift 4.10 or above

If the OpenShift Console is detected in the cluster, a console plugin is deployed when a FlowCollector is installed. It adds new dashboards to the console:

  • A flow table, with powerful filtering and display options

[Screenshot: Flow table]

  • A topology view, with the same filtering options and several levels of aggregation (nodes, namespaces, owner controllers, pods). A side panel provides contextual insights and metrics.

[Screenshot: Topology]

These dashboards are accessible directly from the main menu, and also as contextual tabs on the details page of any Pod, Deployment, Service, etc.

[Screenshot: Contextual topology]

Standalone console

Coming soon

Grafana

Grafana can be used to retrieve and show the collected flows from Loki. If you used the make commands provided above to install NetObserv from the repository, you should already have Grafana installed and configured.

Otherwise, you can find some help here to install Grafana if needed.

Then import this dashboard into Grafana. It includes a table of the flows and some graphs showing the traffic volume per source or destination namespace or workload:

[Screenshot: Grafana dashboard]

Configuration

The FlowCollector resource is used to configure the operator and its managed components. Comprehensive documentation is available here, and a full sample file there.

To edit configuration in cluster, run:

kubectl edit flowcollector cluster

Because it operates cluster-wide, only a single FlowCollector is needed and allowed, and it must be named cluster.

A couple of settings deserve special attention:

  • Agent (spec.agent) can be ipfix or ebpf. As mentioned above, the IPFIX option is fully functional when using the OVN-Kubernetes CNI (other CNIs are not supported, but you may still be able to configure them manually if they allow IPFIX exports), whereas eBPF is expected to work regardless of the CNI in use.

  • Sampling (spec.ipfix.sampling and spec.ebpf.sampling): 24/7 unsampled flow collection may consume a non-negligible amount of resources. While we are doing our best to make it a viable option in production, it is still often necessary to mitigate this by setting a sampling ratio. A value of 100 means one flow out of every 100 is sampled; 1 means no sampling. The lower the value, the more accurate the flows and derived metrics.

  • Loki (spec.loki): configure here how to reach Loki. The default values match the Loki quick install paths mentioned in the Getting Started section, but you may need a different configuration if you used another installation method.

  • Kafka (spec.kafka): experimental. When enabled, it integrates the flow collection pipeline with Kafka by splitting ingestion from transformation (kube enrichment, derived metrics, ...). This assumes Kafka is already deployed and a topic has been created. For convenience, we provide a quick deployment using Strimzi: run make deploy-kafka from the repository.
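
To get a feel for the sampling value, the hypothetical helpers below do the back-of-the-envelope arithmetic from the definition above (a value of N keeps one flow out of every N) and build a JSON merge patch for kubectl patch, a non-interactive alternative to kubectl edit. The estimate assumes flows are sampled uniformly, and the patch assumes sampling is an integer field under spec.ebpf or spec.ipfix, as described here.

```shell
# Hypothetical helpers around the sampling setting (names are illustrative).

# Estimate how many of <total> flows are kept with sampling value <n>:
# n keeps one flow out of every n, 1 means no sampling (integer division).
sampled_flows() {
  local total="$1" n="$2"
  if [ "$n" -lt 1 ]; then
    echo "sampling must be >= 1" >&2
    return 1
  fi
  echo $(( total / n ))
}

# Print a JSON merge patch setting spec.<agent>.sampling on the FlowCollector.
sampling_patch() {
  local agent="$1" n="$2"
  case "$agent" in
    ebpf|ipfix) ;;
    *) echo "agent must be ebpf or ipfix" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ]; then
    echo "sampling must be >= 1" >&2
    return 1
  fi
  printf '{"spec":{"%s":{"sampling":%d}}}\n' "$agent" "$n"
}
```

For example, sampled_flows 1000000 100 prints 10000. To set a value of 50 on the eBPF agent:

kubectl patch flowcollector cluster --type=merge -p "$(sampling_patch ebpf 50)"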

Understanding the deployed components

TODO

Development / contributing

Please refer to this documentation for everything related to building, deploying or bundling from sources.

FAQ / Troubleshooting

If you can't find help here, don't hesitate to open an issue or start a Q&A discussion. There are several repositories under the netobserv GitHub organization, but it is fine to centralize these in network-observability-operator.

Is it for OpenShift only?

No! While some features are developed primarily for OpenShift, we want to keep it working on other, "vanilla" Kubernetes distributions. For instance, there has been some work to make the console plugin run standalone, and to let the operator manage upstream (non-OpenShift) OVN-Kubernetes.

And if something is not working as hoped with your setup, you are welcome to contribute to the project ;-)

Which version of Kubernetes / OpenShift is supported?

It depends on which agent you want to use (ebpf or ipfix), and whether you want the OpenShift Console plugin.

To run the eBPF agent

What matters is the version of the Linux kernel. TODO: precisions. Other than that, there are no known restrictions (yet) on the Kubernetes version.
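
Since no minimum kernel version is documented here yet, the sketch below takes the minimum as a parameter you supply. It lists each node's kernel as reported by the Kubernetes API (status.nodeInfo.kernelVersion) and compares versions with sort -V. The function names are hypothetical.

```shell
# Hypothetical helpers for checking node kernel versions.
# The minimum version is a parameter: this README does not state one yet.

# Print "<node>\t<kernelVersion>" for each node (requires kubectl access).
node_kernels() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
}

# Succeed if version $1 is >= version $2, using version sort (sort -V):
# the minimum must sort first (or be equal) for the check to pass.
kernel_at_least() {
  local current="$1" minimum="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n1)" = "$minimum" ]
}
```

Usage, with MIN_KERNEL set to whatever minimum applies to your agent version:

node_kernels | while IFS=$'\t' read -r node kver; do kernel_at_least "$kver" "$MIN_KERNEL" || echo "$node: $kver"; done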

To use IPFIX exports

OpenShift 4.10 or above, or upstream OVN-Kubernetes [TODO: upstream version?], is recommended, as the operator configures OVS for you. Otherwise, you need to configure it manually.

For OpenShift 4.8 or 4.9:

  • Configure spec.flowlogsPipeline.kind to be Deployment
  • Run the following:
FLP_IP=$(kubectl get svc flowlogs-pipeline -n network-observability -o jsonpath='{.spec.clusterIP}') && echo "$FLP_IP"
kubectl patch networks.operator.openshift.io cluster --type=merge -p "{\"spec\":{\"exportNetworkFlows\":{\"ipfix\":{\"collectors\":[\"$FLP_IP:2055\"]}}}}"
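
When scripting this, it helps to stop if the service lookup returns an empty address. The hypothetical helper below does that and prints a JSON merge patch, so only spec.exportNetworkFlows is set and the rest of the cluster network spec is left untouched. It assumes flowlogs-pipeline listens for IPFIX on port 2055, as in the commands above. Note that a merge patch replaces any existing ipfix collectors list.

```shell
# Hypothetical helper: print a JSON merge patch declaring one IPFIX collector.
# Defaults to port 2055 (flowlogs-pipeline's IPFIX port in this README).
ipfix_merge_patch() {
  local ip="$1" port="${2:-2055}"
  if [ -z "$ip" ]; then
    # An empty IP usually means the flowlogs-pipeline service lookup failed.
    echo "empty collector IP: is the flowlogs-pipeline service deployed?" >&2
    return 1
  fi
  printf '{"spec":{"exportNetworkFlows":{"ipfix":{"collectors":["%s:%s"]}}}}\n' "$ip" "$port"
}
```

Usage (requires cluster access):

FLP_IP=$(kubectl get svc flowlogs-pipeline -n network-observability -o jsonpath='{.spec.clusterIP}')
PATCH=$(ipfix_merge_patch "$FLP_IP") && kubectl patch networks.operator.openshift.io cluster --type=merge -p "$PATCH"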

OpenShift versions older than 4.8 don't support IPFIX exports.

For other CNIs, you need to find out if they can export IPFIX, and configure them accordingly.

To get the OpenShift Console plugin

OpenShift 4.10 or above is required.

How can I make sure everything is correctly deployed?

Make sure all pods are up and running:

# Assuming configured namespace is network-observability (default)
kubectl get pods -n network-observability

The output should look similar to this:

NAME                                            READY   STATUS    RESTARTS   AGE
flowlogs-pipeline-5rrg2                         1/1     Running   0          43m
flowlogs-pipeline-cp2lb                         1/1     Running   0          43m
flowlogs-pipeline-hmwxd                         1/1     Running   0          43m
flowlogs-pipeline-wmx4z                         1/1     Running   0          43m
grafana-6dbddc9869-sxn62                        1/1     Running   0          31m
loki                                            1/1     Running   0          43m
netobserv-controller-manager-7487d87dc-2ltq2    2/2     Running   0          43m
network-observability-plugin-7fb8c5477b-drg2z   1/1     Running   0          43m

Results may differ slightly depending on the installation method and the FlowCollector configuration. At a minimum, you should see flowlogs-pipeline pods in the Running state.
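
To spot problems in a long listing, the output can be filtered. This is a sketch, assuming the default column layout shown above (NAME, READY, STATUS, ...); the function name is hypothetical. It prints the pods that are not Running or not fully ready, and returns non-zero if it finds any.

```shell
# Hypothetical helper: read `kubectl get pods` table output on stdin and print
# every pod whose STATUS is not Running or whose READY count is incomplete (e.g. 1/2).
# Exits non-zero if at least one such pod is found.
unhealthy_pods() {
  awk 'NR > 1 {
         split($2, r, "/")   # READY column, e.g. "1/2" -> r[1]=1, r[2]=2
         if ($3 != "Running" || r[1] != r[2]) { print $1 " " $2 " " $3; bad = 1 }
       }
       END { exit bad }'
}
```

Usage:

kubectl get pods -n network-observability | unhealthy_pods && echo "all pods look healthy"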

If you use the eBPF agent in privileged mode (spec.ebpf.privileged=true), also check for pods in the privileged namespace:

# Assuming configured namespace is network-observability (default)
kubectl get pods -n network-observability-privileged
NAME                         READY   STATUS    RESTARTS   AGE
netobserv-ebpf-agent-7rwtk   1/1     Running   0          7s
netobserv-ebpf-agent-c7nkv   1/1     Running   0          7s
netobserv-ebpf-agent-hbjz8   1/1     Running   0          7s
netobserv-ebpf-agent-ldj66   1/1     Running   0          7s

Finally, make sure Loki is correctly deployed and reachable from pods via the URL defined in spec.loki.url.
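
One way to test reachability is to curl Loki's /ready endpoint from a throwaway pod in the same namespace. In the sketch below, the helper name, the pod name loki-check and the curlimages/curl image are assumptions; the URL is read from spec.loki.url. A ready Loki typically answers with ready.

```shell
# Hypothetical helper: derive Loki's readiness endpoint from spec.loki.url,
# assuming the URL points at the root of the Loki HTTP API (e.g. http://loki:3100/).
loki_ready_url() {
  local base="${1%/}"   # strip one trailing slash, if present
  echo "${base}/ready"
}

# Example (requires cluster access; not run here):
# LOKI_URL=$(kubectl get flowcollector cluster -o jsonpath='{.spec.loki.url}')
# kubectl run loki-check -n network-observability --rm -i --restart=Never \
#   --image=curlimages/curl --command -- curl -sS "$(loki_ready_url "$LOKI_URL")"
```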

Everything seems correctly deployed but there isn't any flow showing up

Wait 10 minutes and check again. When spec.agent is ipfix, there is sometimes a delay of up to 10 minutes before flows appear. This is because the IPFIX protocol requires the exporter and collector to exchange record template definitions as a preliminary step. The eBPF agent doesn't have such a delay.
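
Instead of checking by hand, you can poll with a timeout matching this 10-minute window. The helper below is a generic sketch; the check you pass it is up to you (shown as the hypothetical my_flow_check, which could for instance query Loki).

```shell
# Generic polling helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the deadline has passed.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage, with my_flow_check being your own (hypothetical) check:

wait_until 600 30 my_flow_check || echo "still no flows after 10 minutes"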

I've waited 10 minutes: still nothing

Make sure there are no errors in the flowlogs-pipeline pod logs. (TODO / TO CONTINUE)
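
A quick filter for error lines, as a sketch: it assumes logrus-style text logs (level=error, level=fatal) and that the flowlogs-pipeline pods carry the label app=flowlogs-pipeline. Both are assumptions to check against your deployment (for instance with kubectl get pods --show-labels).

```shell
# Hypothetical filter: keep log lines that look like errors.
# Assumes logrus-style "level=..." fields; adjust the pattern for other formats.
# Like grep, it exits non-zero when no line matches (i.e. no errors found).
error_lines() {
  grep -E 'level=(error|fatal)'
}
```

Usage (no output means no matching lines):

kubectl logs -n network-observability -l app=flowlogs-pipeline --prefix --tail=-1 | error_lines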

There are no new dashboards in the OpenShift Console

Make sure your cluster version is at least OpenShift 4.10: earlier versions have no console plugin SDK, or an incompatible one.

Make sure that spec.consolePlugin.register is set to true (default).

If not, or if for any reason the registration seems to have failed, you can still do it manually by editing the Console Operator config:

kubectl edit console.operator.openshift.io cluster

If it's not already there, add the plugin reference:

spec:
  plugins:
  - network-observability-plugin
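
If you prefer a non-interactive route, the registration can also be done with a JSON patch. The sketch below (hypothetical function name) takes the current plugin list as printed by kubectl's jsonpath {.spec.plugins[*]} (space-separated names, empty if none) and prints the patch to apply, or nothing when the plugin is already registered. It appends to the list rather than replacing it, so other console plugins are kept.

```shell
# Hypothetical helper: print a JSON patch registering the console plugin,
# or nothing if it is already in the space-separated list given as $1.
plugin_patch() {
  local current="$1" plugin="${2:-network-observability-plugin}" p
  for p in $current; do
    [ "$p" = "$plugin" ] && return 0   # already registered: nothing to do
  done
  if [ -z "$current" ]; then
    # No plugins yet: create the list with this single entry.
    printf '[{"op":"add","path":"/spec/plugins","value":["%s"]}]\n' "$plugin"
  else
    # Append to the existing list ("/-" targets the end of the array).
    printf '[{"op":"add","path":"/spec/plugins/-","value":"%s"}]\n' "$plugin"
  fi
}
```

Usage (requires cluster access):

CURRENT=$(kubectl get console.operator.openshift.io cluster -o jsonpath='{.spec.plugins[*]}')
PATCH=$(plugin_patch "$CURRENT")
[ -n "$PATCH" ] && kubectl patch console.operator.openshift.io cluster --type=json -p "$PATCH"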

If the new dashboards still don't show up, try clearing your browser cache and refreshing. Also check the status and logs of the network-observability-plugin-... pod:

kubectl get pods -n network-observability -l app=network-observability-plugin
kubectl logs -n network-observability -l app=network-observability-plugin

Contributions

This project is licensed under Apache 2.0 and accepts contributions via GitHub pull requests. Other related netobserv projects follow the same rules.

External contributions are welcome and can take various forms:

  • Providing feedback, by starting discussions or opening issues.
  • Code / doc contributions. Here you will find some help on how to build, run and test your code changes. Don't hesitate to ask for help.