fix(operator): Use safe bearer token authentication to scrape operator metrics #12164

periklis · 2024-03-08T10:11:59Z

What this PR does / why we need it:
In OpenShift clusters, operator metrics can be scraped either via cluster-monitoring (the default case) or via user-workload-monitoring (managed clusters, where users track operator metrics themselves). Until now, the ServiceMonitor for scraping operator metrics was only compatible with cluster-monitoring, which allows using bearerTokenFile and tlsConfig.caFile. Neither is allowed when scraping with user-workload-monitoring: the Prometheus Operator there is configured with ArbitraryFSAccessThroughSMsConfig.Deny: true, which in turn prevents the prometheus binary from accessing its own serviceaccount token to scrape metrics.
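
For illustration, a file-based endpoint of the kind that cluster-monitoring accepts and user-workload-monitoring rejects might look roughly like the sketch below. It is not the manifest changed by this PR: the ServiceMonitor name, the port, the label selector, and both file paths are assumptions chosen for the example.

```yaml
# Sketch only: names, port, labels and paths are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki-operator-metrics-monitor        # hypothetical name
spec:
  endpoints:
    - path: /metrics
      targetPort: 8443                       # assumed kube-rbac-proxy port
      scheme: https
      # Both fields below reference files on the Prometheus pod's filesystem.
      # A Prometheus Operator running with
      # ArbitraryFSAccessThroughSMsConfig.Deny: true does not accept them.
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
  selector:
    matchLabels:
      app.kubernetes.io/name: loki-operator  # hypothetical label
```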

Which issue(s) this PR fixes:
Fixes LOG-5165, Replaces #11680

Special notes for your reviewer:
The changeset below introduces a set of new manifests that make an explicit distinction between the serviceaccount used by the Loki Operator itself and the one used by Prometheus to access metrics only, i.e.

- The serviceaccount loki-operator-controller-manager is introduced to be used only by the Loki Operator manager container. This account is bound to the RBAC listed in each supported bundle's ClusterServiceVersion.
- The serviceaccount loki-operator-controller-manager-metrics-reader is introduced along with a secret that holds a long-lived API token and the service CA certificate. The token is referenced in the ServiceMonitor in authorization.credentials, replacing bearerTokenFile. The certificate is referenced in the ServiceMonitor in tlsConfig.ca, replacing tlsConfig.caFile. Prometheus uses these credentials to scrape metrics from the Loki Operator manager container, and only through the kube-rbac-proxy sidecar. This serviceaccount is bound by the ClusterRoleBinding loki-operator-controller-manager-read-metrics, which grants get access to the non-resource URL /metrics.
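
The arrangement described above can be sketched as follows. This is a hedged illustration rather than the manifests from the changeset: the serviceaccount and ClusterRoleBinding names and the openshift-operators-redhat namespace are taken from this conversation, while every name marked "assumed" (secret name, ClusterRole name, ServiceMonitor name, port, secret keys, serverName, labels) is hypothetical.

```yaml
# Sketch only. Anything marked "assumed" does not come from the PR description.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki-operator-controller-manager-metrics-reader
  namespace: openshift-operators-redhat
---
# Long-lived token secret; Kubernetes fills in the token for the
# serviceaccount named in the annotation.
apiVersion: v1
kind: Secret
metadata:
  name: loki-operator-metrics-reader-token   # assumed name
  namespace: openshift-operators-redhat
  annotations:
    kubernetes.io/service-account.name: loki-operator-controller-manager-metrics-reader
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: loki-operator-metrics-reader         # assumed name
rules:
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: loki-operator-controller-manager-read-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: loki-operator-metrics-reader         # assumed name
subjects:
  - kind: ServiceAccount
    name: loki-operator-controller-manager-metrics-reader
    namespace: openshift-operators-redhat
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki-operator-metrics-monitor        # assumed name
  namespace: openshift-operators-redhat
spec:
  endpoints:
    - path: /metrics
      targetPort: 8443                       # assumed kube-rbac-proxy port
      scheme: https
      # Secret references replace bearerTokenFile and tlsConfig.caFile,
      # so Prometheus needs no filesystem access for this endpoint.
      authorization:
        type: Bearer
        credentials:
          name: loki-operator-metrics-reader-token   # assumed name
          key: token
      tlsConfig:
        ca:
          secret:
            name: loki-operator-metrics-reader-token # assumed name
            key: service-ca.crt                      # assumed key
        # assumed service name
        serverName: loki-operator-controller-manager-metrics-service.openshift-operators-redhat.svc
  selector:
    matchLabels:
      app.kubernetes.io/name: loki-operator  # assumed label
```

Because the credentials are resolved from a secret in the monitored namespace rather than from a file path, a layout like this should be acceptable to both cluster-monitoring and user-workload-monitoring.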

Checklist

- Reviewed the CONTRIBUTING.md guide (required)
- Documentation added
- Tests updated
- CHANGELOG.md updated
  - If the change is worth mentioning in the release notes, add the add-to-release-notes label
- Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
- If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

JoaoBraveCoding

From a code POV, lgtm (didn't get to test it on a cluster). This is mainly for use cases where we deploy the Loki Operator in namespaces without the openshift- prefix, correct?

periklis · 2024-03-11T13:27:54Z

> From a code POV, lgtm (didn't get to test it on a cluster). This is mainly for use cases where we deploy the Loki Operator in namespaces without the openshift- prefix, correct?

No, this is for when we install in openshift-operators-redhat but the cluster admin monitors this namespace with user-workload-monitoring instead of cluster-monitoring. This happens on managed clusters, where OLM operators are considered user workloads.

xperimental

For me this needed a change to work (see comment).

fix(operator): Use service-ca provided cert/key/ca for operator sm

0b0144e

periklis self-assigned this Mar 8, 2024

periklis requested review from xperimental and a team as code owners March 8, 2024 10:12

pull-request-size bot added the size/M label Mar 8, 2024

github-actions bot added the sig/operator label Mar 8, 2024

Fix servicemonitor secret references

5f02821

pull-request-size bot added size/L and removed size/M labels Mar 11, 2024

JoaoBraveCoding reviewed Mar 11, 2024

Switch from mTLS to bearer token authentication

4a065a4

periklis force-pushed the operator-ocp-uwm-support branch from cb5ae36 to 4a065a4 Compare March 12, 2024 08:53

periklis added 4 commits March 12, 2024 10:16

Add metrics reader sa and rbac

e9c9c3f

Elevate auth proxy client rolebinding to clusterrolebinding

b38ec70

Cleanup openshift bundle

af45394

Cleanup bundles

c5664dd

periklis changed the title ~~fix(operator): Use service-ca provided cert/key/ca for operator sm~~ fix(operator): Use safe bearer token authentication to scrape operator metrics Mar 12, 2024

periklis added 2 commits March 12, 2024 11:32

Merge remote-tracking branch 'upstream/main' into operator-ocp-uwm-support

e69f0d5

Add changelog entry

fe054c5

xperimental reviewed Mar 12, 2024

operator/config/overlays/openshift/prometheus_service_monitor_patch.yaml (resolved)

operator/config/overlays/openshift/prometheus_service_monitor_patch.yaml (outdated, resolved)

periklis added 4 commits March 12, 2024 20:35

Apply suggestions from code review

3998ef1

Merge branch 'main' into operator-ocp-uwm-support

9b6bda0

Move back to targetPort

8d40f29

Remove imagePullPolicy Always

fd62efa

xperimental reviewed Mar 13, 2024

operator/config/overlays/community-openshift/prometheus_service_monitor_patch.yaml (outdated, resolved)

operator/config/overlays/openshift/manager_related_image_patch.yaml (outdated, resolved)

Fix monitor server name for community-openshift bundle

366f2e7

xperimental approved these changes Mar 14, 2024

Merge branch 'main' into operator-ocp-uwm-support

d3ba8f8

periklis enabled auto-merge (squash) March 14, 2024 18:26

periklis merged commit 862d0fb into grafana:main Mar 14, 2024
18 checks passed

periklis added a commit to periklis/loki that referenced this pull request Mar 14, 2024

fix(operator): Use safe bearer token authentication to scrape operator metrics (grafana#12164)

ab29151

periklis added a commit to periklis/loki that referenced this pull request Mar 14, 2024

fix(operator): Use safe bearer token authentication to scrape operator metrics (grafana#12164)

3539b92

periklis added a commit to periklis/loki that referenced this pull request Mar 14, 2024

fix(operator): Use safe bearer token authentication to scrape operator metrics (grafana#12164)

027e54e

openshift-merge-bot bot added a commit to openshift/loki that referenced this pull request Mar 15, 2024

Merge pull request #275 from periklis/backport-operator-smon-prs-5.6

56ceb14

[release-5.6] Backport PR grafana#12164 and grafana#12216

openshift-merge-bot bot added a commit to openshift/loki that referenced this pull request Mar 15, 2024

Merge pull request #273 from periklis/backport-operator-smon-prs-5.8

481947f

[release-5.8] Backport PR grafana#12164 and grafana#12216

openshift-merge-bot bot added a commit to openshift/loki that referenced this pull request Mar 15, 2024

Merge pull request #274 from periklis/backport-operator-smon-prs-5.7

0b3791b

[release-5.7] Backport PR grafana#12164 and grafana#12216

loki-gh-app bot mentioned this pull request Mar 27, 2024

chore(add-major-release-workflow): release 3.0.0-rc.1 #12380

Closed

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024

fix(operator): Use safe bearer token authentication to scrape operator metrics (grafana#12164)

8da4d2f
