feat(operator): Restructure LokiStack metrics #12228

xperimental · 2024-03-15T16:52:46Z

What this PR does / why we need it:

The current implementation of the LokiStack metrics emitted by the Loki Operator sometimes retains outdated metric values. As a result, the metrics do not always reflect the current state of the deployed LokiStack resources.

This PR replaces the current implementation, which relied on being called from the reconciliation loop, with one that fetches the current state of the LokiStack resources itself and generates the metrics from that data. Because the list of LokiStack resources should already be cached in the informer, this should not put any additional load on the Kubernetes API server.

Which issue(s) this PR fixes:

LOG-5253

Special notes for your reviewer:

This completely replaces the existing metrics. Only two metrics remain:
- lokistack_info, which carries information about the deployed LokiStack resources as labels (currently limited to the "size")
- lokistack_status_condition, which shows the current state of the Conditions available in the LokiStack status

This setup mirrors the metrics provided by kube-state-metrics.

The alert for "needs storage schema update" has been updated to reflect the new metrics setup. I'm not aware of any other usage of these metrics so far.

Calling List on each metrics scrape seems wasteful, but my theory is that this data is served from the operator's informer cache. I compared the apiserver metrics while running this operator version against a version without this change, and it does look like this holds: the change has no measurable effect on the apiserver.

Checklist

- Reviewed the CONTRIBUTING.md guide (required)
- Tests updated
- CHANGELOG.md updated

periklis

LGTM

xperimental · 2024-03-17T12:12:39Z

I wonder if it would be a good idea to have a metric exposing the state of all conditions on the LokiStack (similar to what kube-state-metrics does for the standard types), instead of only having a metric for the warnings.

That way, we (or customers) could also produce alerts on the other conditions and, for example, automatically warn when a LokiStack fails to become ready for a while.

feat(operator): Restructure LokiStack metrics

40c4f08

xperimental self-assigned this Mar 15, 2024

xperimental requested review from periklis and a team as code owners March 15, 2024 16:52

pull-request-size bot added the size/L label Mar 15, 2024

github-actions bot added the sig/operator label Mar 15, 2024

Fix condition status check

5f2c8fb

periklis reviewed Mar 16, 2024

xperimental added 5 commits March 18, 2024 13:52

Fix receiver of collector type

166ed97

Introduce metric for all conditions

2722150

Remove extra warnings metric

b934ee3

Fix lint issue

08d3adc

Update changelog

7fd532c

periklis approved these changes Mar 19, 2024

xperimental merged commit ebdf8fe into grafana:main Mar 19, 2024
18 checks passed

xperimental deleted the fix-telemetry branch March 19, 2024 11:42

edsoncelio pushed a commit to edsoncelio/loki that referenced this pull request Mar 22, 2024

feat(operator): Restructure LokiStack metrics (grafana#12228)

4734e8f

loki-gh-app bot mentioned this pull request Mar 27, 2024

chore(add-major-release-workflow): release 3.0.0-rc.1 #12380

Closed

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024

feat(operator): Restructure LokiStack metrics (grafana#12228)

08e5b27

xperimental mentioned this pull request Jul 29, 2024

[release-5.8] Backport metrics and status handling from 5.9 openshift/loki#337

Merged
