Draft: Unified Alerting support proposal #1144

siegenthalerroger · 2023-07-06T12:45:55Z

Based on the discussions in #911, I've written up a proposal for extending grafana-operator with support for unified alerting. I'm looking forward to the feedback.

If anyone wants to assist, feel free to ask for access to the branch so we can streamline the work.

Ref: #911

NissesSenap · 2023-07-09T18:06:54Z

This is a good start, @siegenthalerroger, and thanks for taking this on.
But this document doesn't cover how the different resources should reference each other, which is the hard part of this API.

When creating a folder, we currently generate the UID from the k8s resource. Dashboards are the exception: we allow the UID to be hardcoded in the dashboard JSON, but if it isn't, we generate a UID for you.
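The dashboard fallback described above can be sketched roughly as follows. This is only an illustration, not the operator's actual code: the function name `resolveDashboardUID` is hypothetical, and how the generated UID is derived from the k8s resource is left out and passed in as a parameter.

```go
package main

import "encoding/json"

// resolveDashboardUID is a hypothetical helper: it returns the "uid" field
// hardcoded in the dashboard JSON when one is present and non-empty, and
// otherwise falls back to a UID generated elsewhere (for example, one derived
// from the k8s resource).
func resolveDashboardUID(dashboardJSON []byte, generatedUID string) string {
	var model struct {
		UID string `json:"uid"`
	}
	// Invalid JSON or a missing/empty "uid" falls through to the generated UID.
	if err := json.Unmarshal(dashboardJSON, &model); err == nil && model.UID != "" {
		return model.UID
	}
	return generatedUID
}
```

Calling `resolveDashboardUID([]byte(`{"title":"t"}`), "generated")` would return the generated value, because the JSON carries no `uid` field.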

When adding an alertRule, we need to point to the folder UID. How should we manage this?
Should it be a label selector? Or should it be the folder name, which we then just search for?
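To make the two options concrete, here is a minimal sketch of resolving a folder UID either by name or by label selector. Everything in it is an assumption for discussion: the `Folder` and `FolderRef` types, the `resolveFolderUID` function, and the choice to restrict lookups to a single namespace are hypothetical and not part of the proposal or the operator.

```go
package main

import (
	"errors"
	"fmt"
)

// Folder is a hypothetical, trimmed-down view of a folder resource.
type Folder struct {
	Namespace string
	Name      string
	UID       string
	Labels    map[string]string
}

// FolderRef is a hypothetical reference an alert rule could carry.
// Either Name or MatchLabels is expected to be set; Name wins if both are.
type FolderRef struct {
	Name        string
	MatchLabels map[string]string
}

var errNoFolder = errors.New("no matching folder")

// labelsMatch reports whether every selector key is present with the same value.
func labelsMatch(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// resolveFolderUID looks up the folder UID within one namespace (an assumption)
// and refuses ambiguous matches instead of picking one arbitrarily.
func resolveFolderUID(ref FolderRef, namespace string, folders []Folder) (string, error) {
	var matches []Folder
	for _, f := range folders {
		if f.Namespace != namespace {
			continue
		}
		if ref.Name != "" {
			if f.Name == ref.Name {
				matches = append(matches, f)
			}
			continue
		}
		if len(ref.MatchLabels) > 0 && labelsMatch(ref.MatchLabels, f.Labels) {
			matches = append(matches, f)
		}
	}
	switch len(matches) {
	case 0:
		return "", errNoFolder
	case 1:
		return matches[0].UID, nil
	default:
		return "", fmt.Errorf("ambiguous folder reference: %d folders match", len(matches))
	}
}
```

A name lookup is simpler to reason about, while a label selector can match more than one folder, which is why the sketch returns an error for ambiguous matches.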

I don't have time to look deeper at the API, but I assume there are a number of other resources that need to reference each other by UID. How should we find them?

How should we make this work in a setup with multiple Grafana instances across multiple namespaces?

We need to have CRD examples for all the resources and how we should use them together before being able to give more feedback on this design.

owenhaynes · 2023-08-03T09:42:02Z

Would this also include Loki/Mimir rules support, or just the general Grafana rules?

NissesSenap · 2023-08-08T11:36:44Z

@owenhaynes this would only focus on Grafana alerts. Loki/Mimir rules aren't part of the Grafana API.
I would recommend looking into https://github.com/AmiditeX/mimir-operator. I have been in contact with the creator, and we are looking into using it where I currently work.

hubeadmin · 2023-08-09T11:19:51Z

I'm +1 for the idea of adding this support. We're at the point now where we can start extending the feature set on the v5 code base.

Questions/opinions:

Are you thinking of this support as a new grafanaalerts CRD, or as a finer set of CRDs scoped per API object?
Personally, I'd prefer that we keep the number of CRDs to a minimum and group them as much as possible.
I say this for organizational reasons: I think we should aim to keep the CRD set a collection that is easy to maintain and, from a user perspective, easy to use.
This, IMHO, means grouping API objects into logical CRDs based on an API or feature, but with a few caveats:

The CRD should not be an antipattern to a "classical" usage of the Grafana feature, i.e. we should allow users to pretty much "drag and drop" their resources from a non-operator-managed deployment into an operator-managed one.

The CRD should depend on other CRDs as little as possible, to avoid complex relationship graphs between resources. (I'm mostly echoing what Edvin stated above.)

I think if you could update this PR to include some higher-level graphs, or descriptions of relationships of these resources, we could give some more feedback.

Again, we're +1 to the idea, but we need to see how this would be structured as a CRD. That will allow us to come up with an implementation that is in line with v5 standards.

NissesSenap · 2023-08-18T09:42:55Z

I'm currently playing around with alerts in grafana (not for this issue, though) and I hit an interesting thing that we should be aware of.

If you want to be able to edit the alerts through the UI after you create or update them using the API, you need to set a header that disables provenance:

		HTTPHeaders: map[string]string{
			"X-Disable-Provenance": "true",
		},

https://grafana.com/docs/grafana-cloud/alerting-and-irm/alerting/set-up/provision-alerting-resources/view-provisioned-resources/
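For readers not using a generated client, the same header can be set on a plain `net/http` request. The following is a minimal sketch under a few assumptions: the helper name `newAlertRuleRequest` is hypothetical, authentication is assumed to use a bearer token, and the target is assumed to be the alert-rules endpoint of Grafana's alerting provisioning API.

```go
package main

import (
	"bytes"
	"net/http"
	"strings"
)

// newAlertRuleRequest is a hypothetical helper that builds a POST request for
// creating an alert rule through the provisioning API (assumed endpoint).
func newAlertRuleRequest(baseURL, token string, ruleJSON []byte) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/api/v1/provisioning/alert-rules"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(ruleJSON))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Bearer-token authentication is an assumption for this sketch.
	req.Header.Set("Authorization", "Bearer "+token)
	// Ask Grafana not to record a provenance for this resource, so it
	// remains editable in the UI.
	req.Header.Set("X-Disable-Provenance", "true")
	return req, nil
}
```

The returned request can then be sent with `http.DefaultClient.Do(req)`. Setting the header in one helper makes it harder to forget on an individual call.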

A funny thing is that if you ever update these resources without setting this header, they can't be changed afterward.

2023/08/18 11:24:11 status: 500, body: {"message":"cannot change provenance from 'api' to ''","traceID":""}
exit status 1

For me, I had to solve this by restoring the DB.

Something to take into consideration when working with alerts through the API.

github-actions · 2023-09-18T01:40:18Z

This PR hasn't been updated for a while, marking as stale

This picks up #911 and #1144. The proposal contains three different options for realizing alerting support in the operator and should serve as a base for discussion regarding this topic.

initial proposal for unified alerting

105a6dd

github-actions bot added the stale label Sep 18, 2023

github-actions bot closed this Sep 25, 2023

theSuess mentioned this pull request Dec 13, 2023

docs: add proposal for grafana alerting #1349

Merged
