Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Unified Alerting support proposal #1144

Conversation

siegenthalerroger
Copy link
Contributor

Based on the discussions in #911 I've written up a proposal for extending grafana-operator with support for unified alerting. I'm looking forward to the feedback.

If anyone wants to assist feel free to ask for access to the branch so we can streamline the work.

Ref: #911

@NissesSenap
Copy link
Collaborator

So this is a good start @siegenthalerroger and thanks for taking this on.
But this document doesn't highlight how we should use the different resource to have them talk to each other, which is the hard part of this API.

When creating a folder we generate the UID from the k8s resources today. The dashboard is excluded from this where we allow the UID being hardcoded in the dashboard json but if it isn't, we will generate a UID for you.

When adding a alertRule we need to point to the folder UID. How should we manage this?
Should it be a label selector? Should it be the folder name and we just search for it.

I don't have time to look deeper at the API, but I assume there are a number of other resources that needs to do call on each other using UID. How should we find them?

How should we make this valid in a multi grafana instance setup with multiple namespaces?

We need to have CRD examples for all the resources and how we should use them together before being able to give more feedback on this design.

@owenhaynes
Copy link

Would this also include the loki/mimir rules support? Or just the general grafana rules

@NissesSenap
Copy link
Collaborator

@owenhaynes this would only focus on grafana alerts. Loki/mimir rules isn't part of the grafana API.
I would recommend looking in to https://github.com/AmiditeX/mimir-operator, I have been in contact with the creator and we are looking in to using it where I currently work.

@hubeadmin
Copy link
Collaborator

I'm +1 for the idea of adding this support, we're at the point now where we can start extending the feature set on the v5 code base.

Questions/opinions:

Are you thinking of this support as a new grafanaalerts CRD? Or a finer set of per-api object scoped CRDs?
Personally, I'd prefer if we keep the number of CRDs to a minimum, and group them as much as possible.
I say this, for organizational reasons, I think we should aim to keep the CRD set as an easy to maintain and use (from a user perspective) collection.
This, IMHO, means grouping API objects into logical CRD's based on an API or Feature, but with a few caveats:

The CRD should not be an antipattern to a "classical" usage of the Grafana feature. i.e. We should allow users to pretty much "drag and drop" their resources from a non-operator managed deployment, into an operator managed one.

The CRD should not be overly dependent on any other CRDs as much as possible, to avoid complex relationship graphs between resources. (i'm mostly echoing what Edvin stated above)

I think if you could update this PR to include some higher-level graphs, or descriptions of relationships of these resources, we could give some more feedback.

Again we're +1 to the idea, but we just need to see how this would be structured as a CRD, That will allow us to come up with an implementation that will be in-line with v5 standards

@NissesSenap
Copy link
Collaborator

I'm currently playing around with alerts in grafana (not for this issue, though) and I hit an interesting thing that we should be aware of.

If you want to be able to edit the alerts through the UI after you update/create them using the API you need to define a header to disable

		HTTPHeaders: map[string]string{
			"X-Disable-Provenance": "true",
		},

https://grafana.com/docs/grafana-cloud/alerting-and-irm/alerting/set-up/provision-alerting-resources/view-provisioned-resources/

A funny thing is that if you ever update these resources without setting this header, they can't be changed afterward.

2023/08/18 11:24:11 status: 500, body: {"message":"cannot change provenance from 'api' to ''","traceID":""}
exit status 1

For me, I had to solve this by restoring DB.

Something to take in to consideration when doing stuff with alerts using the API.

@github-actions
Copy link

This PR hasn't been updated for a while, marking as stale

@github-actions github-actions bot added the stale label Sep 18, 2023
@github-actions github-actions bot closed this Sep 25, 2023
theSuess added a commit that referenced this pull request Dec 13, 2023
This picks up #911 and #1144. The proposal contains three different options for
realizing alerting support in the operator and should serve as a base for
discussion regarding this topic.
theSuess added a commit that referenced this pull request Dec 13, 2023
This picks up #911 and #1144. The proposal contains three different options for
realizing alerting support in the operator and should serve as a base for
discussion regarding this topic.
theSuess added a commit that referenced this pull request Dec 20, 2023
This picks up #911 and #1144. The proposal contains three different options for
realizing alerting support in the operator and should serve as a base for
discussion regarding this topic.
theSuess added a commit that referenced this pull request Jan 9, 2024
This picks up #911 and #1144. The proposal contains three different options for
realizing alerting support in the operator and should serve as a base for
discussion regarding this topic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants