Skip to content

Commit

Permalink
Grafana OnCall
Browse files Browse the repository at this point in the history
Signed-off-by: Andrei Kvapil <[email protected]>
  • Loading branch information
kvaps committed Sep 6, 2024
1 parent b40e1b0 commit 4cac219
Show file tree
Hide file tree
Showing 29 changed files with 7,291 additions and 275 deletions.
28 changes: 26 additions & 2 deletions packages/extra/monitoring/templates/grafana/grafana.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -56,15 +56,15 @@ spec:
mountPath: /var/lib/grafana
containers:
- name: grafana
image: grafana/grafana:10.1.0
image: grafana/grafana:11.2.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
readinessProbe:
failureThreshold: 3
env:
- name: GF_INSTALL_PLUGINS
value: grafana-worldmap-panel,flant-statusmap-panel,grafana-oncall-app,natel-discrete-panel
value: grafana-worldmap-panel,flant-statusmap-panel,grafana-oncall-app,natel-discrete-panel,grafana-oncall-app
- name: ONCALL_API_URL
value: http://grafana-oncall-engine:8080
- name: GF_DATABASE_HOST
Expand All @@ -87,6 +87,13 @@ spec:
secretKeyRef:
key: password
name: grafana-admin-password
volumeMounts:
- name: grafana-plugins
mountPath: /usr/share/grafana/conf/provisioning/plugins/
volumes:
- name: grafana-plugins
configMap:
name: grafana-plugins-provisioning
ingress:
metadata:
annotations:
Expand All @@ -109,3 +116,20 @@ spec:
- hosts:
- "{{ .Values.host | default (printf "grafana.%s" $host) }}"
secretName: grafana-ingress-tls
---
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-plugins-provisioning
data:
on-call.yaml: |
apiVersion: 1
apps:
- type: grafana-oncall-app
name: grafana-oncall-app
version: v1.9.0
disabled: false
jsonData:
grafanaUrl: "https://grafana.infra.aenix.org"
license: "OpenSource"
onCallApiUrl: "http://grafana-oncall-engine:8080"
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ spec:
oncall:
fullnameOverride: grafana-oncall
externalGrafana:
url: "https://{{ .Values.host | default (printf "grafana.%s" $host) }}/"
url: "http://grafana-service:3000"

externalPostgresql:
host: grafana-oncall-db-rw
Expand All @@ -35,6 +35,6 @@ spec:

externalRedis:
host: rfrm-grafana-oncall
existingSecret: {{ .Release.Name }}-oncall-redis-password
existingSecret: grafana-oncall-redis-password
passwordKey: password
{{- end }}
2 changes: 1 addition & 1 deletion packages/extra/monitoring/templates/vm/vmalertmanager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ stringData:
receivers:
- name: 'webhook'
webhook_configs:
- url: http://{{ .Release.Name }}-oncall-engine.{{ .Release.Namespace }}.svc:8080/integrations/v1/alertmanager/Kjb2NWxxSlgGtxz9F4ihovQBB/
- url: http://grafana-oncall-engine:8080/integrations/v1/alertmanager/fD8cZuXGPvDyQSNYbUwJgHB6H/
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
Expand Down
2 changes: 1 addition & 1 deletion packages/system/grafana-oncall/charts/oncall/Chart.lock
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,4 @@ dependencies:
repository: https://prometheus-community.github.io/helm-charts
version: 25.8.2
digest: sha256:edc9fef449a694cd319135e37ac84f8247ac9ad0c48ac86099dae4e428beb7b7
generated: "2024-01-26T17:54:48.132209769Z"
generated: "2024-09-04T18:52:49.709787897Z"
4 changes: 2 additions & 2 deletions packages/system/grafana-oncall/charts/oncall/Chart.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
apiVersion: v2
appVersion: v1.3.94
appVersion: v1.9.22
dependencies:
- condition: cert-manager.enabled
name: cert-manager
Expand Down Expand Up @@ -36,4 +36,4 @@ dependencies:
description: Developer-friendly incident response with brilliant Slack integration
name: oncall
type: application
version: 1.3.94
version: 1.9.22
13 changes: 10 additions & 3 deletions packages/system/grafana-oncall/charts/oncall/templates/_env.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,6 @@
- name: FEATURE_SLACK_INTEGRATION_ENABLED
value: {{ .Values.oncall.slack.enabled | toString | title | quote }}
{{- if .Values.oncall.slack.enabled }}
- name: SLACK_SLASH_COMMAND_NAME
value: "/{{ .Values.oncall.slack.commandName | default "oncall" }}"
{{- if .Values.oncall.slack.existingSecret }}
- name: SLACK_CLIENT_OAUTH_ID
valueFrom:
Expand Down Expand Up @@ -603,6 +601,13 @@ when broker.type != rabbitmq, we do not need to include rabbitmq environment var
{{- end }}

{{- define "snippet.oncall.smtp.env" -}}
{{- $smtpTLS:=.Values.oncall.smtp.tls | default true | toString | title | quote }}
{{- $smtpSSL:=.Values.oncall.smtp.ssl | default false | toString | title | quote }}
{{- if eq $smtpTLS "\"True\"" }}
{{- if eq $smtpSSL "\"True\"" }}
{{- fail "cannot set Email (SMTP) to use SSL and TLS at the same time" }}
{{- end }}
{{- end }}
- name: FEATURE_EMAIL_INTEGRATION_ENABLED
value: {{ .Values.oncall.smtp.enabled | toString | title | quote }}
{{- if .Values.oncall.smtp.enabled }}
Expand All @@ -619,7 +624,9 @@ when broker.type != rabbitmq, we do not need to include rabbitmq environment var
key: smtp-password
optional: true
- name: EMAIL_USE_TLS
value: {{ .Values.oncall.smtp.tls | default true | toString | title | quote }}
value: {{ $smtpTLS }}
- name: EMAIL_USE_SSL
value: {{ $smtpSSL }}
- name: EMAIL_FROM_ADDRESS
value: {{ .Values.oncall.smtp.fromEmail | quote }}
- name: EMAIL_NOTIFICATIONS_LIMIT
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ metadata:
{{- end }}
type: Opaque
data:
{{ include "snippet.oncall.secret.secretKey" . }}: {{ randAlphaNum 40 | b64enc | quote }}
{{ include "snippet.oncall.secret.mirageSecretKey" . }}: {{ randAlphaNum 40 | b64enc | quote }}
{{ include "snippet.oncall.secret.secretKey" . }}: {{ (.Values.oncall.secrets.secretKey | default (randAlphaNum 40)) | b64enc | quote }}
{{ include "snippet.oncall.secret.mirageSecretKey" . }}: {{ (.Values.oncall.secrets.mirageSecretKey | default (randAlphaNum 40)) | b64enc | quote }}
---
{{- end }}
{{- if and (eq .Values.database.type "mysql") (not .Values.mariadb.enabled) (not .Values.externalMysql.existingSecret) }}
Expand Down Expand Up @@ -46,7 +46,7 @@ data:
postgres-password: {{ required "externalPostgresql.password is required if not postgresql.enabled and not externalPostgresql.existingSecret" .Values.externalPostgresql.password | b64enc | quote }}
---
{{- end }}
{{- if and (eq .Values.broker.type "rabbitmq") (not .Values.rabbitmq.enabled) (not .Values.externalRabbitmq.existingSecret) }}
{{- if and (eq .Values.broker.type "rabbitmq") (.Values.externalRabbitmq.password) (not .Values.rabbitmq.enabled) (not .Values.externalRabbitmq.existingSecret) }}
apiVersion: v1
kind: Secret
metadata:
Expand All @@ -61,7 +61,7 @@ data:
rabbitmq-password: {{ required "externalRabbitmq.password is required if not rabbitmq.enabled and not externalRabbitmq.existingSecret" .Values.externalRabbitmq.password | b64enc | quote }}
---
{{- end }}
{{- if and (eq .Values.broker.type "redis") (not .Values.redis.enabled) (not .Values.externalRedis.existingSecret) }}
{{- if and (.Values.externalRedis.host) (not .Values.redis.enabled) (not .Values.externalRedis.existingSecret) }}
apiVersion: v1
kind: Secret
metadata:
Expand Down
5 changes: 2 additions & 3 deletions packages/system/grafana-oncall/charts/oncall/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ detached_integrations:
# Celery workers pods configuration
celery:
replicaCount: 1
worker_queue: "default,critical,long,slack,telegram,webhook,celery,grafana"
worker_queue: "default,critical,long,slack,telegram,webhook,celery,grafana,retry"
worker_concurrency: "1"
worker_max_tasks_per_child: "100"
worker_beat_enabled: "True"
Expand Down Expand Up @@ -305,8 +305,6 @@ oncall:
slack:
# Enable the Slack ChatOps integration for the Oncall Engine.
enabled: false
# Sets the Slack bot slash-command
commandName: oncall
# clientId configures the Slack app OAuth2 client ID.
# api.slack.com/apps/<yourApp> -> Basic Information -> App Credentials -> Client ID
clientId: ~
Expand Down Expand Up @@ -343,6 +341,7 @@ oncall:
username: ~
password: ~
tls: ~
ssl: ~
fromEmail: ~
exporter:
enabled: false
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.2
version: 0.3.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "v5.6.0"
appVersion: "v5.12.0"
50 changes: 46 additions & 4 deletions packages/system/grafana-operator/charts/grafana-operator/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,18 +7,45 @@ linkTitle: "Helm installation"

[grafana-operator](https://github.com/grafana/grafana-operator) for Kubernetes to manage Grafana instances and grafana resources.

![Version: 0.1.2](https://img.shields.io/badge/Version-0.1.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v5.6.0](https://img.shields.io/badge/AppVersion-v5.6.0-informational?style=flat-square)
![Version: 0.3.0](https://img.shields.io/badge/Version-0.3.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v5.12.0](https://img.shields.io/badge/AppVersion-v5.12.0-informational?style=flat-square)

## Installation

This is a OCI helm chart, helm started support OCI in version 3.8.0.

```shell
helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-operator --version v5.6.0
helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-operator --version v5.12.0
```

Sadly helm OCI charts currently don't support searching for available versions of a helm [oci registry](https://github.com/helm/helm/issues/11000).

### Using Terraform

To install the helm chart using terraform, make sure you use the right values for `repository` and `name` as shown below:

```hcl
resource "helm_release" "grafana_kubernetes_operator" {
name = "grafana-operator"
namespace = "default"
repository = "oci://ghcr.io/grafana/helm-charts"
chart = "grafana-operator"
verify = false
version = "v5.12.0"
}
```

## Upgrading

Helm does not provide functionality to update custom resource definitions. This can result in the operator misbehaving when a release contains updates to the custom resource definitions.
To avoid issues due to outdated or missing definitions, run the following command before updating an existing installation:

```shell
kubectl apply --server-side --force-conflicts -f https://github.com/grafana/grafana-operator/releases/download/v5.12.0/crds.yaml
```

The `--server-side` and `--force-conflict` flags are required to avoid running into issues with the `kubectl.kubernetes.io/last-applied-configuration` annotation.
By using server side apply, this annotation is not considered. `--force-conflict` allows kubectl to modify fields previously managed by helm.

## Development

For general and helm specific development instructions please read the [CONTRIBUTING.md](../../../CONTRIBUTING.md)
Expand All @@ -38,24 +65,39 @@ It's easier to just manage this configuration outside of the operator.
| additionalLabels | object | `{}` | additional labels to add to all resources |
| affinity | object | `{}` | pod affinity |
| env | list | `[]` | Additional environment variables |
| fullnameOverride | string | `""` | |
| extraObjects | list | `[]` | Array of extra K8s objects to deploy |
| fullnameOverride | string | `""` | Overrides the fully qualified app name. |
| image.pullPolicy | string | `"IfNotPresent"` | The image pull policy to use in grafana operator container |
| image.repository | string | `"ghcr.io/grafana/grafana-operator"` | grafana operator image repository |
| image.tag | string | `""` | Overrides the image tag whose default is the chart appVersion. |
| imagePullSecrets | list | `[]` | image pull secrets |
| isOpenShift | bool | `false` | Determines if the target cluster is OpenShift. Additional rbac permissions for routes will be added on OpenShift |
| leaderElect | bool | `false` | If you want to run multiple replicas of the grafana-operator, this is not recommended. |
| metricsService.metricsPort | int | `9090` | metrics service port |
| metricsService.pprofPort | int | `8888` | port for the pprof profiling endpoint |
| metricsService.type | string | `"ClusterIP"` | metrics service type |
| nameOverride | string | `""` | |
| nameOverride | string | `""` | Overrides the name of the chart. |
| namespaceOverride | string | `""` | Overrides the namespace name. |
| namespaceScope | bool | `false` | If the operator should run in namespace-scope or not, if true the operator will only be able to manage instances in the same namespace |
| nodeSelector | object | `{}` | pod node selector |
| podAnnotations | object | `{}` | pod annotations |
| podSecurityContext | object | `{}` | pod security context |
| priorityClassName | string | `""` | pod priority class name |
| rbac.create | bool | `true` | Specifies whether to create the ClusterRole and ClusterRoleBinding. If "namespaceScope" is true or "watchNamespaces" is set, this will create Role and RoleBinding instead. |
| resources | object | `{}` | grafana operator container resources |
| securityContext | object | `{"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsNonRoot":true}` | grafana operator container security context |
| serviceAccount.annotations | object | `{}` | Annotations to add to the service account |
| serviceAccount.create | bool | `true` | Specifies whether a service account should be created |
| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
| serviceMonitor | object | `{"additionalLabels":{},"enabled":false,"interval":"1m","metricRelabelings":[],"relabelings":[],"scrapeTimeout":"10s","targetLabels":[],"telemetryPath":"/metrics"}` | Enable this to use with Prometheus Operator |
| serviceMonitor.additionalLabels | object | `{}` | Set of labels to transfer from the Kubernetes Service onto the target |
| serviceMonitor.enabled | bool | `false` | When set true then use a ServiceMonitor to configure scraping |
| serviceMonitor.interval | string | `"1m"` | Set how frequently Prometheus should scrape |
| serviceMonitor.metricRelabelings | list | `[]` | MetricRelabelConfigs to apply to samples before ingestion |
| serviceMonitor.relabelings | list | `[]` | Set relabel_configs as per https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config |
| serviceMonitor.scrapeTimeout | string | `"10s"` | Set timeout for scrape |
| serviceMonitor.targetLabels | list | `[]` | Set of labels to transfer from the Kubernetes Service onto the target |
| serviceMonitor.telemetryPath | string | `"/metrics"` | Set path to metrics path |
| tolerations | list | `[]` | pod tolerations |
| watchNamespaceSelector | string | `""` | Sets the WATCH_NAMESPACE_SELECTOR environment variable, it defines which namespaces the operator should be listening for based on label and key value pair added on namespace kind. By default it's all namespaces. |
| watchNamespaces | string | `""` | Sets the WATCH_NAMESPACE environment variable, it defines which namespaces the operator should be listening for. By default it's all namespaces, if you only want to listen for the same namespace as the operator is deployed to look at namespaceScope. |
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,34 @@ helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-opera

Sadly helm OCI charts currently don't support searching for available versions of a helm [oci registry](https://github.com/helm/helm/issues/11000).

### Using Terraform

To install the helm chart using terraform, make sure you use the right values for `repository` and `name` as shown below:

```hcl
resource "helm_release" "grafana_kubernetes_operator" {
name = "grafana-operator"
namespace = "default"
repository = "oci://ghcr.io/grafana/helm-charts"
chart = "grafana-operator"
verify = false
version = "{{ template "chart.appVersion" . }}"
}
```


## Upgrading

Helm does not provide functionality to update custom resource definitions. This can result in the operator misbehaving when a release contains updates to the custom resource definitions.
To avoid issues due to outdated or missing definitions, run the following command before updating an existing installation:

```shell
kubectl apply --server-side --force-conflicts -f https://github.com/grafana/grafana-operator/releases/download/{{ template "chart.appVersion" . }}/crds.yaml
```

The `--server-side` and `--force-conflict` flags are required to avoid running into issues with the `kubectl.kubernetes.io/last-applied-configuration` annotation.
By using server side apply, this annotation is not considered. `--force-conflict` allows kubectl to modify fields previously managed by helm.

## Development

For general and helm specific development instructions please read the [CONTRIBUTING.md](../../../CONTRIBUTING.md)
Expand Down
Loading

0 comments on commit 4cac219

Please sign in to comment.