Cherry-pick pull request #450 from RiRa12621/master #466

Merged (1 commit) on Jul 15, 2020

10 changes: 10 additions & 0 deletions README.md
@@ -194,6 +194,16 @@ $ jsonnet -J vendor -m files/dashboards -e '(import "mixin.libsonnet").grafanaDa

## Background

### Alert Severities
While the community has not yet fully agreed on alert severities and how they should be used, this repository assumes the following paradigms when setting severities:

* Critical: An issue that requires paging a person to take immediate action.
* Warning: An issue that needs to be worked on, but through the regular work queue or during office hours rather than by paging the on-call engineer.
* Info: Supports troubleshooting by flagging an abnormal situation in one or more systems; not worth a page or ticket on its own.
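
As a rough illustration of the three severities above, the following Python sketch maps each severity label to a handling tier. The `Handling` enum, the `SEVERITY_HANDLING` table, and `handling_for` are hypothetical names introduced only for this example; they are not part of the mixin or of any alerting tool, and real routing would normally live in the alert router's own configuration.

```python
from enum import Enum


class Handling(Enum):
    """Hypothetical handling tiers; the names are illustrative only."""
    PAGE = "page"        # wake someone up now
    TICKET = "ticket"    # regular work queue / office hours
    CONTEXT = "context"  # troubleshooting aid, no page or ticket


# Severity label values as described in the list above.
SEVERITY_HANDLING = {
    "critical": Handling.PAGE,
    "warning": Handling.TICKET,
    "info": Handling.CONTEXT,
}


def handling_for(severity: str) -> Handling:
    """Return the handling tier for a severity label, case-insensitively."""
    try:
        return SEVERITY_HANDLING[severity.lower()]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
```

Raising on an unknown label, rather than defaulting to a tier, is a design choice of this sketch: a misspelled severity then fails loudly instead of being silently routed.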


### Architecture and Technical Decisions

* For more motivation, see
"[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin.
* For more information about monitoring mixins, see this [design doc](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/edit#).
6 changes: 3 additions & 3 deletions alerts/resource_alerts.libsonnet
@@ -87,16 +87,16 @@
'for': '5m',
},
{
- alert: 'KubeQuotaExceeded',
+ alert: 'KubeQuotaFullyUsed',
expr: |||
kube_resourcequota{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s, type="used"}
/ ignoring(instance, job, type)
(kube_resourcequota{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s, type="hard"} > 0)
- > 0.90
+ >= 1
||| % $._config,
'for': '15m',
labels: {
- severity: 'warning',
+ severity: 'info',
},
annotations: {
message: 'Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.',
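
To make the changed condition concrete, here is a minimal Python sketch of the arithmetic behind the new expression: used quota divided by hard quota, evaluated only where the hard limit is positive, and compared against 1. The function names `quota_ratio` and `quota_fully_used` are assumptions made for this illustration; the sketch does not model label matching (`ignoring(...)`) or the `'for': '15m'` hold period.

```python
from typing import Optional


def quota_ratio(used: float, hard: float) -> Optional[float]:
    """Return used/hard, or None when the hard limit is not positive.

    Returning None mirrors the `> 0` filter in the expression, which drops
    series with a zero hard limit instead of dividing by them.
    """
    if hard <= 0:
        return None
    return used / hard


def quota_fully_used(used: float, hard: float) -> bool:
    """True when usage is at or above the hard limit (ratio >= 1)."""
    ratio = quota_ratio(used, hard)
    return ratio is not None and ratio >= 1
```

Under this sketch, a namespace at 95% of its quota no longer matches, whereas the previous `> 0.90` threshold would have matched it.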
4 changes: 2 additions & 2 deletions runbook.md
@@ -85,9 +85,9 @@ This page collects this repositories alerts and begins the process of describing
##### Alert Name: "KubeMemOvercommit"
+ *Message*: `Overcommited Memory resource request quota on Namespaces.`
+ *Severity*: warning
- ##### Alert Name: "KubeQuotaExceeded"
+ ##### Alert Name: "KubeQuotaFullyUsed"
+ *Message*: `{{ $value | humanizePercentage }} usage of {{ $labels.resource }} in namespace {{ $labels.namespace }}.`
- + *Severity*: warning
+ + *Severity*: info
### Group Name: "kubernetes-storage"
##### Alert Name: "KubePersistentVolumeUsageCritical"
+ *Message*: `The persistent volume claimed by {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }} has {{ $value | humanizePercentage }} free.`