diff --git a/README.md b/README.md
index b10e4a22..c0c78bed 100644
--- a/README.md
+++ b/README.md
@@ -5,111 +5,213 @@
[![PyPI version](https://badge.fury.io/py/slo-generator.svg)](https://badge.fury.io/py/slo-generator)
`slo-generator` is a tool to compute and export **Service Level Objectives** ([SLOs](https://landing.google.com/sre/sre-book/chapters/service-level-objectives/)),
-**Error Budgets** and **Burn Rates**, using policies written in JSON or YAML format.
+**Error Budgets** and **Burn Rates**, using configurations written in YAML (or JSON) format.
+
+## Table of contents
+- [Description](#description)
+- [Local usage](#local-usage)
+ - [Requirements](#requirements)
+ - [Installation](#installation)
+ - [CLI usage](#cli-usage)
+ - [API usage](#api-usage)
+- [Configuration](#configuration)
+ - [SLO configuration](#slo-configuration)
+ - [Shared configuration](#shared-configuration)
+- [More documentation](#more-documentation)
+ - [Build an SLO achievements report with BigQuery and DataStudio](#build-an-slo-achievements-report-with-bigquery-and-datastudio)
+ - [Deploy the SLO Generator in Cloud Run](#deploy-the-slo-generator-in-cloud-run)
+ - [Deploy the SLO Generator in Kubernetes (Alpha)](#deploy-the-slo-generator-in-kubernetes-alpha)
+ - [Deploy the SLO Generator in a CloudBuild pipeline](#deploy-the-slo-generator-in-a-cloudbuild-pipeline)
+ - [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](#deprecated-deploy-the-slo-generator-on-google-cloud-functions-terraform)
+ - [Contribute to the SLO Generator](#contribute-to-the-slo-generator)
## Description
-`slo-generator` will query metrics backend and compute the following metrics:
+The `slo-generator` runs backend queries computing **Service Level Indicators**,
+compares them with the **Service Level Objectives** you defined, and generates a
+report by computing important metrics:
-* **Service Level Objective** defined as `SLO (%) = GOOD_EVENTS / VALID_EVENTS`
-* **Error Budget** defined as `ERROR_BUDGET = 100 - SLO (%)`
-* **Burn Rate** defined as `BURN_RATE = ERROR_BUDGET / ERROR_BUDGET_TARGET`
+* **Service Level Indicator** (SLI) defined as **SLI = N_good_events / N_valid_events**
+* **Error Budget** (EB) defined as **EB = 1 - SLI**
+* **Error Budget Burn Rate** (EBBR) defined as **EBBR = EB / EB_target**
+* **... and more**, see the [example SLO report](./tests/unit/fixtures/slo_report_v2.json).
+
+The **Error Budget Burn Rate** is often used for [**alerting on SLOs**](https://sre.google/workbook/alerting-on-slos/), as it has proven in practice to be more **reliable** and **stable** than
+alerting directly on metrics or on **SLI > SLO** thresholds.
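As a quick illustration of the arithmetic above, here is a sketch (the event counts and goal are made up):

```python
# Illustrative numbers, not real monitoring data.
good_events = 99_235
valid_events = 100_000
goal = 0.999                          # SLO goal, between 0 and 1

sli = good_events / valid_events      # SLI = N_good_events / N_valid_events
error_budget = 1 - sli                # EB = 1 - SLI
eb_target = 1 - goal                  # error budget allowed by the goal
burn_rate = error_budget / eb_target  # EBBR = EB / EB_target
```

A burn rate above 1 means the error budget is being consumed faster than the SLO goal allows.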
## Local usage
-**Requirements**
+### Requirements
-* Python 3
+* `python3.7` and above
+* `pip3`
-**Installation**
+### Installation
-`slo-generator` is published on PyPI. To install it, run:
+`slo-generator` is a Python library published on [PyPI](https://pypi.org). To install it, run:
```sh
pip3 install slo-generator
```
-**Run the `slo-generator`**
-
-```
-slo-generator -f <SLO_CONFIG> -b <ERROR_BUDGET_POLICY> --export
-```
-
- * `<SLO_CONFIG>` is the [SLO config](#slo-configuration) file or folder.
- If a folder path is passed, the SLO configs filenames should match the pattern `slo_*.yaml` to be loaded.
-
- * `<ERROR_BUDGET_POLICY>` is the [Error Budget Policy](#error-budget-policy) file.
-
- * `--export` enables exporting data using the `exporters` defined in the SLO
- configuration file.
-
-Use `slo-generator --help` to list all available arguments.
-
***Notes:***
+* To install **[providers](./docs/providers)**, use `pip3 install slo-generator[<PROVIDER_1>, <PROVIDER_2>, ...]`.
+
+### CLI usage
+
+To compute an SLO report using the CLI, run:
+
+```sh
+slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH> --export
+```
+where:
+ * `<SLO_CONFIG_PATH>` is the [SLO configuration](#slo-configuration) file or folder path.
-* **SLO metadata**:
- * `slo_name`: Name of this SLO.
- * `slo_description`: Description of this SLO.
- * `slo_target`: SLO target (between 0 and 1).
- * `service_name`: Name of the monitored service.
- * `feature_name`: Name of the monitored subsystem.
- * `metadata`: Dict of user metadata.
+ * `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path.
+ * `--export` | `-e` enables exporting data using the `exporters` specified in the SLO
+ configuration file.
-* **SLI configuration**:
- * `backend`: Specific documentation and examples are available for each supported backends:
- * [Stackdriver Monitoring](docs/providers/stackdriver.md#backend)
- * [Stackdriver Service Monitoring](docs/providers/stackdriver_service_monitoring.md#backend)
- * [Prometheus](docs/providers/prometheus.md#backend)
- * [ElasticSearch](docs/providers/elasticsearch.md#backend)
- * [Datadog](docs/providers/datadog.md#backend)
- * [Dynatrace](docs/providers/dynatrace.md#backend)
- * [Custom](docs/providers/custom.md#backend)
+Use `slo-generator compute --help` to list all available arguments.
-- **Exporter configuration**:
- * `exporters`: A list of exporters to export results to. Specific documentation is available for each supported exporters:
- * [Cloud Pub/Sub](docs/providers/pubsub.md#exporter) to stream SLO reports.
- * [BigQuery](docs/providers/bigquery.md#exporter) to export SLO reports to BigQuery for historical analysis and DataStudio reporting.
- * [Stackdriver Monitoring](docs/providers/stackdriver.md#exporter) to export metrics to Stackdriver Monitoring.
- * [Prometheus](docs/providers/prometheus.md#exporter) to export metrics to Prometheus.
- * [Datadog](docs/providers/datadog.md#exporter) to export metrics to Datadog.
- * [Dynatrace](docs/providers/dynatrace.md#exporter) to export metrics to Dynatrace.
- * [Custom](docs/providers/custom.md#exporter) to export SLO data or metrics to a custom destination.
+### API usage
-***Note:*** *you can use environment variables in your SLO configs by using `${MY_ENV_VAR}` syntax to avoid having sensitive data in version control. Environment variables will be replaced at run time.*
+On top of the CLI, the `slo-generator` can also be run as an API using the Cloud
+Functions Framework SDK (Flask):
+```
+slo-generator api -c <SHARED_CONFIG_PATH>
+```
+where:
+ * `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path or GCS URL.
-==> An example SLO configuration file is available [here](samples/stackdriver/slo_gae_app_availability.yaml).
+Once the API is up and running, you can `HTTP POST` SLO configurations to it.
-#### Error Budget policy
+***Notes:***
+* The API responds by default to HTTP requests. An alternative mode is to
+respond to [`CloudEvents`](https://cloudevents.io/) instead, by setting
+`--signature-type cloudevent`.
-The **Error Budget policy** (JSON or YAML) is a list of multiple error budgets, each one composed of the following fields:
+* Use `--target export` to run the API in export mode only (former `slo-pipeline`).
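For example, assuming the API listens on the Functions Framework default port `8080` (the file names below are illustrative):

```
slo-generator api -c config.yaml &
curl -X POST --data-binary @slo_gae_app_availability.yaml http://localhost:8080
```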
-* `window`: Rolling time window for this error budget.
-* `alerting_burn_rate_threshold`: Target burnrate threshold over which alerting is needed.
-* `urgent_notification`: boolean whether violating this error budget should trigger a page.
-* `overburned_consequence_message`: message to show when the error budget is above the target.
-* `achieved_consequence_message`: message to show when the error budget is within the target.
+## Configuration
-==> An example Error Budget policy is available [here](samples/error_budget_policy.yaml).
+The `slo-generator` requires two configuration files to run: an **SLO configuration**
+file, describing your SLO, and a **Shared configuration** file (common
+configuration for all SLOs).
+
+### SLO configuration
+
+The **SLO configuration** (JSON or YAML) follows the Kubernetes object format
+and is composed of the following fields:
+
+* `api`: `sre.google.com/v2`
+* `kind`: `ServiceLevelObjective`
+* `metadata`:
+ * `name`: [**required**] *string* - Full SLO name (**MUST** be unique).
+ * `labels`: [*optional*] *map* - Metadata labels, **for example**:
+ * `slo_name`: SLO name (e.g. `availability`, `latency128ms`, ...).
+ * `service_name`: Monitored service (to group SLOs by service).
+ * `feature_name`: Monitored feature (to group SLOs by feature).
+
+* `spec`:
+ * `description`: [**required**] *string* - Description of this SLO.
+ * `goal`: [**required**] *number* - SLO goal (or target) (**MUST** be between 0 and 1).
+ * `backend`: [**required**] *string* - Backend name (**MUST** exist in SLO Generator Configuration).
+ * `service_level_indicator`: [**required**] *map* - SLI configuration. The content of this section is
+ specific to each provider, see [`docs/providers`](./docs/providers).
+ * `error_budget_policy`: [*optional*] *string* - Error budget policy name
+ (**MUST** exist in SLO Generator Configuration). If not specified, defaults to `default`.
+ * `exporters`: [*optional*] *list* - List of exporter names (**MUST** exist in SLO Generator Configuration).
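Putting these fields together, a minimal SLO configuration might look like the sketch below (all names and values are illustrative; the backend and exporter aliases must exist in your shared configuration):

```yaml
api: sre.google.com/v2
kind: ServiceLevelObjective
metadata:
  name: gae-app-availability
  labels:
    service_name: gae
    feature_name: app
    slo_name: availability
spec:
  description: Availability of the GAE app
  goal: 0.999
  backend: cloud_monitoring/dev
  error_budget_policy: default
  exporters:
    - bigquery/dev
  service_level_indicator:
    # provider-specific SLI configuration, see docs/providers
    filter_good: ${GOOD_EVENTS_FILTER}
    filter_valid: ${VALID_EVENTS_FILTER}
```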
+
+***Note:*** *you can use environment variables in your SLO configs by using
+`${MY_ENV_VAR}` syntax to avoid having sensitive data in version control.
+Environment variables will be replaced automatically at run time.*
+
+**→ See [example SLO configuration](samples/cloud_monitoring/slo_gae_app_availability.yaml).**
+
+### Shared configuration
+The shared configuration (JSON or YAML) configures the `slo-generator` and acts
+as a shared config for all SLO configs. It is composed of the following fields:
+
+* `backends`: [**required**] *map* - Data backend configurations. Each backend
+  alias is defined as a key formatted as `<backend_name>/<suffix>`, and a
+  configuration map.
+ ```yaml
+ backends:
+ cloud_monitoring/dev:
+ project_id: proj-cm-dev-a4b7
+ datadog/test:
+ app_key: ${APP_SECRET_KEY}
+ api_key: ${API_SECRET_KEY}
+ ```
+ See specific providers documentation for detailed configuration:
+ * [`cloud_monitoring`](docs/providers/cloud_monitoring.md#backend)
+ * [`cloud_service_monitoring`](docs/providers/cloud_service_monitoring.md#backend)
+ * [`prometheus`](docs/providers/prometheus.md#backend)
+ * [`elasticsearch`](docs/providers/elasticsearch.md#backend)
+ * [`datadog`](docs/providers/datadog.md#backend)
+ * [`dynatrace`](docs/providers/dynatrace.md#backend)
+ * [`<custom_backend>`](docs/providers/custom.md#backend)
+
+* `exporters`: [*optional*] *map* - A map of exporters to export results to. Each
+  exporter is defined as a key formatted as `<exporter_name>/<suffix>`, and a map
+  value detailing the exporter configuration.
+ ```yaml
+ exporters:
+ bigquery/dev:
+ project_id: proj-bq-dev-a4b7
+ dataset_id: my-test-dataset
+ table_id: my-test-table
+ prometheus/test:
+ url: ${PROMETHEUS_URL}
+ ```
+ See specific providers documentation for detailed configuration:
+ * [`pubsub`](docs/providers/pubsub.md#exporter) to stream SLO reports.
+ * [`bigquery`](docs/providers/bigquery.md#exporter) to export SLO reports to BigQuery for historical analysis and DataStudio reporting.
+ * [`cloud_monitoring`](docs/providers/cloud_monitoring.md#exporter) to export metrics to Cloud Monitoring.
+ * [`prometheus`](docs/providers/prometheus.md#exporter) to export metrics to Prometheus.
+ * [`datadog`](docs/providers/datadog.md#exporter) to export metrics to Datadog.
+ * [`dynatrace`](docs/providers/dynatrace.md#exporter) to export metrics to Dynatrace.
+ * [`<custom_exporter>`](docs/providers/custom.md#exporter) to export SLO data or metrics to a custom destination.
+
+* `error_budget_policies`: [**required**] *map* - A map of error budget policies.
+  * `<policy_name>`: Name of the error budget policy.
+  * `steps`: List of error budget policy steps, each containing the following fields:
+    * `name`: Name of the step.
+    * `window`: Rolling time window for this error budget, in seconds.
+    * `burn_rate_threshold`: Burn rate threshold over which alerting is needed.
+    * `alert`: Boolean; whether exceeding this burn rate threshold should trigger a page.
+    * `message_alert`: Message to show when the error budget is above the target.
+    * `message_ok`: Message to show when the error budget is within the target.
+
+ ```yaml
+ error_budget_policies:
+ default:
+ steps:
+ - name: 1 hour
+ burn_rate_threshold: 9
+ alert: true
+ message_alert: Page to defend the SLO
+ message_ok: Last hour on track
+ window: 3600
+ - name: 12 hours
+ burn_rate_threshold: 3
+ alert: true
+ message_alert: Page to defend the SLO
+ message_ok: Last 12 hours on track
+ window: 43200
+ ```
+
+**→ See [example Shared configuration](samples/config.yaml).**
## More documentation
To go further with the SLO Generator, you can read:
-* [Build an SLO achievements report with BigQuery and DataStudio](docs/deploy/datastudio_slo_report.md)
-
-* [Deploy the SLO Generator on Google Cloud Functions (Terraform)](docs/deploy/cloudfunctions.md)
-
-* [Deploy the SLO Generator on Kubernetes (Alpha)](docs/deploy/kubernetes.md)
-
-* [Deploy the SLO Generator in a CloudBuild pipeline](docs/deploy/cloudbuild.md)
-
-* [Contribute to the SLO Generator](CONTRIBUTING.md)
+### [Build an SLO achievements report with BigQuery and DataStudio](docs/deploy/datastudio_slo_report.md)
+### [Deploy the SLO Generator in Cloud Run](docs/deploy/cloudrun.md)
+### [Deploy the SLO Generator in Kubernetes (Alpha)](docs/deploy/kubernetes.md)
+### [Deploy the SLO Generator in a CloudBuild pipeline](docs/deploy/cloudbuild.md)
+### [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](docs/deploy/cloudfunctions.md)
+### [Contribute to the SLO Generator](CONTRIBUTING.md)
diff --git a/docs/deploy/cloudfunctions.md b/docs/deploy/cloudfunctions.md
index 301043d8..72c032c4 100644
--- a/docs/deploy/cloudfunctions.md
+++ b/docs/deploy/cloudfunctions.md
@@ -9,8 +9,8 @@
Other components can be added to make results available to other destinations:
-* A **Cloud Function** to export SLO reports (e.g: to BigQuery and Stackdriver Monitoring), running `slo-generator`.
-* A **Stackdriver Monitoring Policy** to alert on high budget Burn Rates.
+* A **Cloud Function** to export SLO reports (e.g: to BigQuery and Cloud Monitoring), running `slo-generator`.
+* A **Cloud Monitoring Policy** to alert on high budget Burn Rates.
Below is a diagram of what this pipeline looks like:
@@ -22,9 +22,9 @@ Below is a diagram of what this pipeline looks like:
* **Historical analytics** by analyzing SLO data in BigQuery.
-* **Real-time alerting** by setting up Stackdriver Monitoring alerts based on
+* **Real-time alerting** by setting up Cloud Monitoring alerts based on
wanted SLOs.
* **Real-time, daily, monthly, yearly dashboards** by streaming BigQuery SLO reports to DataStudio (see [here](datastudio_slo_report.md)) and building dashboards.
-An example of pipeline automation with Terraform can be found in the corresponding [Terraform module](https://github.com/terraform-google-modules/terraform-google-slo/tree/master/examples/simple_example).
+An example of pipeline automation with Terraform can be found in the corresponding [Terraform module](https://github.com/terraform-google-modules/terraform-google-slo/tree/master/examples/slo-generator/simple_example).
diff --git a/docs/providers/stackdriver.md b/docs/providers/cloud_monitoring.md
similarity index 52%
rename from docs/providers/stackdriver.md
rename to docs/providers/cloud_monitoring.md
index 2e107d6c..26054ad2 100644
--- a/docs/providers/stackdriver.md
+++ b/docs/providers/cloud_monitoring.md
@@ -1,16 +1,23 @@
-# Stackdriver Monitoring
+# Cloud Monitoring
## Backend
-Using the `Stackdriver` backend class, you can query any metrics available in
-Stackdriver Monitoring to create an SLO.
+Using the `cloud_monitoring` backend class, you can query any metrics available
+in `Cloud Monitoring` to create an SLO.
-The following methods are available to compute SLOs with the `Stackdriver`
+```yaml
+backends:
+ cloud_monitoring:
+ project_id: "${WORKSPACE_PROJECT_ID}"
+```
+
+The following methods are available to compute SLOs with the `cloud_monitoring`
backend:
* `good_bad_ratio` for metrics of type `DELTA`, `GAUGE`, or `CUMULATIVE`.
* `distribution_cut` for metrics of type `DELTA` and unit `DISTRIBUTION`.
+
### Good / bad ratio
The `good_bad_ratio` method is used to compute the ratio between two metrics:
@@ -23,84 +30,75 @@ SLO.
This method is often used for availability SLOs, but can be used for other
purposes as well (see examples).
-**Config example:**
+**SLO config blob:**
```yaml
-backend:
- class: Stackdriver
- project_id: "${STACKDRIVER_HOST_PROJECT_ID}"
- method: good_bad_ratio
- measurement:
- filter_good: >
- project="${GAE_PROJECT_ID}"
- metric.type="appengine.googleapis.com/http/server/response_count"
- metric.labels.response_code >= 200
- metric.labels.response_code < 500
- filter_valid: >
- project="${GAE_PROJECT_ID}"
- metric.type="appengine.googleapis.com/http/server/response_count"
+backend: cloud_monitoring
+method: good_bad_ratio
+service_level_indicator:
+ filter_good: >
+ project="${GAE_PROJECT_ID}"
+ metric.type="appengine.googleapis.com/http/server/response_count"
+ metric.labels.response_code >= 200
+ metric.labels.response_code < 500
+ filter_valid: >
+ project="${GAE_PROJECT_ID}"
+ metric.type="appengine.googleapis.com/http/server/response_count"
```
You can also use the `filter_bad` field which identifies bad events instead of
the `filter_valid` field which identifies all valid events.
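For instance, the same SLI could be expressed with `filter_bad` instead (a sketch based on the config above; treating 5xx responses as bad events is an assumption):

```yaml
backend: cloud_monitoring
method: good_bad_ratio
service_level_indicator:
  filter_good: >
    project="${GAE_PROJECT_ID}"
    metric.type="appengine.googleapis.com/http/server/response_count"
    metric.labels.response_code >= 200
    metric.labels.response_code < 500
  filter_bad: >
    project="${GAE_PROJECT_ID}"
    metric.type="appengine.googleapis.com/http/server/response_count"
    metric.labels.response_code >= 500
```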
-**→ [Full SLO config](../../samples/stackdriver/slo_gae_app_availability.yaml)**
+**→ [Full SLO config](../../samples/cloud_monitoring/slo_gae_app_availability.yaml)**
### Distribution cut
-The `distribution_cut` method is used for Stackdriver distribution-type metrics,
-which are usually used for latency metrics.
+The `distribution_cut` method is used for Cloud Monitoring distribution-type
+metrics, which are usually used for latency metrics.
A distribution metric records the **statistical distribution of the extracted
values** in **histogram buckets**. The extracted values are not recorded
individually, but their distribution across the configured buckets are recorded,
along with the `count`, `mean`, and `sum` of squared deviation of the values.
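Independent of any provider API, the core idea of a distribution cut can be sketched in a few lines (the bucket counts and threshold are made up):

```python
# Hypothetical histogram: number of events recorded per latency bucket.
bucket_counts = [410, 1200, 3500, 2100, 600, 190]

# Buckets 0..threshold_bucket are considered "good" (below the target latency).
threshold_bucket = 3

good_events = sum(bucket_counts[: threshold_bucket + 1])
valid_events = sum(bucket_counts)
sli = good_events / valid_events
```

The SLI is then the share of events falling in the "good" buckets of the distribution.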
-In `Stackdriver Monitoring`, there are three different ways to specify bucket
+In Cloud Monitoring, there are three different ways to specify bucket
boundaries:
* **Linear:** Every bucket has the same width.
* **Exponential:** Bucket widths increase for higher values, using an
exponential growth factor.
* **Explicit:** Bucket boundaries are set for each bucket using a bounds array.
-**Config example:**
+**SLO config blob:**
```yaml
-backend:
- class: Stackdriver
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- method: exponential_distribution_cut
- measurement:
- filter_valid: >
- project=${GAE_PROJECT_ID} AND
- metric.type=appengine.googleapis.com/http/server/response_latencies AND
- metric.labels.response_code >= 200 AND
- metric.labels.response_code < 500
- good_below_threshold: true
- threshold_bucket: 19
+backend: cloud_monitoring
+method: exponential_distribution_cut
+service_level_indicator:
+ filter_valid: >
+ project=${GAE_PROJECT_ID} AND
+ metric.type=appengine.googleapis.com/http/server/response_latencies AND
+ metric.labels.response_code >= 200 AND
+ metric.labels.response_code < 500
+ good_below_threshold: true
+ threshold_bucket: 19
```
-**→ [Full SLO config](../../samples/stackdriver/slo_gae_app_latency.yaml)**
+**→ [Full SLO config](../../samples/cloud_monitoring/slo_gae_app_latency.yaml)**
The `threshold_bucket` number to reach our 724ms target latency will depend on
how the bucket boundaries are set. Learn how to [inspect your distribution metrics](https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics#inspecting_distribution_metrics) to figure out the bucketization.
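For example, with hypothetical exponential buckets whose upper bound for bucket `i` is `scale * growth**i` (the scale and growth factor below are assumptions), the `threshold_bucket` for a 724 ms target could be located like this:

```python
# Hypothetical exponential bucketing parameters.
scale, growth = 1.0, 2.0
target_latency_ms = 724

# Find the first bucket whose upper bound reaches the target latency.
threshold_bucket = 0
while scale * growth ** threshold_bucket < target_latency_ms:
    threshold_bucket += 1
```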
## Exporter
-The `Stackdriver` exporter allows to export SLO metrics to Cloud Monitoring API.
-
-**Example config:**
-
-The following configuration will create the custom metric
-`error_budget_burn_rate` in `Stackdriver Monitoring`:
+The `cloud_monitoring` exporter allows exporting SLO metrics to the Cloud Monitoring API.
```yaml
-exporters:
- - class: Stackdriver
- project_id: "${STACKDRIVER_HOST_PROJECT_ID}"
+exporters:
+ cloud_monitoring:
+ project_id: "${WORKSPACE_PROJECT_ID}"
```
Optional fields:
- * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`custom:error_budget_burn_rate`, `custom:sli_measurement`].
+ * `metrics`: [*optional*] *list* - List of metrics to export ([see docs](../shared/metrics.md)).
-**→ [Full SLO config](../../samples/stackdriver/slo_lb_request_availability.yaml)**
+**→ [Full SLO config](../../samples/cloud_monitoring/slo_lb_request_availability.yaml)**
## Alerting
@@ -109,6 +107,7 @@ being able to alert on them is simply useless.
**Too many alerts** can be daunting, and can page your SRE engineers for no
valid reason.
+
**Too few alerts** can mean that your applications are not monitored at all
(no application has 100% reliability).
@@ -117,24 +116,24 @@ reduce the noise and page only when it's needed.
**Example:**
-We will define a `Stackdriver Monitoring` alert that we will **filter out on the
+We will define a `Cloud Monitoring` alert that we will **filter out on the
corresponding error budget step**.
-Consider the following error budget policy config:
+Consider the following error budget policy step config:
```yaml
-- error_budget_policy_step_name: 1 hour
- measurement_window_seconds: 3600
- alerting_burn_rate_threshold: 9
- urgent_notification: true
- overburned_consequence_message: Page the SRE team to defend the SLO
- achieved_consequence_message: Last hour on track
+- name: 1 hour
+ window: 3600
+ burn_rate_threshold: 9
+ alert: true
+ message_alert: Page the SRE team to defend the SLO
+ message_ok: Last hour on track
```
-Using Stackdriver UI, let's set up an alert when our error budget burn rate is
-burning **9X faster** than it should in the last hour:
+Using Cloud Monitoring UI, let's set up an alert when our error budget burn rate
+is burning **9X faster** than it should in the last hour:
-* Open `Stackdriver Monitoring` and click on `Alerting > Create Policy`
+* Open `Cloud Monitoring` and click on `Alerting > Create Policy`
* Fill the alert name and click on `Add Condition`.
@@ -163,5 +162,5 @@ differentiate the alert messages.
## Examples
-Complete SLO samples using `Stackdriver` are available in
-[samples/stackdriver](../../samples/stackdriver). Check them out !
+Complete SLO samples using Cloud Monitoring are available in
+[samples/cloud_monitoring](../../samples/cloud_monitoring). Check them out!
diff --git a/docs/providers/stackdriver_service_monitoring.md b/docs/providers/cloud_service_monitoring.md
similarity index 51%
rename from docs/providers/stackdriver_service_monitoring.md
rename to docs/providers/cloud_service_monitoring.md
index b72d71c6..bea7d835 100644
--- a/docs/providers/stackdriver_service_monitoring.md
+++ b/docs/providers/cloud_service_monitoring.md
@@ -1,15 +1,21 @@
-# Stackdriver Service Monitoring
+# Cloud Service Monitoring
## Backend
-Using the `StackdriverServiceMonitoring` backend class, you can use the
-`Stackdriver Service Monitoring API` to manage your SLOs.
+Using the `cloud_service_monitoring` backend, you can use the
+`Cloud Service Monitoring API` to manage your SLOs.
-SLOs are created from standard metrics available in Stackdriver Monitoring and
-the data is stored in `Stackdriver Service Monitoring API` (see
+```yaml
+backends:
+ cloud_service_monitoring:
+ project_id: "${WORKSPACE_PROJECT_ID}"
+```
+
+SLOs are created from standard metrics available in Cloud Monitoring and
+the data is stored in `Cloud Service Monitoring API` (see
[docs](https://cloud.google.com/monitoring/service-monitoring/using-api)).
-The following methods are available to compute SLOs with the `Stackdriver`
+The following methods are available to compute SLOs with the `cloud_service_monitoring`
backend:
* `basic` to create standard SLOs for Google App Engine, Google Kubernetes
@@ -17,91 +23,84 @@ Engine, and Cloud Endpoints.
* `good_bad_ratio` for metrics of type `DELTA` or `CUMULATIVE`.
* `distribution_cut` for metrics of type `DELTA` and unit `DISTRIBUTION`.
+
### Basic
-The `basic` method is used to let the `Stackdriver Service Monitoring API`
+The `basic` method is used to let the `Cloud Service Monitoring API`
automatically generate standardized SLOs for the following GCP services:
* **Google App Engine**
* **Google Kubernetes Engine** (with Istio)
* **Google Cloud Endpoints**
-The SLO configuration uses Stackdriver
+The SLO configuration uses Cloud Monitoring
[GCP metrics](https://cloud.google.com/monitoring/api/metrics_gcp) and only
requires minimal configuration compared to custom SLOs.
**Example config (App Engine availability):**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- method: basic
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- measurement:
- app_engine:
- project_id: ${GAE_PROJECT_ID}
- module_id: ${GAE_MODULE_ID}
- availability: {}
+backend: cloud_service_monitoring
+method: basic
+service_level_indicator:
+ app_engine:
+ project_id: ${GAE_PROJECT_ID}
+ module_id: ${GAE_MODULE_ID}
+ availability: {}
```
For details on filling the `app_engine` fields, see [AppEngine](https://cloud.google.com/monitoring/api/ref_v3/rest/v3/services#appengine)
spec.
-**→ [Full SLO config](../../samples/stackdriver_service_monitoring/slo_gae_app_availability_basic.yaml)**
+**→ [Full SLO config](../../samples/cloud_service_monitoring/slo_gae_app_availability_basic.yaml)**
**Example config (Cloud Endpoint latency):**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- method: basic
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- measurement:
- cloud_endpoints:
- service: ${ENDPOINT_URL}
- latency:
- threshold: 724 # ms
+backend: cloud_service_monitoring
+method: basic
+service_level_indicator:
+ cloud_endpoints:
+ service_name: ${ENDPOINT_URL}
+ latency:
+ threshold: 724 # ms
```
For details on filling the `cloud_endpoints` fields, see [CloudEndpoint](https://cloud.google.com/monitoring/api/ref_v3/rest/v3/services#cloudendpoints)
spec.
-**Example config (Istio service latency) [NOT YET RELEASED]:**
+**Example config (Istio service latency):**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- method: basic
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- measurement:
- mesh_istio:
- mesh_uid: ${GKE_MESH_UID}
- service_namespace: ${GKE_SERVICE_NAMESPACE}
- service_name: ${GKE_SERVICE_NAME}
- latency:
- threshold: 500 # ms
+backend: cloud_service_monitoring
+method: basic
+service_level_indicator:
+ mesh_istio:
+ mesh_uid: ${GKE_MESH_UID}
+ service_namespace: ${GKE_SERVICE_NAMESPACE}
+ service_name: ${GKE_SERVICE_NAME}
+ latency:
+ threshold: 500 # ms
```
For details on filling the `mesh_istio` fields, see [MeshIstio](https://cloud.google.com/monitoring/api/ref_v3/rest/v3/services#meshistio)
spec.
-**→ [Full SLO config](../../samples/stackdriver_service_monitoring/slo_gke_app_latency_basic.yaml)**
+**→ [Full SLO config](../../samples/cloud_service_monitoring/slo_gke_app_latency_basic.yaml)**
-**Example config (Istio service latency) [DEPRECATED SOON]:**
+**Example config (Istio service latency) [DEPRECATED]:**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- method: basic
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- measurement:
- cluster_istio:
- project_id: ${GKE_PROJECT_ID}
- location: ${GKE_LOCATION}
- cluster_name: ${GKE_CLUSTER_NAME}
- service_namespace: ${GKE_SERVICE_NAMESPACE}
- service_name: ${GKE_SERVICE_NAME}
- latency:
- threshold: 500 # ms
+backend: cloud_service_monitoring
+method: basic
+service_level_indicator:
+ cluster_istio:
+ project_id: ${GKE_PROJECT_ID}
+ location: ${GKE_LOCATION}
+ cluster_name: ${GKE_CLUSTER_NAME}
+ service_namespace: ${GKE_SERVICE_NAMESPACE}
+ service_name: ${GKE_SERVICE_NAME}
+ latency:
+ threshold: 500 # ms
```
For details on filling the `cluster_istio` fields, see [ClusterIstio](https://cloud.google.com/monitoring/api/ref_v3/rest/v3/services#clusteristio)
spec.
-**→ [Full SLO config](../../samples/stackdriver_service_monitoring/slo_gke_app_latency_basic_deprecated.yaml)**
+**→ [Full SLO config](../../samples/cloud_service_monitoring/slo_gke_app_latency_basic_deprecated.yaml)**
### Good / bad ratio
@@ -118,30 +117,28 @@ purposes as well (see examples).
**Example config:**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- method: good_bad_ratio
- measurement:
- filter_good: >
- project="${GAE_PROJECT_ID}"
- metric.type="appengine.googleapis.com/http/server/response_count"
- resource.type="gae_app"
- metric.labels.response_code >= 200
- metric.labels.response_code < 500
- filter_valid: >
- project="${GAE_PROJECT_ID}"
- metric.type="appengine.googleapis.com/http/server/response_count"
+backend: cloud_service_monitoring
+method: good_bad_ratio
+service_level_indicator:
+ filter_good: >
+ project="${GAE_PROJECT_ID}"
+ metric.type="appengine.googleapis.com/http/server/response_count"
+ resource.type="gae_app"
+ metric.labels.response_code >= 200
+ metric.labels.response_code < 500
+ filter_valid: >
+ project="${GAE_PROJECT_ID}"
+ metric.type="appengine.googleapis.com/http/server/response_count"
```
You can also use the `filter_bad` field which identifies bad events instead of
the `filter_valid` field which identifies all valid events.
-**→ [Full SLO config](../../samples/stackdriver_service_monitoring/slo_gae_app_availability.yaml)**
+**→ [Full SLO config](../../samples/cloud_service_monitoring/slo_gae_app_availability.yaml)**
## Distribution cut
-The `distribution_cut` method is used for Stackdriver distribution-type metrics,
+The `distribution_cut` method is used for Cloud Monitoring distribution-type metrics,
which are usually used for latency metrics.
A distribution metric records the **statistical distribution of the extracted
@@ -152,30 +149,28 @@ along with the `count`, `mean`, and `sum` of squared deviation of the values.
**Example config:**
```yaml
-backend:
- class: StackdriverServiceMonitoring
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- method: distribution_cut
- measurement:
- filter_valid: >
- project=${GAE_PROJECT_ID}
- metric.type=appengine.googleapis.com/http/server/response_latencies
- metric.labels.response_code >= 200
- metric.labels.response_code < 500
- range_min: 0
- range_max: 724 # ms
+backend: cloud_service_monitoring
+method: distribution_cut
+service_level_indicator:
+ filter_valid: >
+ project=${GAE_PROJECT_ID}
+ metric.type=appengine.googleapis.com/http/server/response_latencies
+ metric.labels.response_code >= 200
+ metric.labels.response_code < 500
+ range_min: 0
+ range_max: 724 # ms
```
The `range_min` and `range_max` are used to specify the latency range that we
consider 'good'.
-**→ [Full SLO config](../../samples/stackdriver_service_monitoring/slo_gae_app_latency.yaml)**
+**→ [Full SLO config](../../samples/cloud_service_monitoring/slo_gae_app_latency.yaml)**
## Service Monitoring API considerations
### Tracking objects
-Since `Stackdriver Service Monitoring API` persists `Service` and
+Since `Cloud Service Monitoring API` persists `Service` and
`ServiceLevelObjective` objects, we need ways to keep our local SLO YAML
configuration synced with the remote objects.
@@ -214,7 +209,7 @@ unique id to an auto-imported `Service`:
* **Cluster Istio [DEPRECATED SOON]:**
```
- ist:{project_id}-zone-{location}-{cluster_name}-{service_namespace}-{service_name}
+ ist:{project_id}-{suffix}-{location}-{cluster_name}-{service_namespace}-{service_name}
```
→ *Make sure that the `cluster_istio` block in your config has
the correct fields corresponding to your Istio service.*
@@ -225,19 +220,19 @@ random id.
**Custom**
Custom services are the ones you create yourself using the
-`Service Monitoring API` and the `slo-generator`.
+`Cloud Service Monitoring API` and the `slo-generator`.
The following conventions are used by the `slo-generator` to give a unique id
to a custom `Service` and `Service Level Objective` objects:
-* `service_id = ${service_name}-${feature_name}`
+* `service_id = ${metadata.service_name}-${metadata.feature_name}`
-* `slo_id = ${service_name}-${feature_name}-${slo_name}-${window}`
+* `slo_id = ${metadata.service_name}-${metadata.feature_name}-${metadata.slo_name}-${window}`
To keep track of those, **do not update any of the following fields** in your
configs:
- * `service_name`, `feature_name` and `slo_name` in the SLO config.
+ * `metadata.service_name`, `metadata.feature_name` and `metadata.slo_name` in the SLO config.
* `window` in the Error Budget Policy.
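The naming conventions above can be sketched as follows (illustrative only, not the actual slo-generator implementation):

```python
# Illustrative sketch (not the actual slo-generator code) of the unique id
# conventions for custom Service Monitoring objects.
def service_id(metadata):
    return f"{metadata['service_name']}-{metadata['feature_name']}"

def slo_id(metadata, window):
    return f"{service_id(metadata)}-{metadata['slo_name']}-{window}"

# Hypothetical metadata values for demonstration.
metadata = {"service_name": "gae-app", "feature_name": "latency",
            "slo_name": "latency724ms"}
print(slo_id(metadata, 3600))  # gae-app-latency-latency724ms-3600
```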
@@ -246,15 +241,12 @@ If you need to make updates to any of those fields, first run the
[#deleting-objects](#deleting-objects)), then re-run normally.
To import an existing custom `Service` object, find out its service id from
-the API and fill the `service_id` in the SLO configuration.
-
-You cannot import an existing custom `ServiceLevelObjective` unless it complies
-to the naming convention.
+the API and fill the `service_id` in the `service_level_indicator` configuration.
### Deleting objects
-To delete an SLO object in `Stackdriver Monitoring API` using the
-`StackdriverServiceMonitoringBackend` class, run the `slo-generator` with the
+To delete an SLO object in the `Cloud Service Monitoring API` using the
+`cloud_service_monitoring` backend, run the `slo-generator` with the
`-d` (or `--delete`) flag:
```
@@ -263,10 +255,10 @@ slo-generator -f -b --delete
## Alerting
-See the Stackdriver Service Monitoring [docs](https://cloud.google.com/monitoring/service-monitoring/alerting-on-budget-burn-rate)
+See the Cloud Service Monitoring [docs](https://cloud.google.com/monitoring/service-monitoring/alerting-on-budget-burn-rate)
for instructions on alerting.
### Examples
-Complete SLO samples using `Stackdriver Service Monitoring` are available in [ samples/stackdriver_service_monitoring](../../samples/stackdriver_service_monitoring).
+Complete SLO samples using `Cloud Service Monitoring` are available in [samples/cloud_service_monitoring](../../samples/cloud_service_monitoring).
Check them out!
diff --git a/docs/providers/custom.md b/docs/providers/custom.md
index 8da87640..bb180a58 100644
--- a/docs/providers/custom.md
+++ b/docs/providers/custom.md
@@ -41,14 +41,20 @@ class CustomBackend:
In order to call the `good_bad_ratio` method in the custom backend above, the
-`backend` block would look like this:
+`backends` block would look like this:
```yaml
-backend:
- class: custom.custom_backend.CustomBackend # relative Python path to the backend. Make sure __init__.py is created in subdirectories for this to work.
- method: good_bad_ratio # name of the method to run
- arg_1: test_arg_1 # passed to kwargs in __init__
- arg_2: test_arg_2 # passed to kwargs in __init__
+backends:
+ custom.custom_backend.CustomBackend: # relative Python path to the backend. Make sure __init__.py is created in subdirectories for this to work.
+ arg_1: test_arg_1 # passed to kwargs in __init__
+ arg_2: test_arg_2 # passed to kwargs in __init__
+```
+
+The `spec` section in the SLO config would look like:
+```yaml
+backend: custom.custom_backend.CustomBackend
+method: good_bad_ratio # name of the method to run
+service_level_indicator: {}
```
**→ [Full SLO config](../../samples/custom/slo_custom_app_availability_ratio.yaml)**
@@ -92,10 +98,17 @@ class CustomExporter:
-and the corresponding `exporters` section in your SLO config:
+The `exporters` block in the shared config would look like this:
+
```yaml
exporters:
-- class: custom.custom_exporter.CustomExporter
- arg_1: test
+ custom.custom_exporter.CustomExporter: # relative Python path to the exporter class. Make sure __init__.py is created in subdirectories for this to work.
+ arg_1: test_arg_1 # passed to kwargs in __init__
+```
+
+The `spec` section in the SLO config would look like:
+```yaml
+exporters: [custom.custom_exporter.CustomExporter]
```
### Metrics
@@ -103,7 +116,9 @@ exporters:
A metrics exporter:
* must inherit from `slo_generator.exporters.base.MetricsExporter`.
-* must implement the `export_metric` method which exports **one** metric as a dict like:
+* must implement the `export_metric` method which exports **one** metric.
+The `export_metric` function takes a metric dict as input, such as:
+
```py
{
"name": ,
@@ -129,13 +144,13 @@ class CustomExporter(MetricsExporter): # derive from base class
"""Custom exporter."""
def export_metric(self, data):
- """Export data to Stackdriver Monitoring.
+ """Export data to Custom Monitoring API.
Args:
data (dict): Metric data.
Returns:
- object: Stackdriver Monitoring API result.
+ object: Custom Monitoring API result.
"""
# implement how to export 1 metric here...
return {
@@ -144,11 +159,12 @@ class CustomExporter(MetricsExporter): # derive from base class
}
```
-and the exporters section in your SLO config:
+The `exporters` block in the shared config would look like this:
+
```yaml
exporters:
- - class: custom.custom_exporter.CustomExporter
- arg_1: test
+ custom.custom_exporter.CustomExporter: # relative Python path to the exporter class. Make sure __init__.py is created in subdirectories for this to work.
+ arg_1: test_arg_1 # passed to kwargs in __init__
```
**Note:**
diff --git a/docs/providers/datadog.md b/docs/providers/datadog.md
index fb6b2128..f6e13f61 100644
--- a/docs/providers/datadog.md
+++ b/docs/providers/datadog.md
@@ -2,16 +2,28 @@
## Backend
-Using the `Datadog` backend class, you can query any metrics available in
+Using the `datadog` backend class, you can query any metrics available in
Datadog to create an SLO.
-The following methods are available to compute SLOs with the `Datadog`
+```yaml
+backends:
+ datadog:
+ api_key: ${DATADOG_API_KEY}
+ app_key: ${DATADOG_APP_KEY}
+```
+
+The following methods are available to compute SLOs with the `datadog`
backend:
* `good_bad_ratio` for computing good / bad metrics ratios.
* `query_sli` for computing SLIs directly with Datadog.
* `query_slo` for getting SLO value from Datadog SLO endpoint.
+Optional arguments to configure Datadog are documented in the Datadog
+`initialize` method [here](https://github.com/DataDog/datadogpy/blob/058114cc3d65483466684c96a5c23e36c3aa052e/datadog/__init__.py#L33).
+You can pass them in the `backends` block, such as specifying
+`api_host: api.datadoghq.eu` in order to use the EU site.
+
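For example, to use the EU site, the `backends` block might look like this (a sketch; `api_host` is documented in the Datadog `initialize` method linked above):

```yaml
backends:
  datadog:
    api_key: ${DATADOG_API_KEY}
    app_key: ${DATADOG_APP_KEY}
    api_host: api.datadoghq.eu  # optional: use the Datadog EU site
```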
### Good / bad ratio
The `good_bad_ratio` method is used to compute the ratio between two metrics:
@@ -27,45 +39,29 @@ purposes as well (see examples).
**Config example:**
```yaml
-backend:
- class: Datadog
- method: good_bad_ratio
- api_key: ${DATADOG_API_KEY}
- app_key: ${DATADOG_APP_KEY}
- measurement:
- filter_good: app.requests.count{http.path:/, http.status_code_class:2xx}
- filter_valid: app.requests.count{http.path:/}
+backend: datadog
+method: good_bad_ratio
+service_level_indicator:
+ filter_good: app.requests.count{http.path:/, http.status_code_class:2xx}
+ filter_valid: app.requests.count{http.path:/}
```
**→ [Full SLO config](../../samples/datadog/slo_dd_app_availability_ratio.yaml)**
-Optional arguments to configure Datadog are documented in the Datadog
-`initialize` method [here](https://github.com/DataDog/datadogpy/blob/058114cc3d65483466684c96a5c23e36c3aa052e/datadog/__init__.py#L33).
-You can pass them in the `backend` section, such as specifying
-`api_host: api.datadoghq.eu` in order to use the EU site.
-
### Query SLI
The `query_sli` method is used to directly query the needed SLI with Datadog:
Datadog's query language is powerful enough that it can do ratios natively.
-This method makes it more flexible to input any `Datadog` SLI computation and
+This method gives you the flexibility to input any Datadog SLI computation and
can reduce the number of queries made to Datadog.
```yaml
-backend:
- class: Datadog
- method: query_sli
- api_key: ${DATADOG_API_KEY}
- app_key: ${DATADOG_APP_KEY}
- measurement:
- expression: sum:app.requests.count{http.path:/, http.status_code_class:2xx} / sum:app.requests.count{http.path:/}
+backend: datadog
+method: query_sli
+service_level_indicator:
+ expression: sum:app.requests.count{http.path:/, http.status_code_class:2xx} / sum:app.requests.count{http.path:/}
```
-Optional arguments to configure Datadog are documented in the Datadog
-`initialize` method [here](https://github.com/DataDog/datadogpy/blob/058114cc3d65483466684c96a5c23e36c3aa052e/datadog/__init__.py#L33).
-You can pass them in the `backend` section, such as specifying
-`api_host: api.datadoghq.eu` in order to use the EU site.
-
**→ [Full SLO config](../../samples/datadog/slo_dd_app_availability_query_sli.yaml)**
### Query SLO
@@ -73,46 +69,43 @@ You can pass them in the `backend` section, such as specifying
The `query_slo` method is used to directly query the needed SLO with Datadog:
Datadog exposes SLO objects that you can refer to directly in your config by inputting their `slo_id`.
-This method makes it more flexible to input any `Datadog` SLI computation and
+This method gives you the flexibility to input any Datadog SLI computation and
can reduce the number of queries made to Datadog.
To query the value from a Datadog SLO, simply add a `slo_id` field in the
`service_level_indicator` section:
```yaml
-...
-backend:
- class: Datadog
- method: query_slo
- api_key: ${DATADOG_API_KEY}
- app_key: ${DATADOG_APP_KEY}
- measurement:
- slo_id: ${DATADOG_SLO_ID}
+backend: datadog
+method: query_slo
+service_level_indicator:
+ slo_id: ${DATADOG_SLO_ID}
```
**→ [Full SLO config](../../samples/datadog/slo_dd_app_availability_query_slo.yaml)**
### Examples
-Complete SLO samples using `Datadog` are available in
+Complete SLO samples using `datadog` are available in
[samples/datadog](../../samples/datadog). Check them out!
## Exporter
-The `Datadog` exporter allows to export SLO metrics to the Datadog API.
-
-**Example config:**
+The `datadog` exporter allows exporting SLO metrics to the Datadog API.
```yaml
exporters:
- - class: Datadog
+ datadog:
api_key: ${DATADOG_API_KEY}
app_key: ${DATADOG_APP_KEY}
```
+Optional arguments to configure Datadog are documented in the Datadog
+`initialize` method [here](https://github.com/DataDog/datadogpy/blob/058114cc3d65483466684c96a5c23e36c3aa052e/datadog/__init__.py#L33).
+You can pass them in the `exporters` block, such as specifying
+`api_host: api.datadoghq.eu` in order to use the EU site.
Optional fields:
- * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`custom:error_budget_burn_rate`, `custom:sli_measurement`].
-
+ * `metrics`: `list` - List of metrics to export ([see docs](../shared/metrics.md)).
**→ [Full SLO config](../../samples/datadog/slo_dd_app_availability_ratio.yaml)**
diff --git a/docs/providers/dynatrace.md b/docs/providers/dynatrace.md
index c2fcbbac..90f1a3db 100644
--- a/docs/providers/dynatrace.md
+++ b/docs/providers/dynatrace.md
@@ -2,10 +2,17 @@
## Backend
-Using the `Dynatrace` backend class, you can query any metrics available in
+Using the `dynatrace` backend class, you can query any metrics available in
Dynatrace to create an SLO.
-The following methods are available to compute SLOs with the `Dynatrace`
+```yaml
+backends:
+ dynatrace:
+ api_token: ${DYNATRACE_API_TOKEN}
+ api_url: ${DYNATRACE_API_URL}
+```
+
+The following methods are available to compute SLOs with the `dynatrace`
backend:
* `good_bad_ratio` for computing good / bad metrics ratios.
@@ -25,16 +32,13 @@ purposes as well (see examples).
**Config example:**
```yaml
-backend:
- class: Dynatrace
- method: good_bad_ratio
- api_token: ${DYNATRACE_API_TOKEN}
- api_url: ${DYNATRACE_API_URL}
- measurement:
- query_good:
- metric_selector: ext:app.request_count:filter(and(eq(app,test_app),eq(env,prod),eq(status_code_class,2xx)))
- entity_selector: type(HOST)
- query_valid:
+backend: dynatrace
+method: good_bad_ratio
+service_level_indicator:
+ query_good:
+ metric_selector: ext:app.request_count:filter(and(eq(app,test_app),eq(env,prod),eq(status_code_class,2xx)))
+ entity_selector: type(HOST)
+ query_valid:
metric_selector: ext:app.request_count:filter(and(eq(app,test_app),eq(env,prod)))
entity_selector: type(HOST)
```
@@ -52,16 +56,13 @@ This method can be used for latency SLOs, by defining a latency threshold.
**Config example:**
```yaml
-backend:
- class: Dynatrace
- method: threshold
- api_token: ${DYNATRACE_API_TOKEN}
- api_url: ${DYNATRACE_API_URL}
- measurement:
- query_valid:
- metric_selector: ext:app.request_latency:filter(and(eq(app,test_app),eq(env,prod),eq(status_code_class,2xx)))
- entity_selector: type(HOST)
- threshold: 40000 # us
+backend: dynatrace
+method: threshold
+service_level_indicator:
+ query_valid:
+ metric_selector: ext:app.request_latency:filter(and(eq(app,test_app),eq(env,prod),eq(status_code_class,2xx)))
+ entity_selector: type(HOST)
+ threshold: 40000 # us
```
**→ [Full SLO config](../../samples/dynatrace/slo_dt_app_latency_threshold.yaml)**
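The threshold evaluation can be pictured as follows (an illustrative sketch over hypothetical latency values, not slo-generator code):

```python
# Illustrative sketch: a threshold SLI over hypothetical latency values (us).
latencies_us = [12000, 30000, 39000, 41000, 55000]
threshold = 40000  # us, as in the config example above

# Requests at or below the threshold count as 'good'; all requests are valid.
good = sum(latency <= threshold for latency in latencies_us)
sli = good / len(latencies_us)
print(sli)  # 0.6
```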
@@ -71,24 +72,22 @@ Optional fields:
### Examples
-Complete SLO samples using `Dynatrace` are available in
+Complete SLO samples using `dynatrace` are available in
[samples/dynatrace](../../samples/dynatrace). Check them out!
## Exporter
-The `Dynatrace` exporter allows to export SLO metrics to Dynatrace API.
-
-**Example config:**
+The `dynatrace` exporter allows exporting SLO metrics to the Dynatrace API.
```yaml
exporters:
- - class: Dynatrace
- api_token: ${DYNATRACE_API_TOKEN}
- api_url: ${DYNATRACE_API_URL}
+ dynatrace:
+ api_token: ${DYNATRACE_API_TOKEN}
+ api_url: ${DYNATRACE_API_URL}
```
Optional fields:
- * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`custom:error_budget_burn_rate`, `custom:sli_measurement`].
+ * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`custom:error_budget_burn_rate`, `custom:sli_service_level_indicator`].
**→ [Full SLO config](../../samples/dynatrace/slo_dt_app_availability_ratio.yaml)**
diff --git a/docs/providers/elasticsearch.md b/docs/providers/elasticsearch.md
index 5e160bc3..c30b1dbb 100644
--- a/docs/providers/elasticsearch.md
+++ b/docs/providers/elasticsearch.md
@@ -2,10 +2,17 @@
## Backend
-Using the `Elasticsearch` backend class, you can query any metrics available in
+Using the `elasticsearch` backend class, you can query any metrics available in
Elasticsearch to create an SLO.
-The following methods are available to compute SLOs with the `Elasticsearch`
+```yaml
+backends:
+ elasticsearch:
+ url: ${ELASTICSEARCH_URL}
+```
+
+The following methods are available to compute SLOs with the `elasticsearch`
backend:
* `good_bad_ratio` for computing good / bad metrics ratios.
@@ -81,5 +88,5 @@ look like:
### Examples
-Complete SLO samples using the `Elasticsearch` backend are available in
+Complete SLO samples using the `elasticsearch` backend are available in
[samples/elasticsearch](../../samples/elasticsearch). Check them out!
diff --git a/docs/providers/prometheus.md b/docs/providers/prometheus.md
index 1ceb4a62..1120b777 100644
--- a/docs/providers/prometheus.md
+++ b/docs/providers/prometheus.md
@@ -2,10 +2,22 @@
## Backend
-Using the `Prometheus` backend class, you can query any metrics available in
+Using the `prometheus` backend class, you can query any metrics available in
Prometheus to create an SLO.
-The following methods are available to compute SLOs with the `Prometheus`
+```yaml
+backends:
+ prometheus:
+ url: http://localhost:9090
+ # headers:
+ # Content-Type: application/json
+ # Authorization: Basic b2s6cGFzcW==
+```
+
+Optional fields:
+* `headers`: lets you specify HTTP headers, e.g. Basic Authentication credentials, if needed.
+
+The following methods are available to compute SLOs with the `prometheus`
backend:
* `good_bad_ratio` for computing good / bad metrics ratios.
@@ -26,24 +38,16 @@ purposes as well (see examples).
**Config example:**
```yaml
-backend:
- class: Prometheus
- method: good_bad_ratio
- url: http://localhost:9090
- # headers:
- # Content-Type: application/json
- # Authorization: Basic b2s6cGFzcW==
- measurement:
- filter_good: http_requests_total{handler="/metrics", code=~"2.."}[window]
- filter_valid: http_requests_total{handler="/metrics"}[window]
- # operators: ['sum', 'rate']
+backend: prometheus
+method: good_bad_ratio
+service_level_indicator:
+ filter_good: http_requests_total{handler="/metrics", code=~"2.."}[window]
+ filter_valid: http_requests_total{handler="/metrics"}[window]
+ # operators: ['sum', 'rate']
```
* The `window` placeholder is needed in the query and will be replaced by the
corresponding `window` field set in each step of the Error Budget Policy.
-* The `headers` section (commented) allows to specify Basic Authentication
-credentials if needed.
-
* The `operators` section defines which PromQL functions to apply to the
timeseries. The default is to compute `sum(increase([METRIC_NAME][window]))` to
get an accurate count of good and bad events. Be aware that changing it will likely
@@ -64,26 +68,20 @@ eventually reduces the number of queries made to Prometheus.
See Bitnami's [article](https://engineering.bitnami.com/articles/implementing-slos-using-prometheus.html)
on engineering SLOs with Prometheus.
+**Config example:**
+
```yaml
-backend:
- class: Prometheus
- method: query_sli
- url: ${PROMETHEUS_URL}
- # headers:
- # Content-Type: application/json
- # Authorization: Basic b2s6cGFzcW==
- measurement:
- expression: >
- sum(rate(http_requests_total{handler="/metrics", code=~"2.."}[window]))
- /
- sum(rate(http_requests_total{handler="/metrics"}[window]))
+backend: prometheus
+method: query_sli
+service_level_indicator:
+ expression: >
+ sum(rate(http_requests_total{handler="/metrics", code=~"2.."}[window]))
+ /
+ sum(rate(http_requests_total{handler="/metrics"}[window]))
```
* The `window` placeholder is needed in the query and will be replaced by the
corresponding `window` field set in each step of the Error Budget Policy.
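The `window` substitution can be pictured as follows (an illustrative sketch, not the actual slo-generator implementation):

```python
# Illustrative sketch: the `window` placeholder in the PromQL query is
# replaced by the window of each Error Budget Policy step.
query = 'sum(rate(http_requests_total{handler="/metrics"}[window]))'
window = 3600  # seconds, hypothetical Error Budget Policy step

expanded = query.replace("[window]", f"[{window}s]")
print(expanded)  # sum(rate(http_requests_total{handler="/metrics"}[3600s]))
```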
-* The `headers` section (commented) allows to specify Basic Authentication
-credentials if needed.
-
**→ [Full SLO config (availability)](../../samples/prometheus/slo_prom_metrics_availability_query_sli.yaml)**
**→ [Full SLO config (latency)](../../samples/prometheus/slo_prom_metrics_latency_query_sli.yaml)**
@@ -121,13 +119,11 @@ expressing it, as shown in the config example below.
**Config example:**
```yaml
-backend:
- class: Prometheus
- project_id: ${STACKDRIVER_HOST_PROJECT_ID}
- method: distribution_cut
- measurement:
- expression: http_requests_duration_bucket{path='/', code=~"2.."}
- threshold_bucket: 0.25 # corresponds to 'le' attribute in Prometheus histograms
+backend: prometheus
+method: distribution_cut
+service_level_indicator:
+ expression: http_requests_duration_bucket{path='/', code=~"2.."}
+ threshold_bucket: 0.25 # corresponds to 'le' attribute in Prometheus histograms
```
**→ [Full SLO config](../../samples/prometheus/slo_prom_metrics_latency_distribution_cut.yaml)**
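The `threshold_bucket` arithmetic can be pictured as follows (an illustrative sketch with hypothetical cumulative bucket counts):

```python
# Illustrative sketch: computing a distribution-cut SLI from cumulative
# Prometheus histogram buckets (hypothetical counts).
buckets = {"0.1": 800, "0.25": 950, "0.5": 990, "+Inf": 1000}
threshold_bucket = "0.25"          # corresponds to the 'le' attribute

good = buckets[threshold_bucket]   # cumulative: requests faster than 0.25s
valid = buckets["+Inf"]            # all requests
sli = good / valid
print(sli)  # 0.95
```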
@@ -137,31 +133,29 @@ set for your metric. Learn more in the [Prometheus docs](https://prometheus.io/d
## Exporter
-The `Prometheus` exporter allows to export SLO metrics to the
+The `prometheus` exporter allows exporting SLO metrics to the
[Prometheus Pushgateway](https://prometheus.io/docs/practices/pushing/) which
needs to be running.
-`Prometheus` needs to be setup to **scrape metrics from `Pushgateway`** (see
- [documentation](https://github.com/prometheus/pushgateway) for more details).
-
-**Example config:**
-
```yaml
exporters:
- - class: Prometheus
- url: ${PUSHGATEWAY_URL}
+ prometheus:
+ url: ${PUSHGATEWAY_URL}
```
Optional fields:
- * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`error_budget_burn_rate`, `sli_measurement`].
+ * `metrics`: List of metrics to export ([see docs](../shared/metrics.md)). Defaults to [`error_budget_burn_rate`, `sli_service_level_indicator`].
* `username`: Username for Basic Auth.
* `password`: Password for Basic Auth.
* `job`: Name of `Pushgateway` job. Defaults to `slo-generator`.
+***Note:*** Prometheus needs to be set up to **scrape metrics from `Pushgateway`**
+(see [documentation](https://github.com/prometheus/pushgateway) for more details).
+
**→ [Full SLO config](../../samples/prometheus/slo_prom_metrics_availability_query_sli.yaml)**
### Examples
-Complete SLO samples using `Prometheus` are available in
+Complete SLO samples using `prometheus` are available in
[samples/prometheus](../../samples/prometheus). Check them out!
diff --git a/docs/providers/pubsub.md b/docs/providers/pubsub.md
index 197274f4..0f9f2437 100644
--- a/docs/providers/pubsub.md
+++ b/docs/providers/pubsub.md
@@ -2,18 +2,16 @@
## Exporter
-The `Pubsub` exporter will export SLO reports to a Pub/Sub topic, in JSON format.
-
-This allows teams to consume SLO reports in real-time, and take appropriate
-actions when they see a need.
-
-**Example config:**
+The `pubsub` exporter will export SLO reports to a Pub/Sub topic, in JSON format.
```yaml
exporters:
- - class: Pubsub
+ pubsub:
project_id: "${PUBSUB_PROJECT_ID}"
topic_name: "${PUBSUB_TOPIC_NAME}"
```
-**→ [Full SLO config](../../samples/stackdriver/slo_pubsub_subscription_throughput.yaml)**
+This allows teams to consume SLO reports in real time and take appropriate
+action when needed.
+
+**→ [Full SLO config](../../samples/cloud_monitoring/slo_pubsub_subscription_throughput.yaml)**
diff --git a/docs/shared/metrics.md b/docs/shared/metrics.md
index a6cecf4d..b50c95ed 100644
--- a/docs/shared/metrics.md
+++ b/docs/shared/metrics.md
@@ -63,23 +63,23 @@ metrics:
```
where:
-* `name`: name of the [SLO Report](../../tests/unit/fixtures/slo_report.json)
+* `name`: name of the [SLO Report](../../tests/unit/fixtures/slo_report_v2.json)
field to export as a metric. The field MUST exist in the SLO report.
* `description`: description of the metric (if the metrics exporter supports it)
* `alias` (optional): rename the metric before writing to the monitoring
backend.
* `additional_labels` (optional) allow you to specify other labels to the
timeseries written. Each label name must correspond to a field of the
-[SLO Report](../../tests/unit/fixtures/slo_report.json).
+[SLO Report](../../tests/unit/fixtures/slo_report_v2.json).
## Metric exporters
Some metrics exporters have a specific `prefix` that is prepended to the
metric name:
-* `StackdriverExporter` prefix: `custom.googleapis.com/`
-* `DatadogExporter` prefix: `custom:`
+* `cloud_monitoring` exporter prefix: `custom.googleapis.com/`
+* `datadog` exporter prefix: `custom:`
Some metrics exporters have a limit of `labels` that can be written to their
metrics timeseries:
-* `StackdriverExporter` labels limit: `10`.
+* `cloud_monitoring` labels limit: `10`.
These limits are imposed by the providers and cannot be modified.
diff --git a/docs/shared/migration.md b/docs/shared/migration.md
new file mode 100644
index 00000000..65f9ef72
--- /dev/null
+++ b/docs/shared/migration.md
@@ -0,0 +1,33 @@
+# Migrating `slo-generator` to the next major version
+
+## v1 to v2
+
+Version `v2` of the slo-generator introduces some changes to the structure of
+the SLO configurations.
+
+To migrate your SLO configurations from v1 to v2, please execute the following
+instructions:
+
+**Upgrade `slo-generator`:**
+```
+pip3 install slo-generator -U  # upgrades slo-generator to the latest version
+```
+
+**Run the `slo-generator-migrate` command:**
+```
+slo-generator-migrate -s <SOURCE_FOLDER> -t <TARGET_FOLDER> -b <ERROR_BUDGET_POLICY>
+```
+where:
+* `<SOURCE_FOLDER>` is the source folder containing SLO configurations in v1 format.
+This folder can have nested subfolders containing SLOs. The subfolder structure
+will be reproduced on the target folder.
+
+* `<TARGET_FOLDER>` is the target folder to write the SLO configurations in v2
+format. If the target folder is identical to the source folder, the existing SLO
+configurations will be updated in-place.
+
+* `<ERROR_BUDGET_POLICY>` is the path to your error budget policy configuration.
+
+**Follow the instructions printed to finish the migration:**
+This includes committing the resulting files to git and updating your Terraform
+modules to the version that supports the v2 configuration format.
diff --git a/docs/shared/troubleshooting.md b/docs/shared/troubleshooting.md
index 45fa562b..29d6eef5 100644
--- a/docs/shared/troubleshooting.md
+++ b/docs/shared/troubleshooting.md
@@ -2,7 +2,7 @@
## Problem
-**`StackdriverExporter`: Labels limit (10) reached.**
+**`cloud_monitoring` exporter: Labels limit (10) reached.**
```
The new labels would cause the metric custom.googleapis.com/slo_target to have over 10 labels.: timeSeries[0]"
diff --git a/samples/README.md b/samples/README.md
index e7198ba8..201181fe 100644
--- a/samples/README.md
+++ b/samples/README.md
@@ -14,17 +14,17 @@ running it.
The following lists all environment variables found in the SLO configs,
per backend:
-`stackdriver/`:
- - `STACKDRIVER_HOST_PROJECT_ID`: Stackdriver host project id.
- - `STACKDRIVER_LOG_METRIC_NAME`: Stackdriver log-based metric name.
+`cloud_monitoring/`:
+ - `WORKSPACE_PROJECT_ID`: Cloud Monitoring host project id.
+ - `LOG_METRIC_NAME`: Cloud Logging log-based metric name.
- `GAE_PROJECT_ID`: Google App Engine application project id.
- `GAE_MODULE_ID`: Google App Engine application module id.
- `PUBSUB_PROJECT_ID`: Pub/Sub project id.
- `PUBSUB_TOPIC_NAME`: Pub/Sub topic name.
-`stackdriver_service_monitoring/`:
- - `STACKDRIVER_HOST_PROJECT_ID`: Stackdriver host project id.
- - `STACKDRIVER_LOG_METRIC_NAME`: Stackdriver log-based metric name.
+`cloud_service_monitoring/`:
+ - `WORKSPACE_PROJECT_ID`: Cloud Monitoring host project id.
+ - `LOG_METRIC_NAME`: Cloud Logging log-based metric name.
- `GAE_PROJECT_ID`: Google App Engine application project id.
- `GAE_MODULE_ID`: Google App Engine application module id.
- `PUBSUB_PROJECT_ID`: Pub/Sub project id.
@@ -50,7 +50,7 @@ you're pointing to need to exist.
To run one sample:
```
-slo-generator -f samples/stackdriver/.yaml
+slo-generator -f samples/cloud_monitoring/.yaml
```
To run all the samples for a backend:
@@ -68,14 +68,14 @@ slo-generator -f samples/ -b samples/
### Examples
-##### Stackdriver
+##### Cloud Monitoring
```
-slo-generator -f samples/stackdriver -b error_budget_policy.yaml
+slo-generator -f samples/cloud_monitoring -b error_budget_policy.yaml
```
-##### Stackdriver Service Monitoring
+##### Cloud Service Monitoring
```
-slo-generator -f samples/stackdriver_service_monitoring -b error_budget_policy_ssm.yaml
+slo-generator -f samples/cloud_service_monitoring -b error_budget_policy_ssm.yaml
```
***Note:*** *the Error Budget Policy is different for this backend, because it only