Commit 0a9cd38

docs: Update documentation for v2 (#133)

Olivier Cervello authored May 31, 2021
1 parent 5b2635b commit 0a9cd38
Showing 14 changed files with 533 additions and 400 deletions.
250 changes: 176 additions & 74 deletions README.md
@@ -5,111 +5,213 @@
[![PyPI version](https://badge.fury.io/py/slo-generator.svg)](https://badge.fury.io/py/slo-generator)

`slo-generator` is a tool to compute and export **Service Level Objectives** ([SLOs](https://landing.google.com/sre/sre-book/chapters/service-level-objectives/)),
**Error Budgets** and **Burn Rates**, using configurations written in YAML (or JSON) format.

## Table of contents
- [Description](#description)
- [Local usage](#local-usage)
- [Requirements](#requirements)
- [Installation](#installation)
- [CLI usage](#cli-usage)
- [API usage](#api-usage)
- [Configuration](#configuration)
- [SLO configuration](#slo-configuration)
- [Shared configuration](#shared-configuration)
- [More documentation](#more-documentation)
- [Build an SLO achievements report with BigQuery and DataStudio](#build-an-slo-achievements-report-with-bigquery-and-datastudio)
- [Deploy the SLO Generator in Cloud Run](#deploy-the-slo-generator-in-cloud-run)
- [Deploy the SLO Generator in Kubernetes (Alpha)](#deploy-the-slo-generator-in-kubernetes-alpha)
- [Deploy the SLO Generator in a CloudBuild pipeline](#deploy-the-slo-generator-in-a-cloudbuild-pipeline)
- [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](#deprecated-deploy-the-slo-generator-on-google-cloud-functions-terraform)
- [Contribute to the SLO Generator](#contribute-to-the-slo-generator)

## Description
The `slo-generator` runs backend queries to compute **Service Level Indicators**,
compares them with the defined **Service Level Objectives**, and generates a report
containing the following metrics:

* **Service Level Indicator** (SLI) defined as **SLI = N<sub>good_events</sub> &#47; N<sub>valid_events</sub>**
* **Error Budget** (EB) defined as **EB = 1 - SLI**
* **Error Budget Burn Rate** (EBBR) defined as **EBBR = EB / EB<sub>target</sub>**
* **... and more**, see the [example SLO report](./tests/unit/fixtures/slo_report_v2.json).

The **Error Budget Burn Rate** is often used for [**alerting on SLOs**](https://sre.google/workbook/alerting-on-slos/), as it has proven in practice to be more **reliable** and **stable** than
alerting directly on metrics or on **SLI > SLO** thresholds.
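
For example (illustrative figures only): a service that served 999,500 good events out of 1,000,000 valid events has an SLI of 0.9995; against a 99.9% SLO target, the error budget target is 0.001, so EB = 1 - 0.9995 = 0.0005 and EBBR = 0.0005 / 0.001 = 0.5, i.e. the error budget is being consumed at half the allowed rate.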

## Local usage

### Requirements

* `python3.7` and above
* `pip3`

### Installation

`slo-generator` is a Python library published on [PyPI](https://pypi.org). To install it, run:

```sh
pip3 install slo-generator
```

***Notes:***
* To install **[providers](./docs/providers)**, use `pip3 install slo-generator[<PROVIDER_1>, <PROVIDER_2>, ... <PROVIDER_n>]`. For instance:
  * `pip3 install slo-generator[cloud_monitoring]` installs the Cloud Monitoring backend / exporter.
  * `pip3 install slo-generator[prometheus, datadog, dynatrace]` installs the Prometheus, Datadog, and Dynatrace backends / exporters.
* To install the **slo-generator API**, run `pip3 install slo-generator[api]`.
* To enable **debug logs**, set the environment variable `DEBUG` to `1`.
* To enable **colorized output** (local usage), set the environment variable `COLORED_OUTPUT` to `1`.

### CLI usage

To compute an SLO report using the CLI, run:
```sh
slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH> --export
```
where:
* `<SLO_CONFIG_PATH>` is the [SLO configuration](#slo-configuration) file or folder path.

* `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path.

* `--export` | `-e` enables exporting data using the `exporters` specified in the SLO
configuration file.

Use `slo-generator compute --help` to list all available arguments.

### API usage

On top of the CLI, the `slo-generator` can also be run as an API using the Cloud
Functions Framework SDK (Flask):
```sh
slo-generator api -c <SHARED_CONFIG_PATH>
```
where:
* `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path or GCS URL.

Once the API is up-and-running, you can `HTTP POST` SLO configurations to it.

***Notes:***
* The API responds by default to HTTP requests. An alternative mode is to
respond to [`CloudEvents`](https://cloudevents.io/) instead, by setting
`--signature-type cloudevent`.

* Use `--target export` to run the API in export mode only (former `slo-pipeline`).

## Configuration

The `slo-generator` requires two configuration files to run: an **SLO configuration**
file, describing your SLO, and a **Shared configuration** file (common
configuration for all SLOs).

### SLO configuration

The **SLO configuration** (JSON or YAML) follows the Kubernetes resource format and
is composed of the following fields (a hypothetical example follows the list):

* `api`: `sre.google.com/v2`
* `kind`: `ServiceLevelObjective`
* `metadata`:
  * `name`: [**required**] *string* - Full SLO name (**MUST** be unique).
  * `labels`: [*optional*] *map* - Metadata labels, **for example**:
    * `slo_name`: SLO name (e.g. `availability`, `latency128ms`, ...).
    * `service_name`: Monitored service (to group SLOs by service).
    * `feature_name`: Monitored feature (to group SLOs by feature).

* `spec`:
  * `description`: [**required**] *string* - Description of this SLO.
  * `goal`: [**required**] *number* - SLO goal (or target); **MUST** be between 0 and 1.
  * `backend`: [**required**] *string* - Backend name (**MUST** exist in the [Shared configuration](#shared-configuration)).
  * `service_level_indicator`: [**required**] *map* - SLI configuration. The content of this section
    is specific to each provider, see [`docs/providers`](./docs/providers).
  * `error_budget_policy`: [*optional*] *string* - Error budget policy name
    (**MUST** exist in the [Shared configuration](#shared-configuration)). If not specified, defaults to `default`.
  * `exporters`: [*optional*] *list* - List of exporter names (**MUST** exist in the [Shared configuration](#shared-configuration)).
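
For illustration, here is a hypothetical SLO configuration assembled from the field list above. All names, labels, and alias values are made up; the `backend`, `exporters`, and `error_budget_policy` values must reference entries defined in the [Shared configuration](#shared-configuration), and the `service_level_indicator` keys depend on the chosen provider (see [`docs/providers`](./docs/providers)):

```yaml
# Hypothetical example only -- field names follow the list above.
api: sre.google.com/v2
kind: ServiceLevelObjective
metadata:
  name: gae-app-availability          # unique SLO name
  labels:
    slo_name: availability
    service_name: gae
    feature_name: app
spec:
  description: Availability of the GAE app
  goal: 0.999                         # between 0 and 1
  backend: cloud_monitoring/dev       # must exist under `backends` in the shared config
  exporters: [bigquery/dev]           # must exist under `exporters` in the shared config
  error_budget_policy: default        # must exist under `error_budget_policies`
  service_level_indicator:
    # provider-specific keys, shown here as placeholders
    filter_good: <filter matching good events>
    filter_valid: <filter matching valid events>
```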

***Note:*** *you can use environment variables in your SLO configs by using
`${MY_ENV_VAR}` syntax to avoid having sensitive data in version control.
Environment variables will be replaced automatically at run time.*

**&rarr; See [example SLO configuration](samples/cloud_monitoring/slo_gae_app_availability.yaml).**

### Shared configuration

The **Shared configuration** (JSON or YAML) configures the `slo-generator` itself and
acts as a shared configuration for all SLO configs. It is composed of the following fields:

* `backends`: [**required**] *map* - Data backend configurations. Each backend
  is defined as a key formatted as `<backend_name>/<suffix>`, and a map value detailing
  the backend configuration.

  ```yaml
  backends:
    cloud_monitoring/dev:
      project_id: proj-cm-dev-a4b7
    datadog/test:
      app_key: ${APP_SECRET_KEY}
      api_key: ${API_SECRET_KEY}
  ```

  See the specific provider documentation for detailed configuration:
  * [`cloud_monitoring`](docs/providers/cloud_monitoring.md#backend)
  * [`cloud_service_monitoring`](docs/providers/cloud_service_monitoring.md#backend)
  * [`prometheus`](docs/providers/prometheus.md#backend)
  * [`elasticsearch`](docs/providers/elasticsearch.md#backend)
  * [`datadog`](docs/providers/datadog.md#backend)
  * [`dynatrace`](docs/providers/dynatrace.md#backend)
  * [`<custom>`](docs/providers/custom.md#backend)

* `exporters`: A map of exporters to export results to. Each exporter is defined
  as a key formatted as `<exporter_name>/<suffix>`, and a map value detailing the
  exporter configuration.

  ```yaml
  exporters:
    bigquery/dev:
      project_id: proj-bq-dev-a4b7
      dataset_id: my-test-dataset
      table_id: my-test-table
    prometheus/test:
      url: ${PROMETHEUS_URL}
  ```

  See the specific provider documentation for detailed configuration:
  * [`pubsub`](docs/providers/pubsub.md#exporter) to stream SLO reports.
  * [`bigquery`](docs/providers/bigquery.md#exporter) to export SLO reports to BigQuery for historical analysis and DataStudio reporting.
  * [`cloud_monitoring`](docs/providers/cloud_monitoring.md#exporter) to export metrics to Cloud Monitoring.
  * [`prometheus`](docs/providers/prometheus.md#exporter) to export metrics to Prometheus.
  * [`datadog`](docs/providers/datadog.md#exporter) to export metrics to Datadog.
  * [`dynatrace`](docs/providers/dynatrace.md#exporter) to export metrics to Dynatrace.
  * [`<custom>`](docs/providers/custom.md#exporter) to export SLO data or metrics to a custom destination.

* `error_budget_policies`: [**required**] *map* - A map of error budget policies.
  * `<NAME>`: Name of the error budget policy (e.g. `default`).
    * `steps`: List of error budget policy steps, each containing the following fields:
      * `name`: Name of the step (e.g. `1 hour`).
      * `window`: Rolling time window for this error budget, in seconds.
      * `burn_rate_threshold`: Target burn rate threshold over which alerting is needed.
      * `alert`: Boolean; whether exceeding this burn rate threshold should trigger a page.
      * `message_alert`: Message to show when the error budget is above the target.
      * `message_ok`: Message to show when the error budget is within the target.

```yaml
error_budget_policies:
  default:
    steps:
    - name: 1 hour
      burn_rate_threshold: 9
      alert: true
      message_alert: Page to defend the SLO
      message_ok: Last hour on track
      window: 3600
    - name: 12 hours
      burn_rate_threshold: 3
      alert: true
      message_alert: Page to defend the SLO
      message_ok: Last 12 hours on track
      window: 43200
```

**&rarr; See [example Shared configuration](samples/config.yaml).**

## More documentation

To go further with the SLO Generator, you can read:

### [Build an SLO achievements report with BigQuery and DataStudio](docs/deploy/datastudio_slo_report.md)

### [Deploy the SLO Generator in Cloud Run](docs/deploy/cloudrun.md)

### [Deploy the SLO Generator in Kubernetes (Alpha)](docs/deploy/kubernetes.md)

### [Deploy the SLO Generator in a CloudBuild pipeline](docs/deploy/cloudbuild.md)

### [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](docs/deploy/cloudfunctions.md)

### [Contribute to the SLO Generator](CONTRIBUTING.md)
8 changes: 4 additions & 4 deletions docs/deploy/cloudfunctions.md
@@ -9,8 +9,8 @@


Other components can be added to make results available to other destinations:
* A **Cloud Function** to export SLO reports (e.g. to BigQuery and Cloud Monitoring), running `slo-generator`.
* A **Cloud Monitoring Policy** to alert on high budget Burn Rates.

Below is a diagram of what this pipeline looks like:

@@ -22,9 +22,9 @@

* **Historical analytics** by analyzing SLO data in BigQuery.

* **Real-time alerting** by setting up Cloud Monitoring alerts based on
  the desired SLOs.

* **Real-time, daily, monthly, yearly dashboards** by streaming BigQuery SLO reports to DataStudio (see [here](datastudio_slo_report.md)) and building dashboards.

An example of pipeline automation with Terraform can be found in the corresponding [Terraform module](https://github.com/terraform-google-modules/terraform-google-slo/tree/master/examples/slo-generator/simple_example).