docs: Update documentation for v2 #133

Merged · 6 commits · May 31, 2021
250 changes: 176 additions & 74 deletions README.md
@@ -5,111 +5,213 @@
[![PyPI version](https://badge.fury.io/py/slo-generator.svg)](https://badge.fury.io/py/slo-generator)

`slo-generator` is a tool to compute and export **Service Level Objectives** ([SLOs](https://landing.google.com/sre/sre-book/chapters/service-level-objectives/)),
**Error Budgets** and **Burn Rates**, using configurations written in YAML (or JSON) format.

## Table of contents
- [Description](#description)
- [Local usage](#local-usage)
- [Requirements](#requirements)
- [Installation](#installation)
- [CLI usage](#cli-usage)
- [API usage](#api-usage)
- [Configuration](#configuration)
- [SLO configuration](#slo-configuration)
- [Shared configuration](#shared-configuration)
- [More documentation](#more-documentation)
- [Build an SLO achievements report with BigQuery and DataStudio](#build-an-slo-achievements-report-with-bigquery-and-datastudio)
- [Deploy the SLO Generator in Cloud Run](#deploy-the-slo-generator-in-cloud-run)
- [Deploy the SLO Generator in Kubernetes (Alpha)](#deploy-the-slo-generator-in-kubernetes-alpha)
- [Deploy the SLO Generator in a CloudBuild pipeline](#deploy-the-slo-generator-in-a-cloudbuild-pipeline)
- [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](#deprecated-deploy-the-slo-generator-on-google-cloud-functions-terraform)
- [Contribute to the SLO Generator](#contribute-to-the-slo-generator)

## Description
The `slo-generator` runs backend queries computing **Service Level Indicators**,
compares them with the **Service Level Objectives** defined, and generates a report
with the following metrics:

* **Service Level Indicator** (SLI) defined as **SLI = N<sub>good_events</sub> &#47; N<sub>valid_events</sub>**
* **Error Budget** (EB) defined as **EB = 1 - SLI**
* **Error Budget Burn Rate** (EBBR) defined as **EBBR = EB / EB<sub>target</sub>**
* **... and more**, see the [example SLO report](./tests/unit/fixtures/slo_report_v2.json).

The **Error Budget Burn Rate** is often used for [**alerting on SLOs**](https://sre.google/workbook/alerting-on-slos/), as it has proven in practice to be more **reliable** and **stable** than
alerting directly on metrics or on **SLI > SLO** thresholds.
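As a worked example, the three metrics above can be computed by hand in plain Python (independent of the `slo-generator` codebase; the event counts here are made up):

```python
# Toy numbers, not from a real backend: 9,950 good events out of 10,000
# valid events, measured against a 99.9% objective.
good_events = 9_950
valid_events = 10_000
slo_goal = 0.999

sli = good_events / valid_events        # SLI = N_good / N_valid
error_budget = 1 - sli                  # EB = 1 - SLI
eb_target = 1 - slo_goal                # error budget allowed by the objective
burn_rate = error_budget / eb_target    # EBBR = EB / EB_target

print(f"SLI={sli:.3f} EB={error_budget:.3f} burn_rate={burn_rate:.1f}")
# SLI=0.995 EB=0.005 burn_rate=5.0
```

At a burn rate of 5, the service is consuming its error budget five times faster than the objective allows.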

## Local usage

### Requirements

* `python3.7` and above
* `pip3`

### Installation

`slo-generator` is a Python library published on [PyPI](https://pypi.org). To install it, run:

```sh
pip3 install slo-generator
```

***Notes:***
* To install **[providers](./docs/providers)**, use `pip3 install slo-generator[<PROVIDER_1>, <PROVIDER_2>, ... <PROVIDER_n>]`. For instance:
  * `pip3 install slo-generator[cloud_monitoring]` installs the Cloud Monitoring backend / exporter.
  * `pip3 install slo-generator[prometheus, datadog, dynatrace]` installs the Prometheus, Datadog and Dynatrace backends / exporters.
* To install the **slo-generator API**, run `pip3 install slo-generator[api]`.
* To enable **debug logs**, set the environment variable `DEBUG` to `1`.
* To enable **colorized output** (local usage), set the environment variable `COLORED_OUTPUT` to `1`.

### CLI usage

To compute an SLO report using the CLI, run:
```sh
slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH> --export
```
where:
* `<SLO_CONFIG_PATH>` is the [SLO configuration](#slo-configuration) file or folder path.

* `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path.

* `--export` | `-e` enables exporting data using the `exporters` specified in the SLO
configuration file.

Use `slo-generator compute --help` to list all available arguments.

### API usage

On top of the CLI, the `slo-generator` can also be run as an API using the Cloud
Functions Framework SDK (Flask):
```sh
slo-generator api -c <SHARED_CONFIG_PATH>
```
where:
* `<SHARED_CONFIG_PATH>` is the [Shared configuration](#shared-configuration) file path or GCS URL.

Once the API is up-and-running, you can `HTTP POST` SLO configurations to it.

***Notes:***
* The API responds by default to HTTP requests. An alternative mode is to
respond to [`CloudEvents`](https://cloudevents.io/) instead, by setting
`--signature-type cloudevent`.

* Use `--target export` to run the API in export mode only (former `slo-pipeline`).
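For instance, an SLO configuration could be submitted from Python as sketched below. This is an illustration only: the URL, port and content type are assumptions for a locally running API, not values taken from the docs.

```python
import urllib.request

# An inline SLO config; in practice you would read a file such as
# samples/cloud_monitoring/slo_gae_app_availability.yaml.
slo_config = b"api: sre.google.com/v2\nkind: ServiceLevelObjective\n"

req = urllib.request.Request(
    "http://localhost:8080/",  # assumed address of the local API
    data=slo_config,
    headers={"Content-Type": "application/x-yaml"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment once the API is running
print(req.get_method(), req.full_url)
```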

## Configuration

The `slo-generator` requires two configuration files to run: an **SLO configuration**
file, describing your SLO, and a **Shared configuration** file (common
configuration for all SLOs).

### SLO configuration

The **SLO configuration** (JSON or YAML) follows the Kubernetes resource format and
is composed of the following fields:

* `api`: `sre.google.com/v2`
* `kind`: `ServiceLevelObjective`
* `metadata`:
  * `name`: [**required**] *string* - Full SLO name (**MUST** be unique).
  * `labels`: [*optional*] *map* - Metadata labels, **for example**:
    * `slo_name`: SLO name (e.g. `availability`, `latency128ms`, ...).
    * `service_name`: Monitored service (to group SLOs by service).
    * `feature_name`: Monitored feature (to group SLOs by feature).

* `spec`:
  * `description`: [**required**] *string* - Description of this SLO.
  * `goal`: [**required**] *number* - SLO goal (or target) (**MUST** be between 0 and 1).
  * `backend`: [**required**] *string* - Backend name (**MUST** exist in the Shared configuration).
  * `service_level_indicator`: [**required**] *map* - SLI configuration. The content of this section is
  specific to each provider, see [`docs/providers`](./docs/providers).
  * `error_budget_policy`: [*optional*] *string* - Error budget policy name
  (**MUST** exist in the Shared configuration). If not specified, defaults to `default`.
  * `exporters`: [*optional*] *list* - List of exporter names (**MUST** exist in the Shared configuration).
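To make the structure concrete, here is a hypothetical helper (not part of `slo-generator`) that checks the required fields listed above on an SLO configuration loaded as a Python dict:

```python
# Hypothetical required-fields check, for illustration only.
REQUIRED_SPEC_FIELDS = {"description", "goal", "backend", "service_level_indicator"}

def check_slo_config(config: dict) -> list:
    """Return a list of problems found in an SLO configuration dict."""
    problems = []
    if config.get("kind") != "ServiceLevelObjective":
        problems.append("kind must be ServiceLevelObjective")
    if not config.get("metadata", {}).get("name"):
        problems.append("metadata.name is required")
    spec = config.get("spec", {})
    missing = REQUIRED_SPEC_FIELDS - set(spec)
    problems += [f"spec.{field} is required" for field in sorted(missing)]
    goal = spec.get("goal")
    if isinstance(goal, (int, float)) and not 0 < goal < 1:
        problems.append("spec.goal must be between 0 and 1")
    return problems

config = {
    "api": "sre.google.com/v2",
    "kind": "ServiceLevelObjective",
    "metadata": {"name": "gae-app-availability"},
    "spec": {
        "description": "Availability of the GAE app",
        "goal": 0.999,
        "backend": "cloud_monitoring/dev",
        "service_level_indicator": {},
    },
}
print(check_slo_config(config))  # []
```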

***Note:*** *you can use environment variables in your SLO configs by using
`${MY_ENV_VAR}` syntax to avoid having sensitive data in version control.
Environment variables will be replaced automatically at run time.*
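The substitution described in the note can be pictured with Python's standard `string.Template` (the actual `slo-generator` implementation may differ):

```python
import os
from string import Template

os.environ["APP_SECRET_KEY"] = "s3cr3t"  # set here for demonstration only

raw = "app_key: ${APP_SECRET_KEY}"
rendered = Template(raw).substitute(os.environ)
print(rendered)  # app_key: s3cr3t
```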

**&rarr; See [example SLO configuration](samples/cloud_monitoring/slo_gae_app_availability.yaml).**

### Shared configuration
The **Shared configuration** (JSON or YAML) configures the `slo-generator` itself
and is shared by all SLO configurations. It is composed of the following fields:

* `backends`: [**required**] *map* - Data backends configurations. Each backend
alias is defined as a key `<backend_name>/<suffix>`, and a configuration map.
```yaml
backends:
  cloud_monitoring/dev:
    project_id: proj-cm-dev-a4b7
  datadog/test:
    app_key: ${APP_SECRET_KEY}
    api_key: ${API_SECRET_KEY}
```
See specific providers documentation for detailed configuration:
* [`cloud_monitoring`](docs/providers/cloud_monitoring.md#backend)
* [`cloud_service_monitoring`](docs/providers/cloud_service_monitoring.md#backend)
* [`prometheus`](docs/providers/prometheus.md#backend)
* [`elasticsearch`](docs/providers/elasticsearch.md#backend)
* [`datadog`](docs/providers/datadog.md#backend)
* [`dynatrace`](docs/providers/dynatrace.md#backend)
* [`<custom>`](docs/providers/custom.md#backend)

* `exporters`: [*optional*] *map* - Exporters to export results to. Each exporter is defined
as a key formatted as `<exporter_name>/<suffix>`, and a map value detailing the
exporter configuration.
```yaml
exporters:
  bigquery/dev:
    project_id: proj-bq-dev-a4b7
    dataset_id: my-test-dataset
    table_id: my-test-table
  prometheus/test:
    url: ${PROMETHEUS_URL}
```
See specific providers documentation for detailed configuration:
* [`pubsub`](docs/providers/pubsub.md#exporter) to stream SLO reports.
* [`bigquery`](docs/providers/bigquery.md#exporter) to export SLO reports to BigQuery for historical analysis and DataStudio reporting.
* [`cloud_monitoring`](docs/providers/cloud_monitoring.md#exporter) to export metrics to Cloud Monitoring.
* [`prometheus`](docs/providers/prometheus.md#exporter) to export metrics to Prometheus.
* [`datadog`](docs/providers/datadog.md#exporter) to export metrics to Datadog.
* [`dynatrace`](docs/providers/dynatrace.md#exporter) to export metrics to Dynatrace.
* [`<custom>`](docs/providers/custom.md#exporter) to export SLO data or metrics to a custom destination.
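As an illustration of the `<custom>` entry, a minimal exporter could look like the sketch below. This is a hypothetical shape — see [`docs/providers/custom.md`](docs/providers/custom.md) for the actual class contract expected by `slo-generator` — and the report dict is toy data:

```python
class StdoutExporter:
    """Hypothetical custom exporter that prints the SLO report name."""

    def export(self, data, **config):
        # `data` stands for an SLO report, `config` for this exporter's
        # configuration map from the shared config.
        line = f"[{config.get('prefix', 'slo')}] {data['name']}"
        print(line)
        return line

report = {"name": "gae-app-availability"}  # toy report
StdoutExporter().export(report, prefix="demo")  # prints "[demo] gae-app-availability"
```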

* `error_budget_policies`: [**required**] *map* - Error budget policies. Each policy
is defined as a key `<NAME>`, and a map with the following fields:
  * `steps`: List of error budget policy steps, each composed of the following fields:
    * `name`: Name of the step (e.g. `1 hour`).
    * `window`: Rolling time window for this step, in seconds.
    * `burn_rate_threshold`: Burn rate threshold over which alerting is needed.
    * `alert`: Boolean, whether bursting through this threshold should trigger a page.
    * `message_alert`: Message to show when the error budget is above the target.
    * `message_ok`: Message to show when the error budget is within the target.

```yaml
error_budget_policies:
  default:
    steps:
    - name: 1 hour
      burn_rate_threshold: 9
      alert: true
      message_alert: Page to defend the SLO
      message_ok: Last hour on track
      window: 3600
    - name: 12 hours
      burn_rate_threshold: 3
      alert: true
      message_alert: Page to defend the SLO
      message_ok: Last 12 hours on track
      window: 43200
```
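An illustrative sketch (not the generator's actual code) of how such a step is used: a step pages when the measured burn rate reaches its `burn_rate_threshold`.

```python
def evaluate_step(sli, goal, burn_rate_threshold):
    """Return (burn_rate, should_alert) for one policy step."""
    error_budget = 1 - sli
    eb_target = 1 - goal
    burn_rate = error_budget / eb_target
    return burn_rate, burn_rate >= burn_rate_threshold

# A 99.9% SLO measuring 99.0% over the "1 hour" window above burns budget
# ten times too fast, which exceeds the threshold of 9: page.
burn_rate, page = evaluate_step(sli=0.990, goal=0.999, burn_rate_threshold=9)
print(round(burn_rate, 1), page)  # 10.0 True
```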

**&rarr; See [example Shared configuration](samples/config.yaml).**

## More documentation

To go further with the SLO Generator, you can read:

### [Build an SLO achievements report with BigQuery and DataStudio](docs/deploy/datastudio_slo_report.md)
### [Deploy the SLO Generator in Cloud Run](docs/deploy/cloudrun.md)
### [Deploy the SLO Generator in Kubernetes (Alpha)](docs/deploy/kubernetes.md)
### [Deploy the SLO Generator in a CloudBuild pipeline](docs/deploy/cloudbuild.md)
### [DEPRECATED: Deploy the SLO Generator on Google Cloud Functions (Terraform)](docs/deploy/cloudfunctions.md)
### [Contribute to the SLO Generator](CONTRIBUTING.md)
8 changes: 4 additions & 4 deletions docs/deploy/cloudfunctions.md
@@ -9,8 +9,8 @@


Other components can be added to make results available to other destinations:
* A **Cloud Function** to export SLO reports (e.g. to BigQuery and Cloud Monitoring), running `slo-generator`.
* A **Cloud Monitoring Policy** to alert on high budget Burn Rates.

Below is a diagram of what this pipeline looks like:

@@ -22,9 +22,9 @@

* **Historical analytics** by analyzing SLO data in BigQuery.

* **Real-time alerting** by setting up Cloud Monitoring alerts based on
wanted SLOs.

* **Real-time, daily, monthly, yearly dashboards** by streaming BigQuery SLO reports to DataStudio (see [here](datastudio_slo_report.md)) and building dashboards.

An example of pipeline automation with Terraform can be found in the corresponding [Terraform module](https://github.com/terraform-google-modules/terraform-google-slo/tree/master/examples/slo-generator/simple_example).