SLO Generator


slo-generator is a tool to compute and export Service Level Objectives (SLOs), Error Budgets and Burn Rates, using configurations written in YAML (or JSON) format.

IMPORTANT NOTE: the following content is the slo-generator v2 documentation. The v1 documentation is available here, and instructions to migrate to v2 are available here.


Description

The slo-generator runs backend queries computing Service Level Indicators, compares them with the Service Level Objectives defined and generates a report by computing important metrics:

  • Service Level Indicator (SLI) defined as SLI = N_good_events / N_valid_events
  • Error Budget (EB) defined as EB = 1 - SLI
  • Error Budget Burn Rate (EBBR) defined as EBBR = EB / EB_target
  • ... and more, see the example SLO report.

The Error Budget Burn Rate is often used for alerting on SLOs, as it has proven in practice to be more reliable and stable than alerting directly on metrics or on SLI > SLO thresholds.
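
The definitions above can be worked through with a small sketch. This is an illustration only, not the slo-generator implementation: the function name `compute_slo_metrics` is hypothetical, and it assumes that the error budget target is `1 - SLO goal`.

```python
def compute_slo_metrics(good_events: int, valid_events: int, slo_target: float) -> dict:
    """Compute SLI, error budget and burn rate from raw event counts."""
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / valid_events
    error_budget = 1 - sli
    # Assumption: the error budget target is what the SLO goal leaves over.
    error_budget_target = 1 - slo_target
    burn_rate = error_budget / error_budget_target
    return {
        "sli": sli,
        "error_budget": error_budget,
        "error_budget_target": error_budget_target,
        "error_budget_burn_rate": burn_rate,
    }


# 20 bad events out of 10000 against a 99.9% goal.
report = compute_slo_metrics(good_events=9980, valid_events=10000, slo_target=0.999)
print(report)
```

With these numbers the error budget is consumed about twice as fast as the goal allows, which is the kind of signal a burn rate alert keys on.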

Local usage

Requirements

  • Python 3.7 or above
  • pip3

Installation

slo-generator is a Python library published on PyPI. To install it, run:

pip3 install slo-generator

Notes:

  • To install providers, use pip3 install slo-generator[<PROVIDER_1>, <PROVIDER_2>, ... <PROVIDER_n>]. For instance:
    • pip3 install slo-generator[cloud_monitoring] installs the Cloud Monitoring backend / exporter.
    • pip3 install slo-generator[prometheus, datadog, dynatrace] installs the Prometheus, Datadog and Dynatrace backends / exporters.
  • To install the slo-generator API, run pip3 install slo-generator[api].
  • To enable debug logs, set the environment variable DEBUG to 1.
  • To enable colorized output (local usage), set the environment variable COLORED_OUTPUT to 1.

CLI usage

To compute an SLO report using the CLI, run:

slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH> --export

where:

  • <SLO_CONFIG_PATH> is the SLO configuration file or folder path.

  • <SHARED_CONFIG_PATH> is the Shared configuration file path.

  • --export | -e enables exporting data using the exporters specified in the SLO configuration file.

Use slo-generator compute --help to list all available arguments.
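
When the CLI is driven from a script, the command line above can be assembled programmatically. The following is a minimal sketch, assuming `slo-generator` is installed and on the `PATH`; the helper names `build_compute_command` and `run_compute` are hypothetical, and only the flags documented above are used.

```python
import subprocess
from typing import List


def build_compute_command(slo_config_path: str, shared_config_path: str,
                          export: bool = False) -> List[str]:
    """Build the argument list for `slo-generator compute`."""
    command = ["slo-generator", "compute",
               "-f", slo_config_path,
               "-c", shared_config_path]
    if export:
        command.append("--export")
    return command


def run_compute(slo_config_path: str, shared_config_path: str,
                export: bool = False) -> int:
    """Run the CLI and return its exit code (requires slo-generator installed)."""
    command = build_compute_command(slo_config_path, shared_config_path, export)
    completed = subprocess.run(command, check=False)
    return completed.returncode


print(build_compute_command("slo.yaml", "config.yaml", export=True))
```

Passing the arguments as a list avoids shell quoting issues when a path contains spaces.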

API usage

On top of the CLI, the slo-generator can also be run as an API using the Cloud Functions Framework SDK (Flask):

slo-generator api -c <SHARED_CONFIG_PATH>

where <SHARED_CONFIG_PATH> is the Shared configuration file path.

Once the API is up-and-running, you can HTTP POST SLO configurations (YAML or JSON) to it:

curl -X POST -H "Content-Type: text/x-yaml" --data-binary @slo.yaml localhost:8080 # yaml SLO config
curl -X POST -H "Content-Type: application/json" -d @slo.json localhost:8080 # json SLO config

Notes:

  • The API responds by default to HTTP requests. An alternative mode is to respond to CloudEvents instead, by setting --signature-type cloudevent.

  • Use --target export to run the API in export mode only (former slo-pipeline).
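
The curl calls above can also be reproduced from Python with the standard library. This is a sketch under assumptions: the API is listening on `localhost:8080` as in the curl examples, and the helper names `build_slo_request` and `post_slo_config` are hypothetical.

```python
import urllib.request


def build_slo_request(slo_config: bytes,
                      content_type: str = "text/x-yaml",
                      url: str = "http://localhost:8080") -> urllib.request.Request:
    """Prepare an HTTP POST carrying an SLO configuration."""
    return urllib.request.Request(
        url,
        data=slo_config,
        headers={"Content-Type": content_type},
        method="POST",
    )


def post_slo_config(path: str, content_type: str = "text/x-yaml",
                    url: str = "http://localhost:8080") -> str:
    """Send the SLO config file at `path` to a running API, return the response body."""
    with open(path, "rb") as config_file:
        request = build_slo_request(config_file.read(), content_type, url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


request = build_slo_request(b"kind: ServiceLevelObjective\n")
print(request.get_method(), request.full_url)
```

For a JSON SLO config, pass `content_type="application/json"`, matching the second curl call.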

Configuration

The slo-generator requires two configuration files to run: an SLO configuration file describing your SLO, and the Shared configuration file (common configuration for all SLOs).

SLO configuration

The SLO configuration (JSON or YAML) follows the Kubernetes format and is composed of the following fields:

  • api: sre.google.com/v2

  • kind: ServiceLevelObjective

  • metadata:

    • name: [required] string - Full SLO name (MUST be unique).
    • labels: [optional] map - Metadata labels, for example:
      • slo_name: SLO name (e.g. availability, latency128ms, ...).
      • service_name: Monitored service (to group SLOs by service).
      • feature_name: Monitored feature (to group SLOs by feature).
  • spec:

    • description: [required] string - Description of this SLO.
    • goal: [required] string - SLO goal (or target) (MUST be between 0 and 1).
    • backend: [required] string - Backend name (MUST exist in the Shared configuration).
    • service_level_indicator: [required] map - SLI configuration. The content of this section is specific to each provider, see docs/providers.
    • error_budget_policy: [optional] string - Error budget policy name (MUST exist in the Shared configuration). If not specified, defaults to default.
    • exporters: [optional] list - List of exporter names (MUST exist in the Shared configuration).
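
The required fields listed above can be checked with a few lines of code before running a computation. This is a hedged sketch, not the validation performed by slo-generator itself: `validate_slo_config` is a hypothetical helper, the sample values are made up, and the `service_level_indicator` content is left empty because it is provider-specific.

```python
def validate_slo_config(config: dict) -> list:
    """Return a list of problems found in an SLO configuration (empty if none)."""
    errors = []
    metadata = config.get("metadata") or {}
    spec = config.get("spec") or {}

    if not metadata.get("name"):
        errors.append("metadata.name is required")
    for field in ("description", "goal", "backend", "service_level_indicator"):
        if field not in spec:
            errors.append(f"spec.{field} is required")
    if "goal" in spec:
        try:
            goal = float(spec["goal"])
        except (TypeError, ValueError):
            errors.append("spec.goal must be a number")
        else:
            if not 0 <= goal <= 1:
                errors.append("spec.goal must be between 0 and 1")
    return errors


# Hypothetical SLO configuration, already loaded from YAML or JSON into a dict.
sample_config = {
    "kind": "ServiceLevelObjective",
    "metadata": {
        "name": "my-service-availability",
        "labels": {"service_name": "my-service", "slo_name": "availability"},
    },
    "spec": {
        "description": "Availability of my-service",
        "goal": 0.999,
        "backend": "cloud_monitoring/dev",
        "service_level_indicator": {},  # provider-specific, left empty here
    },
}
print(validate_slo_config(sample_config))
```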

Note: you can use environment variables in your SLO configs by using ${MY_ENV_VAR} syntax to avoid having sensitive data in version control. Environment variables will be replaced automatically at run time.
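
To make the ${MY_ENV_VAR} substitution concrete, here is a minimal sketch of how such a replacement could work. It is an illustration, not the slo-generator code: `substitute_env_vars` is a hypothetical helper, and leaving unknown variables untouched is an assumption of this sketch.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
ENV_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env_vars(text: str, env=None) -> str:
    """Replace ${NAME} placeholders in `text` with values from `env`."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        # Assumption: placeholders without a matching variable are kept as-is.
        return env.get(name, match.group(0))

    return ENV_VAR_PATTERN.sub(replace, text)


raw = "url: ${PROMETHEUS_URL}\ntoken: ${UNSET_TOKEN}"
print(substitute_env_vars(raw, {"PROMETHEUS_URL": "http://localhost:9090"}))
```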

→ See example SLO configuration.

Shared configuration

The shared configuration (JSON or YAML) configures the slo-generator and acts as a shared config for all SLO configs. It is composed of the following fields:

  • backends: [required] map - Data backends configurations. Each backend alias is defined as a key <backend_name>/<suffix>, and a configuration map.

    backends:
      cloud_monitoring/dev:
        project_id: proj-cm-dev-a4b7
      datadog/test:
        app_key: ${APP_SECRET_KEY}
        api_key: ${API_SECRET_KEY}

    See the provider-specific documentation for detailed configuration.

  • exporters: A map of exporters to export results to. Each exporter is defined as a key formatted as <exporter_name>/<suffix>, and a map value detailing the exporter configuration.

    exporters:
      bigquery/dev:
        project_id: proj-bq-dev-a4b7
        dataset_id: my-test-dataset
        table_id: my-test-table
      prometheus/test:
        url: ${PROMETHEUS_URL}

    See specific providers documentation for detailed configuration:

    • pubsub to stream SLO reports.
    • bigquery to export SLO reports to BigQuery for historical analysis and DataStudio reporting.
    • cloud_monitoring to export metrics to Cloud Monitoring.
    • prometheus to export metrics to Prometheus.
    • datadog to export metrics to Datadog.
    • dynatrace to export metrics to Dynatrace.
    • <custom> to export SLO data or metrics to a custom destination.
  • error_budget_policies: [required] A map of various error budget policies.

    • <NAME>: Name of the error budget policy.
      • steps: List of error budget policy steps, each containing the following fields:
        • name: Name of this step (e.g. 1 hour).
        • window: Rolling time window for this error budget.
        • burn_rate_threshold: Target burn rate threshold over which alerting is needed.
        • alert: boolean whether violating this error budget should trigger a page.
        • message_alert: message to show when the error budget is above the target.
        • message_ok: message to show when the error budget is within the target.
    error_budget_policies:
      default:
        steps:
        - name: 1 hour
          burn_rate_threshold: 9
          alert: true
          message_alert: Page to defend the SLO
          message_ok: Last hour on track
          window: 3600
        - name: 12 hours
          burn_rate_threshold: 3
          alert: true
          message_alert: Page to defend the SLO
          message_ok: Last 12 hours on track
          window: 43200
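
To illustrate how the steps of an error budget policy could be applied, the sketch below compares a measured burn rate with each step's threshold and picks the matching message. It is a simplified illustration, not the slo-generator logic: `evaluate_steps` is a hypothetical helper and the burn rates are made-up inputs.

```python
def evaluate_steps(steps: list, burn_rates: dict) -> list:
    """Compare each step's measured burn rate with its threshold."""
    results = []
    for step in steps:
        burn_rate = burn_rates[step["name"]]
        exceeded = burn_rate > step["burn_rate_threshold"]
        results.append({
            "name": step["name"],
            "window": step["window"],
            # Page only when the threshold is exceeded and the step asks for it.
            "page": exceeded and step["alert"],
            "message": step["message_alert"] if exceeded else step["message_ok"],
        })
    return results


# Same steps as the `default` policy shown above.
steps = [
    {"name": "1 hour", "burn_rate_threshold": 9, "alert": True,
     "message_alert": "Page to defend the SLO",
     "message_ok": "Last hour on track", "window": 3600},
    {"name": "12 hours", "burn_rate_threshold": 3, "alert": True,
     "message_alert": "Page to defend the SLO",
     "message_ok": "Last 12 hours on track", "window": 43200},
]

# Hypothetical burn rates measured over each window.
results = evaluate_steps(steps, {"1 hour": 10.5, "12 hours": 1.2})
for result in results:
    print(result["name"], result["page"], result["message"])
```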

→ See example Shared configuration.
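
The <backend_name>/<suffix> and <exporter_name>/<suffix> keys described above can be split back into their two parts when inspecting a Shared configuration. This is a small sketch; `split_alias` is a hypothetical helper, and treating the suffix as optional is an assumption.

```python
def split_alias(alias: str) -> tuple:
    """Split 'name/suffix' into (name, suffix); suffix is None when absent."""
    name, _, suffix = alias.partition("/")
    return name, suffix or None


# Shared configuration from the examples above, loaded into a dict.
shared_config = {
    "backends": {"cloud_monitoring/dev": {}, "datadog/test": {}},
    "exporters": {"bigquery/dev": {}, "prometheus/test": {}},
}

for section in ("backends", "exporters"):
    for alias in shared_config[section]:
        name, suffix = split_alias(alias)
        print(section, name, suffix)
```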

More documentation

To go further with the SLO Generator, you can read: