Commit

315 dashboard titles (#316)
* feat: update dashboard titles, uids and filenames

* refactor: correct mac prefix

* refactor: update shell scripts based on new artefact prefixes

* docs: spelling corrections

* refactor: add new words to dictionary

* docs: correct usage heading

* docs: correct installation headings

* fix: add mixin var to dashboard for loop

* refactor: add sre mac prefix to dashboard uris and alert names

* Add LocalStack integration with MaC (#306)

* Add initial docker compose file

* WIP

* Add initial docker compose file

* Initial documentation about LocalStack.

* Add some basic syntax highlighting.

* Make pre-commit hook executable.

* Update diagram.

* Update README.md

* Update README.md

* Update README.md

* Update diagram.

* Add static config to YACE.

* Add pre-requisite step for network creation.

Co-authored-by: Ariful Haque <[email protected]>

* Add trufflehog workflow to secret scan (#313)

* Add trufflehog workflow to secret scan

* Add pre-push git hook to scan code with trufflehog.

* Make pre-commit hook executable.

* Pin to version 3.14.0

Co-authored-by: Ariful Haque <[email protected]>

* Update git version in Docker image. (#319)

* fix: correct product dashboard uri

Co-authored-by: samiwelthomasHO <[email protected]>
Co-authored-by: Ariful Haque <[email protected]>
3 people authored Oct 19, 2022
1 parent 676c020 commit 5e54545
Showing 14 changed files with 139 additions and 81 deletions.
23 changes: 11 additions & 12 deletions docs/source/monitor-your-service/index.html.md.erb
@@ -5,21 +5,20 @@ weight: 40

# Monitor your service

Following the design principe of "Hierarchical dashboards with drill-downs to the next level" <sup>1</sup>, we have developed a five tier dashboard structure to fulfil different persona needs as follows: -
Following the design principle of "Hierarchical dashboards with drill-downs to the next level" <sup>1</sup>, we have developed a four-tier dashboard structure to fulfil different persona needs as follows: -

| Dashboard | Description | Persona / User | Dashboard Name/s |
| ---------------------| -----------------------------------------------------------------|--------------------------|-----------------|
| All Tenant View | Observability of all services and tenants running on a platform. | Service Manager | summary-view |
| Service View | Observability of all products which makeup a specific service. | Product Manager and Team | {service name}-service-view |
| Product View | Observability of all the user journeys running on an individual product. | Product Manager and Team | {product name}-product-view |
| User Journey View | Observability of all the SLIs in a single user journey. | Product Manager and Team | {product name}-{journey name}-journey-view |
| Troubleshooting View | Observability of all whitebox and blackbox metrics which contribute to SLIs and Service Health. | Engineers | detail-view |
| Dashboard | Description | Persona / User | Dashboard Title |
| ---------------------| ---------------------------------------------------------------------------------------------------------------------|--------------------------|----------------------------------------------------------|
| Overview | Observability of all products and tenants running on a platform. | Service Manager | SRE MaC / Overview |
| Product View | Observability of all the user journeys running on an individual product. | Product Manager and Team | SRE MaC / {Product Name} |
| User Journey View | Observability of all the SLIs in a single user journey. | Product Manager and Team | SRE MaC / {Product Name} / {User Journey Name} |
| Detail View          | Observability of all whitebox and blackbox metrics which contribute to SLIs and Service Health. For troubleshooting. | Engineers                | SRE MaC / {Product Name} / {User Journey Name} / Detail  |

These hierarchical dashboards support a generic troubleshooting workflow: -

## Dashboard Design Principles

* Methodical dashboards according to an SLI/SLO stragegy.
* Methodical dashboards according to an SLI/SLO strategy.
* Hierarchical dashboards with drill-downs to the next level.
* Actively reduce sprawl.
* Regularly review existing dashboards to make sure they are still relevant.
@@ -29,9 +28,9 @@ These hierarchical dashboards support a generic troubleshooting workflow: -
* No editing in the browser. Dashboard viewers change views with variables.
* Browsing for dashboards is the exception, not the rule.
* Perform experimentation and testing on a feature branch (consider nonprod environment to be production).
* Expressive charts with meaningful use of color and normalizing axes where you can.
* Example of meaningful color: Blue means it’s good, red means it’s bad. Thresholds can help with that.
* Example of normalizing axes: When comparing CPU usage, measure by percentage rather than raw number, because machines can have a different number of cores. Normalizing CPU usage by the number of cores reduces cognitive load because the viewer can trust that at 100% all cores are being used, without having to know the number of CPUs.
* Expressive charts with meaningful use of colour and normalising axes where you can.
* Example of meaningful colour: Blue means it’s good, red means it’s bad. Thresholds can help with that.
* Example of normalising axes: When comparing CPU usage, measure by percentage rather than raw number, because machines can have a different number of cores. Normalising CPU usage by the number of cores reduces cognitive load because the viewer can trust that at 100% all cores are being used, without having to know the number of CPUs.
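
As a worked example of the normalising-axes bullet: three cores' worth of busy time on a 4-core host is 3 / 4 × 100 = 75%, and the same work on a 12-core host is 25%. Either way, 100% still means every core is busy. A minimal shell sketch of that arithmetic; the `cpu_percent` helper is hypothetical and not part of MaC:

```shell
#!/bin/sh
# Hypothetical helper: express busy CPU (in cores' worth of work) as a
# percentage of total capacity, so 100% always means "all cores busy".
cpu_percent() {
  busy_cores="$1"   # e.g. rate of non-idle CPU seconds per second
  total_cores="$2"  # number of cores on the machine
  awk -v busy="$busy_cores" -v cores="$total_cores" \
    'BEGIN { printf "%.1f\n", (busy / cores) * 100 }'
}

cpu_percent 3 4    # → 75.0
cpu_percent 3 12   # → 25.0
```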



47 changes: 32 additions & 15 deletions monitoring-as-code/README.md
@@ -19,13 +19,13 @@ Monitoring Mixins bundle up SLI configuration, Alerting, Grafana dashboards, and
- [docker](https://docs.docker.com)
- [git](https://git-scm.com)

## Docker installation
### Docker installation

See the GitHub Releases page for the most recent tagged version, then pull the Docker image: -

`docker pull ghcr.io/ho-cto/sre-monitoring-as-code:{tag}`

## GitHub clone installation
### GitHub clone installation

**In a directory of your choosing run the following setup commands.**

@@ -42,16 +42,7 @@ docker build -t sre-monitoring-as-code:latest .

## Usage

### Default mixin config

**To run the default monitoring and summary mixins bundles into the built container run the following command**

```
# Execute makefile script
sh deploy.sh
```

### Custom mixin
### Docker Run using custom mixin configuration

**To run a custom mixin file**

@@ -60,11 +51,9 @@ sh deploy.sh
touch grapi-mixin.jsonnet
# Execute docker run command based on mounted directory where the mixin file has been added.
docker run --mount type=bind,source="$PWD"/{user input directory},target=/input --mount type=bind,source="$PWD"/{user output directory},target=/output -it sre-monitoring-as-code:{tag} -m {service} -rd -i input -o output
docker run --mount type=bind,source="$PWD"/{user input directory},target=/input --mount type=bind,source="$PWD"/{user output directory},target=/output -it sre-monitoring-as-code:{tag} -m {service} -rd -i input -o output {namespace:- defaults to localhost if not supplied}
```

### Configuration Arguments

**Arguments to be passed to container at runtime**

| Argument | Description |
@@ -76,6 +65,34 @@ docker run --mount type=bind,source="$PWD"/{user input directory},target=/input
| -r | Include if you only want to generate Prometheus rules, both generated if neither included |
| -d | Include if you only want to generate Grafana dashboards, both generated if neither included |
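
The rule that both artefact types are generated when neither `-r` nor `-d` is given can be expressed with POSIX `getopts`. The sketch below is hypothetical: it is not the parser used in `run-mixin.sh`, and it handles only the `-m`, `-i`, `-o`, `-r` and `-d` flags that appear in the `docker run` example.

```shell
#!/bin/sh
# Hypothetical parser mirroring the argument table; not the real run-mixin.sh code.
parse_mac_args() {
  mixin="" input_dir="" output_dir=""
  gen_rules="false" gen_dashboards="false"
  OPTIND=1   # reset so the function can be called more than once
  while getopts "m:i:o:rd" opt; do
    case "$opt" in
      m) mixin="$OPTARG" ;;
      i) input_dir="$OPTARG" ;;
      o) output_dir="$OPTARG" ;;
      r) gen_rules="true" ;;
      d) gen_dashboards="true" ;;
      *) echo "usage: -m mixin [-r] [-d] -i input -o output" >&2; return 1 ;;
    esac
  done
  # Neither -r nor -d supplied: generate both rules and dashboards.
  if [ "$gen_rules" = "false" ] && [ "$gen_dashboards" = "false" ]; then
    gen_rules="true"
    gen_dashboards="true"
  fi
}

parse_mac_args -m grapi -d -i input -o output
echo "mixin=$mixin rules=$gen_rules dashboards=$gen_dashboards"
# → mixin=grapi rules=false dashboards=true
```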

### Execute built-in shell script from cloned repository

**A default set of mixin configuration files is supplied in the repository and in the built container: -**

| Mixin | Description |
|------------|-------------------------------------------------------------------------------------------|
| overview | Provides a summary dashboard of all consumers of MaC on a platform |
| monitoring | Provides monitoring of the Prometheus/Grafana eco-system on which MaC runs |
| generic | A global distribution of MaC using a set of 5 golden SLIs aggregated to the product level |
| testing    | A mixin containing all MaC metric types and SLI libraries used for pipeline code tests    |


**Distribute to local monitoring stack**

Two variables at the top of `deploy.sh` control whether the generated artefacts (dashboards and rules) are distributed to the local monitoring stack provided in this repo.

```
TRANSFER_RULES="true"
TRANSFER_DASHBOARDS="true"
```
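
`deploy.sh` copies artefacts only when a flag is exactly the string `true`; the test `[ "$TRANSFER_RULES" = "true" ]` is case-sensitive, so values such as `TRUE` or `yes` skip the copy. A small sketch of that check; the `should_transfer` name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper reproducing the exact-match test used in deploy.sh.
should_transfer() {
  [ "$1" = "true" ]
}

for value in true TRUE yes false ""; do
  if should_transfer "$value"; then
    echo "'$value' -> copy"
  else
    echo "'$value' -> skip"
  fi
done
```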

**To run these default mixins execute the following command**

```
# Execute deploy script
sh deploy.sh
```

## Distribution

### Add artefacts (dashboard, alerts rules and recording rules) to Grafana and Prometheus package management tooling (Prometheus Operator)
43 changes: 31 additions & 12 deletions monitoring-as-code/deploy.sh
@@ -4,25 +4,44 @@
# Global variables
RULES_DIRECTORY="$PWD"/output/prometheus-rules/
DASHBOARD_DIRECTORY="$PWD"/output/grafana-dashboards/
TRANSFER_RULES="true"
TRANSFER_DASHBOARDS="true"
LOCAL_PATH="$PWD"/../local/

# Clear down MaC output directory
rm -rf "$PWD"/output/*/

#Executes docker image to create rules and dashboards for monitoring and summary mixin files
docker run --mount type=bind,source="$PWD"/output,target=/output --mount type=bind,source="$PWD"/mixin-defs,target=/input -it sre-monitoring-as-code:latest -m monitoring -rd -i input -o output
# Set array of mixins which will be executed
set -- overview generic monitoring testing

docker run --mount type=bind,source="$PWD"/output,target=/output --mount type=bind,source="$PWD"/mixin-defs,target=/input -it sre-monitoring-as-code:latest -m summary -d -i input -o output
# Loop through mixin array
for mixin in "$@";
do

# Copy Prometheus rules to monitoring local
cp -a "$RULES_DIRECTORY"/. "$LOCAL_PATH"/prometheus/rule_configs
if [ "$mixin" = "overview" ]; then
#Executes docker image to create dashboards for overview mixin
docker run --mount type=bind,source="$PWD"/output,target=/output --mount type=bind,source="$PWD"/mixin-defs,target=/input -it sre-monitoring-as-code:latest -m "$mixin" -d -i input -o output;
else
#Executes docker image to create dashboards and rules for all mixins other than overview
docker run --mount type=bind,source="$PWD"/output,target=/output --mount type=bind,source="$PWD"/mixin-defs,target=/input -it sre-monitoring-as-code:latest -m "$mixin" -rd -i input -o output;
fi

# Copy Grafana dashboards to monitoring local
for dashboard_file_path in "$DASHBOARD_DIRECTORY"/*
do
dashboard_file="${dashboard_file_path##*/}"
mixin="${dashboard_file%%\-*}"
# Copy Grafana dashboards to monitoring local
if [ "$TRANSFER_DASHBOARDS" = "true" ]; then
for dashboard_file_path in "$DASHBOARD_DIRECTORY"/*"$mixin"*
do
dashboard_file="${dashboard_file_path##*/}"

mkdir -p "$LOCAL_PATH"/grafana/provisioning/dashboards/"$mixin"
cp "$DASHBOARD_DIRECTORY"/"$dashboard_file" "$LOCAL_PATH"/grafana/provisioning/dashboards/"$mixin"/"$dashboard_file"
done
fi

mkdir -p "$LOCAL_PATH"/grafana/provisioning/dashboards/"$mixin"
cp "$DASHBOARD_DIRECTORY"/"$dashboard_file" "$LOCAL_PATH"/grafana/provisioning/dashboards/"$mixin"/"$dashboard_file"
done

## Copy Prometheus rules to monitoring local
if [ "$TRANSFER_RULES" = "true" ]; then
cp -a "$RULES_DIRECTORY"/. "$LOCAL_PATH"/prometheus/rule_configs
fi
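
The reworked `deploy.sh` generates dashboards only (`-d`) for the `overview` mixin and both rules and dashboards (`-rd`) for every other mixin. A dry-run sketch that prints the commands instead of executing them can help when checking the loop. The `flags_for_mixin` helper and the shortened `docker run ...` text are illustrative and not part of the script.

```shell
#!/bin/sh
# Illustrative helper: choose the generation flags deploy.sh passes for a mixin.
flags_for_mixin() {
  if [ "$1" = "overview" ]; then
    printf '%s\n' "-d"    # overview: summary dashboard only, no rules
  else
    printf '%s\n' "-rd"   # all other mixins: rules and dashboards
  fi
}

# Dry run over the same mixin list as deploy.sh: print, do not execute.
set -- overview generic monitoring testing
for mixin in "$@"; do
  echo "docker run ... sre-monitoring-as-code:latest -m $mixin $(flags_for_mixin "$mixin") -i input -o output"
done
```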


Original file line number Diff line number Diff line change
@@ -10,4 +10,4 @@ local config = {
macVersion: std.extVar('MAC_VERSION'),
};

mixinFunctions.createSummaryDashboard(config)
mixinFunctions.createOverviewDashboard(config)
4 changes: 2 additions & 2 deletions monitoring-as-code/run-mixin.sh
@@ -93,11 +93,11 @@ if [ "$generate_rules" = "true" ]; then
for environment in $environments
do
# Generate Prometheus recording rules YAML
if ! jsonnet -J vendor --ext-str ENV="$environment" --ext-str ACCOUNT="$account" --ext-str MAC_VERSION="$MAC_VERSION" --ext-str CUSTOM_METRIC_TYPES="$CUSTOM_METRIC_TYPES" -S -e "std.manifestYamlDoc((import \"${PWD}/_input/mixin.jsonnet\").prometheusRules)" > "$PWD"/_output/prometheus-rules/"$mixin"-"$environment"-recording-rules.yaml;
if ! jsonnet -J vendor --ext-str ENV="$environment" --ext-str ACCOUNT="$account" --ext-str MAC_VERSION="$MAC_VERSION" --ext-str CUSTOM_METRIC_TYPES="$CUSTOM_METRIC_TYPES" -S -e "std.manifestYamlDoc((import \"${PWD}/_input/mixin.jsonnet\").prometheusRules)" > "$PWD"/_output/prometheus-rules/sre-mac-"$mixin"-"$environment"-recording-rules.yaml;
then echo "Failed to run recording rules for ${mixin} (environment ${environment}) - exiting"; exit; fi

# Generate Prometheus alert rules YAML
if ! jsonnet -J vendor --ext-str ENV="$environment" --ext-str ACCOUNT="$account" --ext-str MAC_VERSION="$MAC_VERSION" --ext-str CUSTOM_METRIC_TYPES="$CUSTOM_METRIC_TYPES" -S -e "std.manifestYamlDoc((import \"${PWD}/_input/mixin.jsonnet\").prometheusAlerts)" > "$PWD"/_output/prometheus-rules/"$mixin"-"$environment"-alert-rules.yaml;
if ! jsonnet -J vendor --ext-str ENV="$environment" --ext-str ACCOUNT="$account" --ext-str MAC_VERSION="$MAC_VERSION" --ext-str CUSTOM_METRIC_TYPES="$CUSTOM_METRIC_TYPES" -S -e "std.manifestYamlDoc((import \"${PWD}/_input/mixin.jsonnet\").prometheusAlerts)" > "$PWD"/_output/prometheus-rules/sre-mac-"$mixin"-"$environment"-alert-rules.yaml;
then echo "Failed to run alert rules for ${mixin} (environment ${environment}) - exiting"; exit; fi

done
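
With the `sre-mac-` prefix, rule files are now named `sre-mac-{mixin}-{environment}-recording-rules.yaml` (and likewise `...-alert-rules.yaml`). This probably also explains why `deploy.sh` in this commit no longer derives the mixin from the file name with `${dashboard_file%%\-*}`: that expansion keeps everything before the first hyphen, which for any `sre-mac-` prefixed name is `sre`. This assumes the dashboard files carry the same prefix. The mixin `generic` and the environment `dev` below are example values only.

```shell
#!/bin/sh
# Example values (hypothetical).
mixin="generic"
environment="dev"

# File name pattern produced by run-mixin.sh after this change.
rules_file="sre-mac-${mixin}-${environment}-recording-rules.yaml"
echo "$rules_file"          # → sre-mac-generic-dev-recording-rules.yaml

# Old deploy.sh approach: strip the directory, keep text before the first hyphen.
file_path="output/prometheus-rules/${rules_file}"
file_name="${file_path##*/}"
echo "${file_name%%-*}"     # → sre  (no longer the mixin name)

# New deploy.sh approach: the same *"$mixin"* pattern its for-loop uses as a glob.
case "$file_name" in
  *"$mixin"*) echo "matched $mixin" ;;
  *) echo "no match" ;;
esac
```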
8 changes: 5 additions & 3 deletions monitoring-as-code/src/alerts/burn-rate-alerts.libsonnet
@@ -106,7 +106,7 @@ local createBurnRateAlerts(config, sliSpec, sliKey, journeyKey) =
{
alerts+: [
{
local alertName = std.join('_', [config.product, journeyKey, sliKey, sliSpec.sliType, 'ErrorBudgetBurn']),
local alertName = std.join('_', [std.strReplace(macConfig.macDashboardPrefix.uid, '-', '_'), config.product, journeyKey, sliKey, sliSpec.sliType, 'ErrorBudgetBurn']),
local severity = getSeverity(errorBudgetBurnWindow, config, sliSpec),
local alertTitle = createAlertTitle(errorBudgetBurnWindow, config, sliSpec, sliKey, journeyKey),
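
The new `alertName` takes the dashboard uid prefix, swaps its hyphens for underscores, and joins it with the product, journey, SLI key, SLI type and `ErrorBudgetBurn` using `_`. A shell sketch of the same composition follows. It assumes the prefix uid is `sre-mac` (the value of `macConfig.macDashboardPrefix.uid` is not shown in this diff), and the product, journey, SLI key and SLI type are made-up values.

```shell
#!/bin/sh
# Assumed values for illustration only.
prefix_uid="sre-mac"        # assumed value of macConfig.macDashboardPrefix.uid
product="grapi"
journey="apply"
sli_key="sli01"
sli_type="availability"

# std.strReplace(prefix, '-', '_'): swap hyphens for underscores in the prefix.
prefix_part=$(printf '%s' "$prefix_uid" | sed 's/-/_/g')

# std.join('_', [...]): join all parts with underscores.
alert_name="${prefix_part}_${product}_${journey}_${sli_key}_${sli_type}_ErrorBudgetBurn"
echo "$alert_name"   # → sre_mac_grapi_apply_sli01_availability_ErrorBudgetBurn
```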

@@ -134,12 +134,14 @@ local createBurnRateAlerts(config, sliSpec, sliKey, journeyKey) =
annotations: {
dashboard: '%(grafanaUrl)s/d/%(journeyUid)s%(environment)s' % {
grafanaUrl: config.grafanaUrl,
journeyUid: std.join('-', [config.product, journeyKey, 'journey-view']),
journeyUid: std.join('-', [macConfig.macDashboardPrefix.uid, config.product, journeyKey]),
environment: if std.objectHas(config, 'generic') && config.generic then '' else '?var-environment=%s' % config.environment,
},
silenceurl: '%(alertmanagerUrl)s/#/silences/new?filter={alertname%%3D%%22%(alertName)s%%22}' % {
silenceurl: '%(alertmanagerUrl)s/#/silences/new?filter={alertname%%3D%%22%(alertName)s%%22, journey%%3D%%22%(journey)s%%22, service%%3D%%22%(service)s%%22}' % {
alertmanagerUrl: config.alertmanagerUrl,
alertName: alertName,
journey: journeyKey,
service: config.product,
},
description: createAlertPayloadString(alertPayload),
[if std.objectHas(config, 'runbookUrl') then 'runbookUrl']:
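
In the `silenceurl` format string, `%%` is Jsonnet's escape for a literal `%`. So `%%3D` and `%%22` become the URL-encoded `=` (`%3D`) and `"` (`%22`), and the silence form is pre-filled with `alertname`, `journey` and `service` matchers. The sketch below builds the same link with `printf`, which uses the same `%%` escape. The Alertmanager URL, alert name, journey and product values are placeholders.

```shell
#!/bin/sh
# Placeholder values for illustration.
alertmanager_url="http://localhost:9093"
alert_name="sre_mac_grapi_apply_sli01_availability_ErrorBudgetBurn"
journey="apply"
service="grapi"   # the Jsonnet fills the service matcher from config.product

# %%3D -> %3D (=) and %%22 -> %22 (") exactly as in the Jsonnet format string.
silence_url=$(printf '%s/#/silences/new?filter={alertname%%3D%%22%s%%22, journey%%3D%%22%s%%22, service%%3D%%22%s%%22}' \
  "$alertmanager_url" "$alert_name" "$journey" "$service")
echo "$silence_url"
```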
9 changes: 3 additions & 6 deletions monitoring-as-code/src/dashboards/detail-dashboard.libsonnet
@@ -298,11 +298,8 @@ local createDetailDashboard(journeyKey, config, links, sliSpecList) =
);

dashboard.new(
title='%(product)s-%(journey)s-detail-view' % {
product: config.product,
journey: journeyKey,
},
uid=std.join('-', [config.product, journeyKey, 'detail-view']),
title=stringFormattingFunctions.capitaliseFirstLetters(std.join(' / ', [macConfig.macDashboardPrefix.title, config.product, journeyKey, 'detail'])),
uid=std.join('-', [macConfig.macDashboardPrefix.uid, config.product, journeyKey, 'detail']),
tags=[config.product, 'mac-version: %s' % config.macVersion, journeyKey, 'detail-view'],
schemaVersion=18,
editable=true,
@@ -329,7 +326,7 @@ local createDetailDashboard(journeyKey, config, links, sliSpecList) =
// @returns JSON for the detail dashboards
local createDetailDashboards(config, links, sliSpecList) =
{
[std.join('-', [config.product, journeyKey, 'detail-view.json'])]:
[std.join('-', [macConfig.macDashboardPrefix.uid, config.product, journeyKey, 'detail']) + '.json']:
createDetailDashboard(journeyKey, config, links, sliSpecList)
for journeyKey in std.objectFields(sliSpecList)
};
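
The detail dashboard uid and its output file name now share one construction: prefix, product, journey and `detail` joined with `-`, with `.json` appended only to the file name. Previously `.json` was glued onto the last joined element (`detail-view.json`). The sketch below assumes the prefix uid is `sre-mac`, uses made-up product and journey values, and relies on a hypothetical `join_with` helper that stands in for `std.join`.

```shell
#!/bin/sh
# Assumed/example values.
prefix_uid="sre-mac"
product="grapi"
journey="apply"

# Hypothetical stand-in for std.join(sep, [parts...]).
join_with() {
  sep="$1"; shift
  out="$1"; shift
  for part in "$@"; do
    out="${out}${sep}${part}"
  done
  printf '%s\n' "$out"
}

uid=$(join_with "-" "$prefix_uid" "$product" "$journey" "detail")
file_name="${uid}.json"
old_file_name=$(join_with "-" "$product" "$journey" "detail-view.json")

echo "$uid"            # → sre-mac-grapi-apply-detail
echo "$file_name"      # → sre-mac-grapi-apply-detail.json
echo "$old_file_name"  # → grapi-apply-detail-view.json
```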
13 changes: 7 additions & 6 deletions monitoring-as-code/src/dashboards/journey-dashboard.libsonnet
@@ -6,6 +6,10 @@ local grafana = import 'grafonnet/grafana.libsonnet';
local dashboard = grafana.dashboard;
local template = grafana.template;

// MaC imports
local macConfig = import '../mac-config.libsonnet';
local stringFormattingFunctions = import '../util/string-formatting-functions.libsonnet';

// Create the Grafana panels grouping all SLI types under a single SLI panel
// @param slis A map of SLIs keyed by the SLI type
// @returns array of panel elements
@@ -54,13 +58,10 @@ local createDashboardInfo(sliKey, slis) =
// @returns JSON defining the journey view dashboards for a service
local createJourneyDashboards(config, sliList, links) =
{
[std.join('-', [config.product, journeyKey, 'journey-view.json'])]:
[std.join('-', [macConfig.macDashboardPrefix.uid, config.product, journeyKey]) + '.json']:
dashboard.new(
title='%(product)s-%(journey)s-journey-view' % {
product: config.product,
journey: journeyKey,
},
uid=std.join('-', [config.product, journeyKey, 'journey-view']),
title=stringFormattingFunctions.capitaliseFirstLetters(std.join(' / ', [macConfig.macDashboardPrefix.title, config.product, journeyKey])),
uid=std.join('-', [macConfig.macDashboardPrefix.uid, config.product, journeyKey]),
tags=[config.product, 'mac-version: %s' % config.macVersion, journeyKey, 'journey-view'],
schemaVersion=18,
editable=true,
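
Journey dashboard titles join the prefix title, product and journey with ` / ` and pass the result through `stringFormattingFunctions.capitaliseFirstLetters`. That function's body is not part of this diff. The sketch below assumes it upper-cases the first character of each space-separated word, and it uses the prefix title `SRE MaC` from the documentation table with made-up product and journey names.

```shell
#!/bin/sh
# Assumed behaviour of capitaliseFirstLetters: upper-case the first character
# of every space-separated word and leave the rest of each word untouched.
capitalise_first_letters() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      $i = toupper(substr($i, 1, 1)) substr($i, 2)
    }
    print
  }'
}

prefix_title="SRE MaC"
product="grapi"
journey="apply"

# std.join(' / ', [prefix, product, journey]) followed by capitalisation.
title=$(capitalise_first_letters "$prefix_title / $product / $journey")
echo "$title"   # → SRE MaC / Grapi / Apply
```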