This repository has been archived by the owner on Jul 2, 2024. It is now read-only.

alert docs
jac committed Apr 18, 2024
1 parent 85a16c1 commit bb04bcf
Showing 14 changed files with 1,527 additions and 404 deletions.
123 changes: 100 additions & 23 deletions content/departments/engineering/managed-services/build-tracker.md

Large diffs are not rendered by default.

153 changes: 115 additions & 38 deletions content/departments/engineering/managed-services/cloud-ops.md

Large diffs are not rendered by default.

91 changes: 72 additions & 19 deletions content/departments/engineering/managed-services/cloud-relay.md
@@ -3,8 +3,8 @@
<!--
Generated documentation; DO NOT EDIT. Regenerate using this command: 'sg msp operations generate-handbook-pages'
-Last updated: 2024-04-12 12:41:21.961433 +0000 UTC
-Generated from: https://github.com/sourcegraph/managed-services/tree/cc51eaa4e11a3146ae0a173cc2b80076466df8f7
+Last updated: 2024-04-18 13:41:36.626584 +0000 UTC
+Generated from: https://github.com/sourcegraph/managed-services/tree/b48c02fa7c553af5b6888efff69b85b48717db54
-->

This document describes operational guidance for Cloud Relay infrastructure.
@@ -17,8 +17,8 @@ If you need assistance with MSP infrastructure, reach out to the [Core Services]

## Service overview

-| PROPERTY | DETAILS |
-| ------------ | ---------------------------------------------------------------------------------------------------------------------------- |
+| PROPERTY | DETAILS |
+|--------------|------------------------------------------------------------------------------------------------------------------------------|
| Service ID | `cloud-relay` ([specification](https://github.com/sourcegraph/managed-services/blob/main/services/cloud-relay/service.yaml)) |
| Owners | **cloud** |
| Service kind | Cloud Run service |
@@ -30,22 +30,22 @@ If you need assistance with MSP infrastructure, reach out to the [Core Services]

### prod

| PROPERTY | DETAILS |
| ------------------- | ---------------------------------------------------------------------------------------------------- |
| Project ID | [`cloud-relay-prod-bd4c`](https://console.cloud.google.com/run?project=cloud-relay-prod-bd4c) |
| Category | **internal** |
| Deployment type | `manual` |
| Resources | |
| Slack notifications | [#alerts-cloud-relay-prod](https://sourcegraph.slack.com/archives/alerts-cloud-relay-prod) |
| Alerts | [GCP monitoring](https://console.cloud.google.com/monitoring/alerting?project=cloud-relay-prod-bd4c) |
| Errors | [Sentry `cloud-relay-prod`](https://sourcegraph.sentry.io/projects/cloud-relay-prod/) |
| Domain | [cloud-relay.sgdev.org](https://cloud-relay.sgdev.org) |
| Cloudflare WAF ||
| PROPERTY | DETAILS |
|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Project ID | [`cloud-relay-prod-bd4c`](https://console.cloud.google.com/run?project=cloud-relay-prod-bd4c) |
| Category | **internal** |
| Deployment type | `manual` |
| Resources | |
| Slack notifications | [#alerts-cloud-relay-prod](https://sourcegraph.slack.com/archives/alerts-cloud-relay-prod) |
| Alert policies | [Listing](https://console.cloud.google.com/monitoring/alerting/policies?project=cloud-relay-prod-bd4c), [Dashboard](https://console.cloud.google.com/monitoring/dashboards?pageState=%28%22dashboards%22%3A%28%22t%22%3A%22All%22%29%2C%22dashboardList%22%3A%28%22f%22%3A%22%255B%257B_22k_22_3A_22Type_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22Custom_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22category_22%257D%255D%22%29%29&project=cloud-relay-prod-bd4c) |
| Errors | [Sentry `cloud-relay-prod`](https://sourcegraph.sentry.io/projects/cloud-relay-prod/) |
| Domain | [cloud-relay.sgdev.org](https://cloud-relay.sgdev.org) |
| Cloudflare WAF | |

MSP infrastructure access needs to be requested using Entitle for time-bound privileges.

| ACCESS | ENTITLE REQUEST TEMPLATE |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ACCESS | ENTITLE REQUEST TEMPLATE |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GCP project read access | [Read-only Entitle request for the 'Internal Services' folder](https://app.entitle.io/request?data=eyJkdXJhdGlvbiI6IjEwODAwIiwianVzdGlmaWNhdGlvbiI6IkVOVEVSIEpVU1RJRklDQVRJT04gSEVSRSIsInJvbGVJZHMiOlt7ImlkIjoiNzg0M2MxYWYtYzU2MS00ZDMyLWE3ZTAtYjZkNjY0NDM4MzAzIiwidGhyb3VnaCI6Ijc4NDNjMWFmLWM1NjEtNGQzMi1hN2UwLWI2ZDY2NDQzODMwMyIsInR5cGUiOiJyb2xlIn1dfQ%3D%3D) |
| GCP project write access | [Write access Entitle request for the 'Internal Services' folder](https://app.entitle.io/request?data=eyJkdXJhdGlvbiI6IjEwODAwIiwianVzdGlmaWNhdGlvbiI6IkVOVEVSIEpVU1RJRklDQVRJT04gSEVSRSIsInJvbGVJZHMiOlt7ImlkIjoiZTEyYTJkZDktYzY1ZC00YzM0LTlmNDgtMzYzNTNkZmY0MDkyIiwidGhyb3VnaCI6ImUxMmEyZGQ5LWM2NWQtNGMzNC05ZjQ4LTM2MzUzZGZmNDA5MiIsInR5cGUiOiJyb2xlIn1dfQ%3D%3D) |
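
The `data` query parameter in the Entitle links above appears to be URL-encoded base64 wrapping a small JSON request template. The sketch below decodes it so the template can be inspected before a request is submitted. `decode_entitle_data` is a hypothetical helper name (not part of `sg` or Entitle), and it only undoes the `%3D` padding escape seen in these links rather than performing full URL-decoding.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sg or Entitle): print the JSON template
# embedded in the `data` parameter of an Entitle request link.
decode_entitle_data() {
  local url="$1"
  # Keep only what follows "data=" (assumes it is the last query parameter).
  local data="${url#*data=}"
  # Undo the URL-encoding of base64 '=' padding, then decode.
  printf '%s' "$data" | sed 's/%3D/=/g' | base64 --decode
}

# Usage: pass one of the full links from the table above as the only argument.
```

For the read-only link, the decoded template begins with `{"duration":"10800","justification":"ENTER JUSTIFICATION HERE"`, which suggests a default grant duration of 10800 (presumably seconds, i.e. three hours) and a justification placeholder to fill in.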

@@ -55,8 +55,8 @@ For Terraform Cloud access, see [prod Terraform Cloud](#prod-terraform-cloud).

The Cloud Relay prod service implementation is deployed on [Google Cloud Run](https://cloud.google.com/run).

| PROPERTY | DETAILS |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PROPERTY | DETAILS |
|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Console | [Cloud Run service](https://console.cloud.google.com/run?project=cloud-relay-prod-bd4c) |
| Service logs | [GCP logging](https://console.cloud.google.com/logs/query;query=resource.type%20%3D%20%22cloud_run_revision%22%20-logName%3D~%22logs%2Frun.googleapis.com%252Frequests%22;summaryFields=jsonPayload%252FInstrumentationScope,jsonPayload%252FBody,jsonPayload%252FAttributes%252Ferror:false:32:end?project=cloud-relay-prod-bd4c) |
| Service traces | [Cloud Trace](https://console.cloud.google.com/traces/list?project=cloud-relay-prod-bd4c) |
@@ -92,3 +92,56 @@ The Terraform Cloud workspaces for this service environment are [grouped under t
```bash
sg msp tfc view cloud-relay prod
```

### Alert Policies

The following alert policies are defined for each of this service's environments.

#### High Container CPU Utilization

```md
High CPU Usage - it may be necessary to reduce load or increase CPU allocation
```

Severity: WARNING

#### High Container Memory Utilization

```md
High Memory Usage - it may be necessary to reduce load or increase memory allocation
```

Severity: WARNING

#### Container Startup Latency

```md
Service containers are taking longer than configured timeouts to start up.
```

Severity: WARNING

#### Cloud Run Pending Requests

```md
There are requests pending - we may need to increase Cloud Run instance count, request concurrency, or investigate further.
```

Severity: WARNING

#### Cloud Run Instance Precondition Failed

```md
Cloud Run instance failed to start due to a precondition failure.
This is unlikely to cause immediate downtime, and may auto-resolve if no new instances are created and/or we return to a healthy state, but you should follow up to ensure the latest Cloud Run revision is healthy.
```

Severity: WARNING

#### External Uptime Check

```md
Service is failing to respond on https://cloud-relay.sgdev.org - this may be expected if the service was recently provisioned or if its external domain has changed.
```

Severity: CRITICAL
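
When the external uptime alert fires, a quick manual probe can help confirm whether the domain is reachable from your own machine. The following is a minimal sketch, not the check that GCP runs: the function names `classify_status` and `probe` are hypothetical, and treating any 2xx response as "up" is an assumption, since the alert's real success criteria are not stated in this document.

```bash
#!/usr/bin/env bash
# Assumption: any 2xx response counts as "up". The real uptime check's
# success criteria are not stated in this document.
classify_status() {
  case "$1" in
    2??) echo "up" ;;
    *)   echo "down" ;;
  esac
}

# Probe a URL once and print up/down together with the raw status code.
probe() {
  local url="$1" code
  # curl reports 000 when no HTTP response was received (DNS, TLS, timeout).
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url" 2>/dev/null || true)"
  echo "$(classify_status "$code") ($code)"
}

# Usage: probe https://cloud-relay.sgdev.org
```

A "down" result from a single location does not by itself confirm an outage; compare it against the Cloud Run console and service logs linked earlier in this page.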