Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.
/ monitor Public archive

A remote uptime monitoring framework for running monitors as a CRON job

License

Notifications You must be signed in to change notification settings

dollarshaveclub/monitor

Repository files navigation

@dollarshaveclub/monitor

CircleCI codecov Greenkeeper badge

A remote uptime monitoring framework for running monitors as a CRON job.

At Dollar Shave Club, we run some of our monitors using CircleCI 2 Scheduled Workflows. You can see the test/example monitors for this repository running every minute here: https://circleci.com/gh/dollarshaveclub/workflows/monitor/tree/master. See our CircleCI 2 Config.

Motivation

With this monitoring solution, we were able to:

  • Run our monitors every minute instead of every 5 minutes
  • Test both our API and Browser monitoring scripts outside of a monitoring platform's console/UI. We were unable to do this with our Terraform setup.
  • Use these monitoring scripts as tests for our version of Heroku Review Apps
    • Additionally, we can develop monitors alongside our features, allowing us to merge them simultaneously. We no longer have conflicts between our codebase and our monitors.
  • Able to easily create and manage hundreds of monitors, which is difficult with Terraform (excessive copy pasta) and any UI-based monitoring platform.

Some downsides to our CircleCI Scheduled Workflow setup are:

  • Contention with your other tests. If you reach your CircleCI 2 container limit, your monitors will queue then run in bursts, losing coverage momentarily.
  • May not be as fast as running monitors as Kubernetes jobs because CircleCI runs many commands like npm install on every build, which could be slower than just pulling a Docker container. However, having a CircleCI UI is preferable.

What about features other monitoring solutions provide?

  • We must setup our own dashboards and alerting
  • We still use other services for features we need, just not for monitoring everything
  • We don't need to run these monitors from multiple locations. If we do, we'll run them as Kubernetes jobs on different clusters.

Running Monitors

There are two ways to run these monitors.

Locally

To run monitors locally:

npx dsc-monitor 'monitors/**/*.js'

Run dsc-monitor --help for options.

NOTE: this assumes you've installed this library as a local dependency, which is installed as dsc-monitor. If you're running the monitors from this repository, use ./bin/run.js. If you've npm install --global @dollarshaveclub/monitor, just run dsc-monitor.

Via Docker

Copy our Dockerfile Template to your repository, then run:

docker build -t dsc-monitor
docker run -t dsc-monitor 'monitors/**/*.js'

Creating your Monitoring Repository

mkdir my-monitors # your repository name
cd my-monitors
npm init
npm i --save @dollarshaveclub/monitor
mkdir monitors
  1. Create a test monitor. You can use one of our example monitors.
  2. Add the npm run monitors command:
  3. Add the following script to your package.json: "monitors": "dsc-monitor 'monitors/**/*.js'"
  4. Run your monitors with npm run monitors
  5. Setting up your monitors in CircleCI as a CRON job:
  6. Copy .circleci/template.config.yml to .circleci/config.yml and push

Environment Variables

Monitor environment variables:

  • MONITOR_CONCURRENCY=1 - concurrency of monitors running at the same time
    • When concurrency === 1, results will stream to stdout
    • When concurrency >= 1, results will be logged one monitor set at a time
  • MONITOR_SHUFFLE - whether to shuffle monitors and monitor sets
  • MONITOR_SHUFFLE_MONITOR_SETS - whether to shuffle monitor sets
  • MONITOR_SHUFFLE_MONITORS - whether to shuffle monitors within a set

Defining Monitors

All monitoring sets are defined in monitors/. Each set is a module with:

  • exports.disabled<Boolean> = false - whether this monitor is disabled
  • exports.id<String> = __filename [optional] - an ID for your monitor set, defaulting to the filename
  • exports.slowThreshold<Number|String> = 30s [optional] - slow threshold for the entire monitor set
  • exports.parallelism<Number> = 1 [optional] - split this monitor set into shards and run in parallel
  • exports.monitors<Array> - an array of monitors with the following properties:
    • id<String> [required] - the ID of the monitor
    • parameters<Object> [optional] - parameters to send to the monitor function and for data purposes
    • monitor<Function>(monitorConfig, monitorSetConfig, { attempt, log }) [required] - the monitor function, which is passed this monitor object as well as exports
      • monitorConfig - this monitor object
      • monitorSetConfig - this exports object
      • attempt = 0 - the attempt # for this monitor
      • log(str) - a function to log in a nicely-formatted way
    • timeout<Number|String> = '5s' [optional] - timeout for the monitor before it's considered a failure
    • slowThreshold<Number|String> = '1s' [optional] - slow threshold for a monitor
    • retries<Number> = 0 [optional] - number of times to retry a failing monitor
  • Optional functions to run within the life cycle of the monitoring set:
    • exports.beforeAll<Function>(monitorSetConfig)
    • exports.afterAll<Function>(monitorSetConfig, result)
    • exports.beforeEach<Function>(monitorConfig, monitorSetConfig, { attempt, log })
    • exports.afterEach<Function>(monitorConfig, monitorSetConfig, { attempt, log })

What certain fields do:

  • slowThreshold - turns the color of the time from green to yellow when a monitor or set of monitors take this amount of time

Plugins

Create a file named dsc-monitor.js with the form:

module.exports = (monitorRunner) => {

}

Then pass it as a plugin (-p) when you run the monitors:

dsc-monitor -p dsc-monitor.js 'monitors/**/*.js'

Events

Hook into events via monitorRunner.events.on(<event>, callback). The events are:

  • monitorSet => (result) => {} - when a monitor set is completed
    • monitorSetConfig
    • results - array of monitor results
    • success = true|false
    • elapsedTime - in milliseconds
  • monitor => (result) => {} - when a monitor is completed
    • monitorSetConfig
    • monitorConfig
    • results - array of monitorAttempt results
    • success = true|false
    • elapsedTime - in milliseconds
  • monitorAttempt => (result) => {} - when a monitor attempt is completed
    • monitorSetConfig
    • monitorConfig
    • success = true|false
    • elapsedTime - in milliseconds
    • error - if an error occured
    • attempt = 1 - attempt #

Notes

Scheduling Monitors

CircleCI

See CircleCI 2 workflow scheduling: https://circleci.com/docs/2.0/workflows/#scheduling-a-workflow. You can work off our .circleci/config.yml template

See all builds on master of workflow monitor without a commit attached to it: https://circleci.com/gh/dollarshaveclub/monitor/tree/master Or just look at the monitor workflow: https://circleci.com/gh/dollarshaveclub/workflows/monitor/tree/master

About

A remote uptime monitoring framework for running monitors as a CRON job

Resources

License

Stars

Watchers

Forks

Packages

No packages published