Refactoring and made 1.9 compatible. (#4)
* initial refactoring

* killing of basic tasks works

* Updates link for DC/OS CLI installation

* Updates marathon definition file and jobs description to run rampage during business hours

* fixes bug with cron schedule

* prevents drax from killing himself

* prevents drax from killing frameworks

* adds json result output for /rampage

* enables /stats endpoint

* Updates README

* Adds configurable sleep time between the killing of tasks
Jan Repnak authored and Joerg Schad committed Aug 3, 2017
1 parent 7f62539 commit 5bb72fd
Showing 8 changed files with 215 additions and 251 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
# Folders
_obj
_test
.idea

# Architecture specific extensions/prefixes
*.[568vq]
65 changes: 14 additions & 51 deletions README.md
@@ -2,18 +2,15 @@

This is DRAX, the [DC/OS](https://dcos.io) Resilience Automated Xenodiagnosis tool. It helps to test DC/OS deployments by applying a [Chaos Monkey](http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html)-inspired, proactive and invasive testing approach.

![DRAX logo](img/drax-logo.png)

Well, actually DRAX is a reverse acronym inspired by the Guardians of the Galaxy character Drax the Destroyer.


You might have heard of Netflix's [Chaos Monkey](http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html) or its containerized [variant](https://medium.com/production-ready/chaos-monkey-for-fun-and-profit-87e2f343db31). Maybe you've seen a [gaming version](https://www.wehkamplabs.com/blog/2016/06/02/docker-and-zombies/) of it or stumbled upon a [lower-level species](http://probablyfine.co.uk/2016/05/30/announcing-byte-monkey/). In any case, I assume you're somewhat familiar with chaos-based resilience testing.

DRAX is a DC/OS-specific resilience testing tool that works mainly at the task level. Future work may extend this to the node level and up to the cluster level.

## Installation and usage

Note that DRAX assumes a running [DC/OS 1.7](https://dcos.io/releases/1.7.0/) cluster.
Note that DRAX assumes a running [DC/OS 1.9](https://dcos.io/) cluster.

### Production

@@ -28,20 +25,25 @@ Now you can (modulo the public node of your cluster) do the following:
Content-Length: 10
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 14:39:11 GMT

{"gone":0}

If you launched DRAX via Marathon, you can also POST to `/rampage` on a recurring schedule by deploying a DC/OS job. The example job triggers the destruction every business hour from Monday to Friday:

$ dcos job add metronome-drax.json

### Testing and development

Get DRAX and build from source:

$ go get github.com/dcos-labs/drax
$ go build
$ MARATHON_URL=http://localhost:8080 ./drax
INFO[0000] Using Marathon at http://localhost:8080 main=init
INFO[0000] This is DRAX in version 0.4.0 main=init
INFO[0000] Listening on port 7777 main=init
INFO[0000] On destruction level 0 main=init
INFO[0000] Using Marathon at http://localhost:8080 main=init
INFO[0000] I will destroy 2 tasks on a rampage main=init
INFO[0000] This is DRAX in version 0.3.0 listening on port 7777

And in a different terminal session:

@@ -50,7 +52,7 @@ And in a different terminal session:
Content-Length: 10
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 14:39:11 GMT

{"gone":0}

For Go development, be aware of the following dependencies (not using explicit vendoring ATM):
@@ -62,12 +64,6 @@ For Go development, be aware of the following dependencies (not using explicit v

Note that the following environment variables are pre-set in the [Marathon app spec](marathon-drax.json) and yours to overwrite.

#### Destruction level

You can influence the default destruction setting for DRAX via the env variable `DESTRUCTION_LEVEL`:

0 == destroy random tasks of any app
1 == destroy random tasks of specific app

#### Number of target tasks

@@ -77,13 +73,6 @@ To specify how many tasks DRAX is supposed to destroy in one rampage, use `NUM_T

To influence the log level, use the `LOG_LEVEL` env variable, for example `LOG_LEVEL=DEBUG drax` would give you fine-grained log messages (defaults to `INFO`).

### Roadmap

- add seeds (hello world dummy, NGINX, Marvin): shell script + DC/OS CLI and walkthrough examples
- Weave [Scope](https://www.weave.works/products/weave-scope/) demo
- tests, tutorial, blog post
- node/cluster level rampages

## API

### /health [GET]
@@ -92,50 +81,24 @@ Will return a HTTP 200 code and `I am Groot` if DRAX is healthy.

### /stats [GET]

Will return runtime statistics, such as killed containers or apps over a report period specified with the `runs` parameter. For example, `/stats?runs=2` will report over the past two runs and if the `runs` parameter is not or wrongly specified it will report from the beginning of time (well, beginning of time for DRAX anyways).
Will return runtime statistics, such as killed containers or apps, and will report from the beginning of time (well, beginning of time for DRAX anyways).

$ http http://localhost:7777/stats
HTTP/1.1 200 OK
Content-Length: 10
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 14:39:11 GMT

{"gone":2}

### /rampage [POST]

Will trigger a destruction run on a certain destruction level (see also configuration section above for the default value).

#### Target any (non-framework) app

To target any non-framework app, set the level of destruction (using the `level` parameter) to `0`, for example, `/rampage?level=0` will destroy random tasks of any apps.

Invoke with default level (any tasks in any app):
Will trigger a destruction. Invoke with:

$ http POST localhost:7777/rampage
HTTP/1.1 200 OK
Content-Length: 121
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 12:15:19 GMT

{"success":true,"goners":["webserver.0fde0035-315f-11e6-aad0-1e9bbbc1653f","dummy.11a7c3bb-315f-11e6-aad0-1e9bbbc1653f"]}

#### Target a specific (non-framework) app

To target a specific (non-framework) app, set the level of destruction to `1` and specify the Marathon app id using the `app` parameter. For example, `/rampage?level=1&app=dummy` will destroy random tasks of the app with the Marathon ID `/dummy`.

Invoke like so (to destroy tasks of app `/dummy`):

$ cat rp.json
{
"level" : "1",
"app" : "dummy"
}
$ http POST localhost:7777/rampage < rp.json
HTTP/1.1 200 OK
Content-Length: 117
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 13:05:31 GMT

{"success":true,"goners":["dummy.59dca877-3165-11e6-aad0-1e9bbbc1653f","dummy.e96ffce3-3164-11e6-aad0-1e9bbbc1653f"]}

{"success":true,"goners":["webserver.0fde0035-315f-11e6-aad0-1e9bbbc1653f","dummy.11a7c3bb-315f-11e6-aad0-1e9bbbc1653f"]}