Commit
adds Marathon app spec, enhances API
mhausenblas committed Jun 13, 2016
1 parent 85b0f6b commit 10e9610
Showing 5 changed files with 160 additions and 72 deletions.
55 changes: 26 additions & 29 deletions README.md
@@ -18,7 +18,7 @@ From source, which will get you always the latest version:

Via Marathon app spec:

$ TBD
$ dcos marathon app add marathon-drax.json

### Dependencies

@@ -31,11 +31,13 @@ Via Marathon app spec:
You can influence what DRAX is supposed to destroy via the env variable `DESTRUCTION_LEVEL`:

0 == destroy random tasks
1 == destroy random apps
1 == destroy random task of specific app
2 == destroy random apps and services

So, for example, if you want DRAX to go totally berserk, launch it from the command line like this: `DESTRUCTION_LEVEL=2 drax`.
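A minimal sketch of how `DESTRUCTION_LEVEL` could be parsed and validated in Go. The constant names mirror `DL_BASIC`, `DL_ADVANCED` and `DL_ALL` from the DRAX sources, but their numeric values (assigned here via `iota` to match the list above) and the `parseLevel` helper are assumptions for illustration. Empty, malformed or out-of-range values fall back to level 0.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// DestructionLevel mirrors the type declared in main.go.
type DestructionLevel int

// Assumed values, matching the levels listed in the README.
const (
	DL_BASIC    DestructionLevel = iota // 0: random tasks of any app
	DL_ADVANCED                         // 1: random tasks of a specific app
	DL_ALL                              // 2: random apps and services
)

// parseLevel (hypothetical helper) converts the raw env value into a
// DestructionLevel, falling back to DL_BASIC on empty, malformed or
// out-of-range input.
func parseLevel(raw string) DestructionLevel {
	n, err := strconv.Atoi(raw)
	if err != nil || n < int(DL_BASIC) || n > int(DL_ALL) {
		return DL_BASIC
	}
	return DestructionLevel(n)
}

func main() {
	level := parseLevel(os.Getenv("DESTRUCTION_LEVEL"))
	fmt.Println("destruction level:", level)
}
```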

Next, you can influence how many tasks DRAX destroys in one rampage via the env variable `NUM_TARGETS`. For example, `NUM_TARGETS=5 drax` means that up to 5 tasks will be destroyed, unless fewer tasks are running overall, of course.
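The "up to `NUM_TARGETS`, fewer if fewer tasks exist" rule can be sketched with `math/rand`'s `Perm`, which the `rampage` function in `api.go` also uses to obtain non-repeating indices. The `pickTargets` helper is hypothetical; unlike the slice expression in `rampage`, it also clamps a negative count to zero.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickTargets returns up to n distinct, randomly chosen elements of
// candidates. If fewer than n candidates exist, all of them are returned
// in random order; n <= 0 yields an empty slice.
func pickTargets(candidates []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(candidates) {
		n = len(candidates)
	}
	picked := make([]string, 0, n)
	// rand.Perm yields a random permutation of [0, len(candidates)),
	// so its first n entries are n non-repeating indices.
	for _, i := range rand.Perm(len(candidates))[:n] {
		picked = append(picked, candidates[i])
	}
	return picked
}

func main() {
	tasks := []string{"web.1", "web.2", "db.1"}
	fmt.Println(pickTargets(tasks, 5)) // all three tasks, in random order
	fmt.Println(pickTargets(tasks, 2)) // two distinct tasks
}
```

Before Go 1.20 the global `math/rand` source is deterministic unless seeded (for example with `rand.Seed(time.Now().UnixNano())`), so an unseeded process picks the same sequence on every start.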

Further, to set the log level, use the `LOG_LEVEL` env variable; for example, `LOG_LEVEL=DEBUG drax` gives you fine-grained log messages.

## API
@@ -50,39 +52,34 @@ Will return runtime statistics, such as killed containers or apps over a report

### /rampage [POST]

Will trigger a destruction run on the current destruction level (see configuration section, above). You can explicitly set the level of destruction using the `level` parameter, for example, `/rampage?level=1` will destroy random apps (but no services/frameworks).
Will trigger a destruction run at a given destruction level (see the configuration section above for the default value).

#### Target any (non-framework) app

To target any non-framework app, set the level of destruction (using the `level` parameter) to `0`. For example, `/rampage?level=0` will destroy random tasks of any app.

Test locally:
To test it locally, run:

$ MARATHON_URL=http://localhost:8080 drax

And:
And invoke it with the default level (any tasks of any app):

$ http POST localhost:7777/rampage
HTTP/1.1 200 OK
Content-Length: 1187
Content-Type: text/plain; charset=utf-8
Date: Mon, 13 Jun 2016 06:30:47 GMT
Content-Length: 121
Content-Type: application/javascript
Date: Mon, 13 Jun 2016 12:15:19 GMT

Application: /weavescope is healthy: true
Task: weavescope.d0daf569-2cb2-11e6-aad0-1e9bbbc1653f
Application: /marvin/osmlookup is healthy: true
Task: marvin_osmlookup.dc1bdbfc-2cb3-11e6-aad0-1e9bbbc1653f
Application: /marvin/go2 is healthy: true
Task: marvin_go2.b87c98ba-2cb3-11e6-aad0-1e9bbbc1653f
Application: /marvin/frontend is healthy: true
Task: marvin_frontend.45c125be-2cb4-11e6-aad0-1e9bbbc1653f
Application: /weavescope-probe is healthy: true
Task: weavescope-probe.98789a15-2cb2-11e6-aad0-1e9bbbc1653f
Task: weavescope-probe.98793657-2cb2-11e6-aad0-1e9bbbc1653f
Task: weavescope-probe.98795d68-2cb2-11e6-aad0-1e9bbbc1653f
Task: weavescope-probe.9878e836-2cb2-11e6-aad0-1e9bbbc1653f
Application: /jenkins is healthy: true
Task: jenkins.773bfef3-2cb2-11e6-aad0-1e9bbbc1653f
Application: /webserver is healthy: true
Task: webserver.8d19ef26-2d66-11e6-aad0-1e9bbbc1653f
Application: /marvin/events is healthy: true
Task: marvin_events.d529728b-2cb3-11e6-aad0-1e9bbbc1653f
Application: /marvin/rec is healthy: true
Task: marvin_rec.e0d608ad-2cb3-11e6-aad0-1e9bbbc1653f
{"success":true,"goners":["webserver.0fde0035-315f-11e6-aad0-1e9bbbc1653f","dummy.11a7c3bb-315f-11e6-aad0-1e9bbbc1653f"]}

#### Target a specific (non-framework) app

To target a specific (non-framework) app, set the level of destruction to `1` and specify the Marathon app ID using the `app` parameter. For example, `/rampage?level=1&app=dummy` will destroy random tasks of the app with the Marathon ID `/dummy`.

To test it locally, run:

$ MARATHON_URL=http://localhost:8080 drax

And invoke it like so (to destroy tasks of app `/dummy`):

$ http -f POST localhost:7777/rampage -- level=1 app=dummy
130 changes: 94 additions & 36 deletions api.go
@@ -9,6 +9,7 @@ import (
"math/rand"
"net/http"
"strconv"
"strings"
"sync/atomic"
)

@@ -47,32 +48,43 @@ func (n NOUN_Stats) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// Handles /rampage API calls
func (n NOUN_Rampage) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if r.Method == "POST" {
// extract $LEVEL parameter from /rampage?level=$LEVEL in the following:
levelParam := r.URL.Query().Get("level")
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Level param = ", levelParam)
if level, err := strconv.Atoi(levelParam); err == nil {
destructionLevel = DestructionLevel(level)
}
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Starting rampage on destruction level ", destructionLevel)

switch destructionLevel {
case DL_BASIC:
killTasks(w, r)
case DL_ADVANCED:
fmt.Fprint(w, "not yet implemented")
case DL_ALL:
fmt.Fprint(w, "not yet implemented")
default:
http.NotFound(w, r)
err := r.ParseForm()
if err != nil {
http.Error(w, "Can't parse rampage params", 500)
} else {
levelParam := r.Form.Get("level")
if levelParam != "" {
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Got level param ", levelParam)
if level, err := strconv.Atoi(levelParam); err == nil {
destructionLevel = DestructionLevel(level)
}
}
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Starting rampage on destruction level ", destructionLevel)
switch destructionLevel {
case DL_BASIC:
killTasks(w, r)
case DL_ADVANCED:
appParam := r.Form.Get("app")
if appParam != "" {
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Got app param ", appParam)
killTasksOfApp(w, r, appParam)
} else {
http.NotFound(w, r)
}
case DL_ALL:
fmt.Fprint(w, "not yet implemented")
default:
http.NotFound(w, r)
}
}
} else {
log.WithFields(log.Fields{"handle": "/rampage"}).Error("Only POST method supported")
http.NotFound(w, r)
}
}

// killTasks will identify tasks from apps (not framework service)
// to be killed and randomly kill off a few of them
// killTasks will identify tasks of any apps (but not framework services)
// and randomly kill off a few of them
func killTasks(w http.ResponseWriter, r *http.Request) {
if client, ok := getClient(); ok {
nonFrameworkApps := 0
@@ -84,11 +96,10 @@ func killTasks(w http.ResponseWriter, r *http.Request) {
}
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Found overall ", len(apps.Apps), " applications running")
candidates := []string{}
rr := &RampageResult{}
for _, app := range apps.Apps {
log.WithFields(log.Fields{"handle": "/rampage"}).Debug("APP ", app.ID)
details, _ := client.Application(app.ID)
if !isFramework(details) {
if !myself(details) && !isFramework(details) {
nonFrameworkApps++
if details.Tasks != nil && len(details.Tasks) > 0 {
for _, task := range details.Tasks {
@@ -98,28 +109,65 @@ func killTasks(w http.ResponseWriter, r *http.Request) {
}
}
}
if len(candidates) > 0 {
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Found ", len(candidates), " non-framework tasks in ", nonFrameworkApps, " apps to kill")
// pick one random task to be killed
candidate := candidates[rand.Intn(len(candidates))]
rr.Success = killTask(client, candidate)
if rr.Success {
rr.TaskList = []string{candidate}
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Killed tasks ", rr.TaskList)
rampage(w, client, nonFrameworkApps, candidates)
} else {
http.Error(w, "Can't connect to Marathon", 500)
}
}

// killTasksOfApp will identify tasks of a specific app defined by targetAppID
// and randomly kill off a few of them
func killTasksOfApp(w http.ResponseWriter, r *http.Request, targetAppID string) {
if client, ok := getClient(); ok {
candidates := []string{}
details, _ := client.Application(targetAppID)
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Found app ", details.ID, " running")
if !myself(details) && !isFramework(details) {
if details.Tasks != nil && len(details.Tasks) > 0 {
for _, task := range details.Tasks {
log.WithFields(log.Fields{"handle": "/rampage"}).Debug("TASK ", task.ID)
candidates = append(candidates, task.ID)
}
}
} else {
rr.Success = false
}
jsonrr, _ := json.Marshal(rr)
w.Header().Set("Content-Type", "application/javascript")
fmt.Fprint(w, string(jsonrr))
rampage(w, client, 1, candidates)
} else {
http.Error(w, "Can't connect to Marathon", 500)
}
}

// killTask kills a certain task and increments
// the overall count if successful
// rampage kills random tasks from the candidates and returns a JSON result
func rampage(w http.ResponseWriter, c marathon.Marathon, numApps int, candidates []string) {
rr := &RampageResult{}
rr.TaskList = []string{}
targets := []int{}
if len(candidates) > 0 {
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Found ", len(candidates), " tasks in ", numApps, " apps to kill")
// generates a list of random, non-repeating indices into the candidates:
if len(candidates) > numTargets {
targets = rand.Perm(len(candidates))[:numTargets]
} else {
targets = rand.Perm(len(candidates))
}
for _, t := range targets {
candidate := candidates[t]
rr.Success = killTask(c, candidate)
if rr.Success {
rr.TaskList = append(rr.TaskList, candidate)
}
}
log.WithFields(log.Fields{"handle": "/rampage"}).Info("Killed tasks ", rr.TaskList)
// at least killed some tasks, so consider it a success:
rr.Success = true
} else {
rr.Success = false
}
jsonrr, _ := json.Marshal(rr)
w.Header().Set("Content-Type", "application/javascript")
fmt.Fprint(w, string(jsonrr))
}

// killTask kills a certain task and increments overall count if successful
func killTask(c marathon.Marathon, taskID string) bool {
_, err := c.KillTask(taskID, nil)
if err != nil {
@@ -132,6 +180,15 @@ func killTask(c marathon.Marathon, taskID string) bool {
}
}

// myself returns true if the given app is the DRAX Marathon app itself
func myself(app *marathon.Application) bool {
return strings.Contains(app.ID, "drax")
}
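`myself` spares any app whose ID merely contains `drax`, so an app such as `/team/draxler` would never be targeted. A stricter filter could compare the full Marathon app ID instead. The sketch below combines that with the `DCOS_PACKAGE_IS_FRAMEWORK` label check described in the `isFramework` comment. The `app` struct is a stand-in, not the go-marathon type; the `selfID` value `/drax` (derived from `marathon-drax.json`) and the assumption that the label's value is the string `"true"` are illustrative.

```go
package main

import "fmt"

// app is a minimal stand-in for a Marathon application.
type app struct {
	ID     string
	Labels map[string]string
}

// selfID is the Marathon ID DRAX is assumed to run under (see marathon-drax.json).
const selfID = "/drax"

// isSelf reports whether a is the DRAX app itself, using an exact ID match.
func isSelf(a app) bool {
	return a.ID == selfID
}

// isFrameworkApp reports whether a is a DC/OS service framework, assuming
// the DCOS_PACKAGE_IS_FRAMEWORK label carries the value "true".
func isFrameworkApp(a app) bool {
	return a.Labels["DCOS_PACKAGE_IS_FRAMEWORK"] == "true"
}

// killable keeps the IDs of apps that are neither DRAX nor a framework.
func killable(apps []app) []string {
	var ids []string
	for _, a := range apps {
		if !isSelf(a) && !isFrameworkApp(a) {
			ids = append(ids, a.ID)
		}
	}
	return ids
}

func main() {
	apps := []app{
		{ID: "/drax"},
		{ID: "/team/draxler"},
		{ID: "/kafka", Labels: map[string]string{"DCOS_PACKAGE_IS_FRAMEWORK": "true"}},
		{ID: "/dummy"},
	}
	fmt.Println(killable(apps)) // [/team/draxler /dummy]
}
```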

// isFramework returns true if the Marathon app is a service framework,
// and false otherwise (determined via the DCOS_PACKAGE_IS_FRAMEWORK label key)
func isFramework(app *marathon.Application) bool {
@@ -156,6 +213,7 @@ func getClient() (marathon.Marathon, bool) {
return client, true
}

// incTasksKilled increases the overall tasks killed counter in an atomic way
func incTasksKilled() {
atomic.AddUint64(&overallTasksKilled, 1)
}
16 changes: 10 additions & 6 deletions doc.go
@@ -2,12 +2,16 @@
Package DRAX (DC/OS Resilience Automated Xenodiagnosis) implements a
chaosmonkey-like testing functionality for DC/OS clusters.
It provides the following functionality:
To launch it locally use the following:
$ MARATHON_URL=http://localhost:8080 drax
To launch it into the cluster (via the DC/OS CLI) use:
$ dcos marathon app add marathon-drax.json
You can then use the API to destroy tasks and check stats,
see https://github.com/dcos-labs/drax#api for details.
- Via the environment variable DESTRUCTION_LEVEL the destruction level
is set, with 0 == destroy random tasks, 1 == destroy random apps, and
2 == destroy random apps and services.
- It will expose metrics via the `/stats` endpoint.
- It will expose health status via the `/health` endpoint.
*/
package main
11 changes: 10 additions & 1 deletion main.go
@@ -14,9 +14,11 @@ type DestructionLevel int

const (
// DRAX version
VERSION string = "0.2.0"
VERSION string = "0.3.0"
// The IP port DRAX is listening on
DRAX_PORT int = 7777
// The default number of tasks to kill per rampage
DEFAULT_NUM_TARGETS int = 2
)

const (
@@ -32,6 +34,7 @@ var (
mux *http.ServeMux
marathonURL string
destructionLevel DestructionLevel = DL_BASIC
numTargets int = DEFAULT_NUM_TARGETS
overallTasksKilled uint64
)

@@ -51,6 +54,12 @@ func init() {
}
log.WithFields(log.Fields{"main": "init"}).Info("On destruction level ", destructionLevel)

if nt := os.Getenv("NUM_TARGETS"); nt != "" {
n, _ := strconv.Atoi(nt)
numTargets = n
}
log.WithFields(log.Fields{"main": "init"}).Info("I will destroy ", numTargets, " tasks on a rampage")
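With `n, _ := strconv.Atoi(nt)`, a malformed `NUM_TARGETS` (say `NUM_TARGETS=three`) silently becomes `0`, so a rampage kills nothing, and a negative value makes the slice expression `rand.Perm(len(candidates))[:numTargets]` in `rampage` panic. A defensive variant could fall back to `DEFAULT_NUM_TARGETS` instead. The `numTargetsFromEnv` helper below is a hypothetical sketch; it takes the lookup function as a parameter so it can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// DEFAULT_NUM_TARGETS mirrors the constant in main.go.
const DEFAULT_NUM_TARGETS = 2

// numTargetsFromEnv reads NUM_TARGETS via lookup, falling back to the
// default when the variable is unset, not an integer, or negative.
func numTargetsFromEnv(lookup func(string) string) int {
	raw := lookup("NUM_TARGETS")
	if raw == "" {
		return DEFAULT_NUM_TARGETS
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		fmt.Fprintf(os.Stderr, "ignoring invalid NUM_TARGETS %q, using %d\n", raw, DEFAULT_NUM_TARGETS)
		return DEFAULT_NUM_TARGETS
	}
	return n
}

func main() {
	fmt.Println("I will destroy", numTargetsFromEnv(os.Getenv), "tasks on a rampage")
}
```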

if ll := os.Getenv("LOG_LEVEL"); ll != "" {
switch strings.ToUpper(ll) {
case "DEBUG":
20 changes: 20 additions & 0 deletions marathon-drax.json
@@ -0,0 +1,20 @@
{
"id": "drax",
"cmd": "chmod u+x drax && ./drax",
"cpus": 0.1,
"mem": 200,
"ports": [
0
],
"uris": [
"https://github.com/dcos-labs/drax/releases/download/0.3.0/drax"
],
"env": {
"LOG_LEVEL": "DEBUG",
"DESTRUCTION_LEVEL": "0",
"NUM_TARGETS": "3"
},
"acceptedResourceRoles": [
"slave_public"
]
}
