293 shadowmode (#294)

* Global ShadowMod
* Adding shadow-mode issue #293
* Fix doc format
* Extended example to showcase shadow mode as well, build a docker image with the current rate limit, extend envoy config to display rate limit headers. Extend statsd to show shadow mode
* Fine tuning of docs and example
* Added integration tests, fixed some review comments
* Doc format fix
* gitignore for vscode

Signed-off-by: Jesper Söderlund <[email protected]>
Co-authored-by: Jesper Söderlund <[email protected]>
envoyproxy · Oct 5, 2021 · 1c1d46e · 1c1d46e
1 parent 35b6056
commit 1c1d46e
Showing 30 changed files with 871 additions and 101 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,6 +2,7 @@ cover.out
 
 bin/
 .idea/
+.vscode/
 vendor
 cert.pem
 key.pem

diff --git a/Makefile b/Makefile
@@ -113,3 +113,9 @@ docker_image: docker_tests
 .PHONY: docker_push
 docker_push: docker_image
 	docker push $(IMAGE):$(VERSION)
+
+.PHONY: integration-tests
+integration-tests:
+	docker-compose --project-dir $(PWD)  -f integration-test/docker-compose-integration-test.yml up --build  --exit-code-from tester
+
+# docker-compose --project-dir $(PWD)  -f integration-test/docker-compose-integration-test.yml up --build  --exit-code-from tester
diff --git a/README.md b/README.md
@@ -9,24 +9,30 @@
 - [Building and Testing](#building-and-testing)
   - [Docker-compose setup](#docker-compose-setup)
   - [Full test environment](#full-test-environment)
+  - [Self-contained end-to-end integration test](#self-contained-end-to-end-integration-test)
 - [Configuration](#configuration)
   - [The configuration format](#the-configuration-format)
     - [Definitions](#definitions)
     - [Descriptor list definition](#descriptor-list-definition)
     - [Rate limit definition](#rate-limit-definition)
+    - [ShadowMode](#shadowmode)
     - [Examples](#examples)
       - [Example 1](#example-1)
       - [Example 2](#example-2)
       - [Example 3](#example-3)
       - [Example 4](#example-4)
       - [Example 5](#example-5)
+      - [Example 6](#example-6)
   - [Loading Configuration](#loading-configuration)
   - [Log Format](#log-format)
   - [GRPC Keepalive](#grpc-keepalive)
 - [Request Fields](#request-fields)
 - [GRPC Client](#grpc-client)
   - [Commandline flags](#commandline-flags)
-- [Statistics](#statistics)
+- [Global ShadowMode](#global-shadowmode)
+  - [Configuration](#configuration-1)
+  - [Statistics](#statistics)
+- [Statistics](#statistics-1)
   - [Statistics options](#statistics-options)
 - [HTTP Port](#http-port)
   - [/json endpoint](#json-endpoint)
@@ -120,20 +126,40 @@ as explained in the [two redis instances](#two-redis-instances) section.
 ## Full test environment
 To run a fully configured environment to demo Envoy based rate limiting, run:
 ```bash
-docker-compose -f docker-compose-example.yml up
+docker-compose -f docker-compose-example.yml up --build --remove-orphans
 ```
 This will run ratelimit, redis, prom-statsd-exporter and two Envoy containers such that you can demo rate limiting by hitting the below endpoints.
 ```bash
 curl localhost:8888/test
 curl localhost:8888/header -H "foo: foo" # Header based
 curl localhost:8888/twoheader -H "foo: foo" -H "bar: bar" # Two headers
-curl localhost:8888/twoheader -H "foo: foo" -H "baz: baz"
+curl localhost:8888/twoheader -H "foo: foo" -H "baz: baz"  # This will be rate limited
 curl localhost:8888/twoheader -H "foo: foo" -H "bar: banned" # Ban a particular header value
+curl localhost:8888/twoheader -H "foo: foo" -H "baz: shady" # This will never be rate limited since "baz" with value "shady" is in shadow_mode
+curl localhost:8888/twoheader -H "foo: foo" -H "baz: not-so-shady" # This is subject to rate limiting because it is not in shadow_mode
 ```
 Edit `examples/ratelimit/config/example.yaml` to test different rate limit configs. Hot reloading is enabled.
 
 The descriptors in `example.yaml` and the actions in `examples/envoy/proxy.yaml` should give you a good idea on how to configure rate limits.
 
+To see the metrics in the example:
+```bash
+# The metrics for the shadow_mode keys
+curl http://localhost:9102/metrics | grep -i shadow
+```
+
+## Self-contained end-to-end integration test
+
+Integration tests are written as bash scripts in `integration-test/scripts`.
+
+The test suite spins up a docker-compose environment from `integration-test/docker-compose-integration-test.yml`.
+
+If the test suite fails, it exits with code 1.
+
+```bash
+make integration-tests
+```
+
 # Configuration
 
 ## The configuration format
@@ -163,6 +189,7 @@ descriptors:
     rate_limit: (optional block)
       unit: <see below: required>
       requests_per_unit: <see below: required>
+    shadow_mode: (optional)
     descriptors: (optional block)
       - ... (nested repetition of above)
 ```
@@ -184,6 +211,15 @@ The rate limit block specifies the actual rate limit that will be used when ther
 Currently the service supports per second, minute, hour, and day limits. More types of limits may be added in the
 future based on user demand.
 
+### ShadowMode
+A `shadow_mode` key in a rule indicates that, whatever the outcome of evaluating the rule, the end result will always be "OK".
+
+When a block is in ShadowMode, all functions of the rate limiting service are executed as normal, including cache lookup and statistics.
+
+An additional statistic keeps track of how many times a key with `shadow_mode` has overridden the result.
+
+There is also a [Global ShadowMode](#global-shadowmode).
+
 ### Examples
 
 #### Example 1
@@ -351,6 +387,39 @@ This can be useful for collecting statistics, or if one wants to define a descri
 
 The return value for unlimited descriptors will be an OK status code with the LimitRemaining field set to MaxUint32 value.
 
+#### Example 6
+
+A rule using shadow_mode is useful for soft-launching rate limiting. In this example:
+
+```
+RateLimitRequest:
+  domain: example6
+  descriptor: ("service", "auth-service"),("user", "user-a")
+```
+
+`user-a` of the `auth-service` would not get rate-limited regardless of the rate of requests. There would, however, be statistics recording each breach of the configured limit of 10 req / sec.
+
+`user-b`, however, would be limited to 20 req / sec.
+
+```yaml
+domain: example6
+descriptors:
+  - key: service
+    descriptors:
+      - key: user
+        value: user-a
+        rate_limit:
+          requests_per_unit: 10
+          unit: second
+        shadow_mode: true
+      - key: user
+        value: user-b
+        rate_limit:
+          requests_per_unit: 20
+          unit: second
+```
+
+
 ## Loading Configuration
 
 The Ratelimit service uses a library written by Lyft called [goruntime](https://github.com/lyft/goruntime) to do configuration loading. Goruntime monitors
@@ -431,6 +500,19 @@ go run main.go -domain test \
 -descriptors name=foo,age=14 -descriptors name=bar,age=18
 ```
 
+# Global ShadowMode
+
+There is a global shadow mode which can make it easier to introduce rate limiting into an existing service landscape. It will override whatever result is returned by the regular rate limiting process.
+
+## Configuration
+The global shadow mode is configured with an environment variable.
+
+Setting the environment variable `SHADOW_MODE` to `true` will enable the feature.
+
+## Statistics
+An additional service-level statistic is generated that increments whenever the global shadow mode has overridden a rate limiting result.
+
+
 # Statistics
 
 The rate limit service generates various statistics for each configured rate limit rule that will be useful for end
@@ -454,6 +536,7 @@ STAT:
 * near_limit: Number of rule hits over the NearLimit ratio threshold (currently 80%) but under the threshold rate.
 * over_limit: Number of rule hits exceeding the threshold rate
 * total_hits: Number of rule hits in total
+* shadow_mode: Number of rule hits where shadow_mode would trigger and override the over_limit result
 
 To use a custom near_limit ratio threshold, you can specify with `NEAR_LIMIT_RATIO` environment variable. It defaults to `0.8` (0-1 scale). These are examples of generated stats for some configured rate limit rules from the above examples:
 
@@ -464,6 +547,9 @@ ratelimit.service.rate_limit.mongo_cps.database_users.over_limit: 0
 ratelimit.service.rate_limit.mongo_cps.database_users.total_hits: 2939
 ratelimit.service.rate_limit.messaging.message_type_marketing.to_number.over_limit: 0
 ratelimit.service.rate_limit.messaging.message_type_marketing.to_number.total_hits: 0
+ratelimit.service.rate_limit.messaging.auth-service.over_limit.total_hits: 1
+ratelimit.service.rate_limit.messaging.auth-service.over_limit.over_limit: 1
+ratelimit.service.rate_limit.messaging.auth-service.over_limit.shadow_mode: 1
 ```
 
 ## Statistics options
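
The shadow-mode behaviour described in the README changes above can be sketched as a small decision function. This is only an illustrative sketch, not the service's Go implementation: the function name `shadow_decision`, its argument order and the status strings `OK` / `OVER_LIMIT` are assumptions made for the example, and it assumes the global mode turns an over-limit result into "OK" in the same way the per-rule key does.

```bash
#!/bin/bash
# Hypothetical helper (not part of the ratelimit service).
# Usage: shadow_decision <evaluated_status> <rule_shadow_mode> <global_shadow_mode>
#   evaluated_status:   OK or OVER_LIMIT, as produced by the normal evaluation
#   rule_shadow_mode:   "true" if the matched rule sets shadow_mode: true
#   global_shadow_mode: "true" if the service runs with SHADOW_MODE=true
shadow_decision() {
  local evaluated="$1" rule_shadow="$2" global_shadow="$3"
  if [ "$evaluated" = "OVER_LIMIT" ] &&
     { [ "$rule_shadow" = "true" ] || [ "$global_shadow" = "true" ]; }; then
    # The limit was breached, but shadow mode overrides the result.
    echo "OK"
  else
    echo "$evaluated"
  fi
}

shadow_decision OVER_LIMIT true false   # like user-a in Example 6: prints OK
shadow_decision OVER_LIMIT false false  # like user-b in Example 6: prints OVER_LIMIT
```

In both cases the evaluation itself (cache lookup, `over_limit` statistics) is assumed to have run as normal; only the returned status differs.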

diff --git a/examples/envoy/proxy.yaml b/examples/envoy/proxy.yaml
@@ -55,6 +55,7 @@ static_resources:
                       stage: 0
                       rate_limited_as_resource_exhausted: true
                       failure_mode_deny: false
+                      enable_x_ratelimit_headers: DRAFT_VERSION_03
                       rate_limit_service:
                         grpc_service:
                           envoy_grpc:

diff --git a/examples/prom-statsd-exporter/conf.yaml b/examples/prom-statsd-exporter/conf.yaml
@@ -86,6 +86,16 @@ mappings: # Requires statsd exporter >= v0.6.0 since it uses the "drop" action.
     name: "ratelimit_service_config_load_error"
     match_metric_type: counter
 
+  - match:
+      "ratelimit.service.rate_limit.*.*.*.shadow_mode"
+    name: "ratelimit_service_rate_limit_shadow_mode"
+    timer_type: "histogram"
+    labels:
+      domain: "$1"
+      key1: "$2"
+      key2: "$3"
+
+
   # Enable below in production once you have the metrics you need
   # - match: "."
   #   match_type: "regex"
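
To make the new mapping easier to read, the sketch below mimics how the three wildcards in `ratelimit.service.rate_limit.*.*.*.shadow_mode` would bind to the `domain`, `key1` and `key2` labels. It is a plain bash approximation written for illustration, not the statsd exporter's matcher; the function name `shadow_metric_labels` is hypothetical, and it assumes each `*` matches exactly one dot-separated component.

```bash
#!/bin/bash
# Hypothetical helper: split a statsd metric name the way the mapping above
# is assumed to, printing the resulting labels or returning 1 on no match.
shadow_metric_labels() {
  local metric="$1"
  local prefix="ratelimit.service.rate_limit."
  local suffix=".shadow_mode"
  case "$metric" in
    "$prefix"*"$suffix") ;;
    *) return 1 ;;
  esac
  # Strip the fixed prefix and suffix, leaving only the wildcard part.
  local middle="${metric#"$prefix"}"
  middle="${middle%"$suffix"}"
  local domain key1 key2 rest
  IFS=. read -r domain key1 key2 rest <<< "$middle"
  # Three wildcards: exactly three non-empty components must remain.
  if [ -z "$domain" ] || [ -z "$key1" ] || [ -z "$key2" ] || [ -n "$rest" ]; then
    return 1
  fi
  echo "domain=$domain key1=$key1 key2=$key2"
}

# Metric name taken from the README stats example in this commit.
shadow_metric_labels "ratelimit.service.rate_limit.messaging.auth-service.over_limit.shadow_mode"
```

A metric with fewer or more than three components between the prefix and `.shadow_mode` is treated as a non-match in this sketch.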

diff --git a/examples/ratelimit/config/example.yaml b/examples/ratelimit/config/example.yaml
@@ -27,6 +27,17 @@ descriptors:
         rate_limit:
           unit: second
           requests_per_unit: 1
+      - key: baz
+        value: not-so-shady
+        rate_limit:
+          unit: minute
+          requests_per_unit: 3
+      - key: baz
+        value: shady
+        rate_limit:
+          unit: minute
+          requests_per_unit: 3
+        shadow_mode: true
       - key: bay
         rate_limit:
           unlimited: true

diff --git a/integration-test/Dockerfile.tester b/integration-test/Dockerfile.tester
@@ -0,0 +1,7 @@
+FROM alpine:latest
+
+USER root
+
+RUN apk update && apk upgrade && apk add bash curl sed grep
+
+ENTRYPOINT [ "bash" ]
diff --git a/integration-test/docker-compose-integration-test.yml b/integration-test/docker-compose-integration-test.yml
@@ -0,0 +1,111 @@
+version: "3"
+services:
+  redis:
+    image: redis:alpine
+    expose:
+      - 6379
+    ports:
+      - 6379:6379
+    networks:
+      - ratelimit-network
+
+  statsd:
+    image: prom/statsd-exporter:v0.18.0
+    entrypoint: /bin/statsd_exporter
+    command:
+      - "--statsd.mapping-config=/etc/statsd-exporter/conf.yaml"
+    expose:
+      - 9125
+      - 9102
+    ports:
+      - 9125:9125
+      - 9102:9102 # Visit http://localhost:9102/metrics to see metrics in Prometheus format
+    networks:
+      - ratelimit-network
+    volumes:
+      - ./examples/prom-statsd-exporter/conf.yaml:/etc/statsd-exporter/conf.yaml
+
+  ratelimit:
+    build:
+      context: ${PWD}
+      dockerfile: Dockerfile
+    command: /bin/ratelimit
+    ports:
+      - 8080:8080
+      - 8081:8081
+      - 6070:6070
+    depends_on:
+      - redis
+      - statsd
+    networks:
+      - ratelimit-network
+    volumes:
+      - ./examples/ratelimit/config:/data/ratelimit/config
+    environment:
+      - USE_STATSD=true
+      - STATSD_HOST=statsd
+      - STATSD_PORT=9125
+      - LOG_LEVEL=debug
+      - REDIS_SOCKET_TYPE=tcp
+      - REDIS_URL=redis:6379
+      - RUNTIME_ROOT=/data
+      - RUNTIME_SUBDIRECTORY=ratelimit
+      - RUNTIME_WATCH_ROOT=false
+
+  envoy-proxy:
+    image: envoyproxy/envoy-dev:latest
+    entrypoint: "/usr/local/bin/envoy"
+    command:
+      - "--service-node proxy"
+      - "--service-cluster proxy"
+      - "--config-path /etc/envoy/envoy.yaml"
+      - "--concurrency 1"
+      - "--mode serve"
+      - "--log-level info"
+    depends_on:
+      - ratelimit
+    volumes:
+      - ./examples/envoy/proxy.yaml:/etc/envoy/envoy.yaml
+    networks:
+      - ratelimit-network
+    expose:
+       - "8888"
+       - "8001"
+    ports:
+       - "8888:8888"
+       - "8001:8001"
+
+  envoy-mock:
+    image: envoyproxy/envoy-dev:latest
+    entrypoint: "/usr/local/bin/envoy"
+    command:
+      - "--service-node mock"
+      - "--service-cluster mock"
+      - "--config-path /etc/envoy/envoy.yaml"
+      - "--concurrency 1"
+      - "--mode serve"
+      - "--log-level info"
+    volumes:
+      - ./examples/envoy/mock.yaml:/etc/envoy/envoy.yaml
+    networks:
+      - ratelimit-network
+    expose:
+       - "9999"
+    ports:
+       - "9999:9999"
+
+  tester:
+    build:
+      context: ${PWD}
+      dockerfile: integration-test/Dockerfile.tester
+    depends_on:
+      - envoy-proxy
+      - envoy-mock
+    command: /test/run-all.sh
+    volumes:
+      - ${PWD}/integration-test/:/test/
+    networks:
+      - ratelimit-network
+
+networks:
+  ratelimit-network:
diff --git a/integration-test/run-all.sh b/integration-test/run-all.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+echo "Running tests"
+
+FILES=/test/scripts/*
+for f in $FILES
+do
+  echo "Processing $f file..."
+  # Run each test script; $f holds the current file name
+  "$f"
+  if [ $? -ne 0 ] ; then
+    echo "Failed file $f"
+    exit 1
+  fi
+done
diff --git a/integration-test/scripts/simple-get.sh b/integration-test/scripts/simple-get.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+# Just happy path
+curl -s -f -H "foo: test" -H "baz: shady" http://envoy-proxy:8888/twoheader
+
+if [ $? -ne 0 ] ; then
+    exit 1
+fi
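
The happy-path script above only checks that a single request succeeds. A follow-up test could send several requests and count how many are rejected. The sketch below shows one way to do that; the helper name `count_rate_limited` is hypothetical, and it assumes Envoy answers rate-limited HTTP requests with status 429.

```bash
#!/bin/bash
# Hypothetical helper: read HTTP status codes (one per line) on stdin and
# print how many of them are 429 (Too Many Requests).
count_rate_limited() {
  # grep -c still prints 0 when nothing matches; "|| true" hides its exit code 1.
  grep -c '^429$' || true
}

# Possible use inside a test script, assuming the docker-compose network
# from this commit (left commented out so the helper can be tried offline):
#
# for _ in 1 2 3 4 5; do
#   curl -s -o /dev/null -w '%{http_code}\n' \
#     -H "foo: test" -H "baz: shady" http://envoy-proxy:8888/twoheader
# done | count_rate_limited
```

With `baz: shady` the count would be expected to stay at 0, since that descriptor is in shadow_mode, whereas `baz: not-so-shady` should start producing 429 responses once its limit of 3 requests per minute is exceeded.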