Load Shedding: TCP Vegas, request priority and request cohort

The overload detector uses a TCP Vegas based algorithm, as implemented by Netflix Concurrency Limiters. Priority load shedding uses 5 priority levels and 128 cohorts. A simple cubic function is used to determine whether current CPU load has reached the threshold for current request.
quarkusio · May 21, 2024 · c48e093 · c48e093
1 parent 589cf47
commit c48e093
Show file tree

Hide file tree

Showing 19 changed files with 884 additions and 1 deletion.
diff --git a/bom/application/pom.xml b/bom/application/pom.xml
@@ -1886,6 +1886,16 @@
                 <artifactId>quarkus-smallrye-openapi-common-deployment</artifactId>
                 <version>${project.version}</version>
             </dependency>
+            <dependency>
+                <groupId>io.quarkus</groupId>
+                <artifactId>quarkus-load-shedding</artifactId>
+                <version>${project.version}</version>
+            </dependency>
+            <dependency>
+                <groupId>io.quarkus</groupId>
+                <artifactId>quarkus-load-shedding-deployment</artifactId>
+                <version>${project.version}</version>
+            </dependency>
             <dependency>
                 <groupId>io.quarkus</groupId>
                 <artifactId>quarkus-vertx</artifactId>

diff --git a/docs/src/main/asciidoc/load-shedding-reference.adoc b/docs/src/main/asciidoc/load-shedding-reference.adoc
@@ -0,0 +1,148 @@
+////
+This guide is maintained in the main Quarkus repository
+and pull requests should be submitted there:
+https://github.com/quarkusio/quarkus/tree/main/docs/src/main/asciidoc
+////
+= Load Shedding reference guide
+include::_attributes.adoc[]
+:numbered:
+:sectnums:
+:categories: web
+:topics: web,load-shedding
+:extensions: io.quarkus:quarkus-load-shedding
+:extension-status: experimental
+
+include::{includes}/extension-status.adoc[]
+
+Load shedding is the practice of detecting service overload and rejecting requests.
+
+In Quarkus, the `quarkus-load-shedding` extension provides a load shedding mechanism.
+
+== Use the Load Shedding extension
+
+To use the load shedding extension, you need to add the `io.quarkus.quarkus-load-shedding` extension to your project:
+
+[source,xml,role="primary asciidoc-tabs-target-sync-cli asciidoc-tabs-target-sync-maven"]
+.pom.xml
+----
+<dependency>
+    <groupId>io.quarkus</groupId>
+    <artifactId>quarkus-load-shedding</artifactId>
+</dependency>
+----
+
+[source,gradle,role="secondary asciidoc-tabs-target-sync-gradle"]
+.build.gradle
+----
+implementation("io.quarkus:quarkus-load-shedding")
+----
+
+No configuration is required, though the possible configuration options are described below.
+
+== The load shedding algorithm
+
+The load shedding algorithm has 2 parts:
+
+* overload detection
+* priority load shedding (optional)
+
+=== Overload detection
+
+To detect whether the current service is overloaded, an adaptation of TCP Vegas is used.
+
+The algorithm starts with 100 allowed concurrent requests.
+For each request, it compares the number of current requests with the allowed limit and if the limit is exceeded, an overload situation is signalled.
+
+If the limit is not exceeded, or if priority load shedding determines that the request should not be rejected (see below), the request is allowed.
+When it finishes, its duration is compared with the lowest duration seen so far to estimate a queue size.
+If the queue size is lower than _alpha_, the current limit is increased, but only up to a given maximum, by default 1000.
+If the queue size is greater than _beta_, the current limit is decreased.
+Otherwise, the current limit is kept intact.
+
+Alpha and beta are computed by multiplying the configurable constants with a base 10 logarithm of the current limit.
+
+After some number of requests, which can be modified by configuring the _probe_ factor, the lowest duration seen is reset to the last seen duration of a request.
+
+=== Priority load shedding
+
+If an overload situation is signalled, priority load shedding is invoked.
+
+By default, priority load shedding is enabled, which means a request is only rejected if the current CPU load is high enough.
+To determine whether a request should be rejected, 2 attributes are considered:
+
+* request priority
+* request cohort
+
+There are 5 statically defined priorities and 128 cohorts, which amounts to 640 request groups in total.
+
+After both priority and cohort are assigned to a request, a request group number is computed: `group = priority * max_cohorts + cohort`.
+Then, the group number is compared to a simple cubic function of current CPU load, where `load` is a number between 0 and 1: `max_groups * (1 - load^3)`.
+If the group number is higher, the request is rejected, otherwise it is allowed even in an overload situation.
+
+If priority load shedding is disabled, all requests are rejected in an overload situation.
+
+==== Customizing request priority
+
+Priority is assigned by a `io.quarkus.load.shedding.RequestPrioritizer`.
+There is 5 statically defined priorities in the `io.quarkus.load.shedding.RequestPriority` enum: `CRITICAL`, `IMPORTANT`, `NORMAL`, `BACKGROUND` and `DEGRADED`.
+By default, if no request prioritizer applies, the priority is assumed to be `NORMAL`.
+
+There is one default prioritizer which assigns the priority of `CRITICAL` to requests to the non-application endpoints.
+It declares no `@Priority`.
+
+It is possible to define custom implementations of the `RequestPrioritizer` interface.
+The implementations must be CDI beans, otherwise they are ignored.
+The CDI rules of typesafe resolution must be followed.
+That is, if multiple implementations exist with a different `@Priority` value, only the implementations with the highest priority are retained.
+
+==== Customizing request cohort
+
+Cohort is assigned by a `io.quarkus.load.shedding.RequestClassifier`.
+There is 128 statically defined cohorts, with the lowest number being 1 and highest number being 128.
+The classifier should return a number in this interval; if it does not, the number is adjusted automatically.
+
+There is one default classifier which assigns a cohort based on a hash of the remote IP address and current time, such that an IP address changes its cohort roughly every 2.5 hours.
+It declares no `@Priority`.
+
+It is possible to define custom implementations of the `RequestClassifier` interface.
+The implementations must be CDI beans, otherwise they are ignored.
+The CDI rules of typesafe resolution must be followed.
+That is, if multiple implementations exist with a different `@Priority` value, only the implementations with the highest priority are retained.
+
+== Limitations
+
+The load shedding extension currently only applies to HTTP requests, and is heavily skewed towards request/response network interactions.
+This means that gRPC, WebSocket and other kinds of streaming over HTTP are not supported.
+Other "entrypoints" to Quarkus applications, such as messaging, are not supported either.
+
+Further, the load shedding implementation is currently rather basic and not heavily tested in production.
+Improvements may be necessary.
+
+== Configuration reference
+
+include::{generated-dir}/config/quarkus-load-shedding.adoc[opts=optional, leveloffset=+1]
+
+== Further reading
+
+Netflix Technology Blog:
+
+* https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581[Performance Under Load]
+* https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94[Keeping Netflix Reliable Using Prioritized Load Shedding]
+
+Uber Engineering Blog:
+
+* https://www.uber.com/blog/cinnamon-using-century-old-tech-to-build-a-mean-load-shedder/[Cinnamon: Using Century Old Tech to Build a Mean Load Shedder]
+* https://www.uber.com/blog/pid-controller-for-cinnamon/[PID Controller for Cinnamon]
+* https://www.uber.com/blog/cinnamon-auto-tuner-adaptive-concurrency-in-the-wild/[Cinnamon Auto-Tuner: Adaptive Concurrency in the Wild]
+
+Amazon Builders' Library:
+
+* https://aws.amazon.com/builders-library/using-load-shedding-to-avoid-overload/[Using load shedding to avoid overload]
+
+Google Cloud Blog:
+
+* https://cloud.google.com/blog/products/gcp/using-load-shedding-to-survive-a-success-disaster-cre-life-lessons[Using load shedding to survive a success disaster]
+
+CodeReliant Blog:
+
+* https://www.codereliant.io/load-shedding/[Load Shedding for High Traffic Systems]
diff --git a/extensions/load-shedding/deployment/pom.xml b/extensions/load-shedding/deployment/pom.xml
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>io.quarkus</groupId>
+        <artifactId>quarkus-load-shedding-parent</artifactId>
+        <version>999-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>quarkus-load-shedding-deployment</artifactId>
+
+    <name>Quarkus - Load Shedding - Deployment</name>
+
+    <dependencies>
+        <dependency>
+            <groupId>io.quarkus</groupId>
+            <artifactId>quarkus-load-shedding</artifactId>
+        </dependency>
+
+        <dependency>
+            <groupId>io.quarkus</groupId>
+            <artifactId>quarkus-vertx-http-deployment</artifactId>
+        </dependency>
+
+        <dependency>
+            <groupId>io.quarkus</groupId>
+            <artifactId>quarkus-rest-deployment</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>io.quarkus</groupId>
+            <artifactId>quarkus-junit5-internal</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>io.rest-assured</groupId>
+            <artifactId>rest-assured</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.assertj</groupId>
+            <artifactId>assertj-core</artifactId>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <artifactId>maven-compiler-plugin</artifactId>
+                <configuration>
+                    <annotationProcessorPaths>
+                        <path>
+                            <groupId>io.quarkus</groupId>
+                            <artifactId>quarkus-extension-processor</artifactId>
+                            <version>${project.version}</version>
+                        </path>
+                    </annotationProcessorPaths>
+                </configuration>
+            </plugin>
+        </plugins>
+    </build>
+</project>
diff --git a/...g/deployment/src/main/java/io/quarkus/load/shedding/deployment/LoadSheddingProcessor.java b/...g/deployment/src/main/java/io/quarkus/load/shedding/deployment/LoadSheddingProcessor.java
@@ -0,0 +1,34 @@
+package io.quarkus.load.shedding.deployment;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.quarkus.arc.deployment.AdditionalBeanBuildItem;
+import io.quarkus.deployment.annotations.BuildStep;
+import io.quarkus.deployment.builditem.FeatureBuildItem;
+import io.quarkus.load.shedding.runtime.HttpLoadShedding;
+import io.quarkus.load.shedding.runtime.HttpRequestClassifier;
+import io.quarkus.load.shedding.runtime.ManagementRequestPrioritizer;
+import io.quarkus.load.shedding.runtime.OverloadDetector;
+import io.quarkus.load.shedding.runtime.PriorityLoadShedding;
+
+public class LoadSheddingProcessor {
+    private static final String FEATURE = "load-shedding";
+
+    @BuildStep
+    FeatureBuildItem feature() {
+        return new FeatureBuildItem(FEATURE);
+    }
+
+    @BuildStep
+    AdditionalBeanBuildItem beans() {
+        List<String> beans = new ArrayList<>();
+        beans.add(OverloadDetector.class.getName());
+        beans.add(HttpLoadShedding.class.getName());
+        beans.add(PriorityLoadShedding.class.getName());
+        beans.add(ManagementRequestPrioritizer.class.getName());
+        beans.add(HttpRequestClassifier.class.getName());
+
+        return AdditionalBeanBuildItem.builder().addBeanClasses(beans).build();
+    }
+}
diff --git a/...oad-shedding/deployment/src/test/java/io/quarkus/load/shedding/NaiveLoadSheddingTest.java b/...oad-shedding/deployment/src/test/java/io/quarkus/load/shedding/NaiveLoadSheddingTest.java
@@ -0,0 +1,64 @@
+package io.quarkus.load.shedding;
+
+import static io.restassured.RestAssured.when;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import jakarta.ws.rs.GET;
+import jakarta.ws.rs.Path;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.RegisterExtension;
+
+import io.quarkus.test.QuarkusUnitTest;
+
+public class NaiveLoadSheddingTest {
+    private static final int NUM_THREADS = 20;
+    private static final int NUM_REQUESTS = 10;
+
+    @RegisterExtension
+    static final QuarkusUnitTest config = new QuarkusUnitTest()
+            .withApplicationRoot(jar -> jar.addClasses(MyResource.class))
+            .overrideConfigKey("quarkus.load-shedding.initial-limit", "5")
+            .overrideConfigKey("quarkus.load-shedding.max-limit", "10")
+            .overrideConfigKey("quarkus.load-shedding.priority.enabled", "false");
+
+    @Test
+    public void test() throws InterruptedException {
+        AtomicInteger numErrors = new AtomicInteger();
+        CountDownLatch begin = new CountDownLatch(1);
+        CountDownLatch end = new CountDownLatch(NUM_THREADS);
+        for (int i = 0; i < NUM_THREADS; i++) {
+            new Thread(() -> {
+                try {
+                    begin.await();
+                    for (int j = 0; j < NUM_REQUESTS; j++) {
+                        int statusCode = when().get("/").then().extract().statusCode();
+                        if (statusCode == 503) {
+                            numErrors.incrementAndGet();
+                        }
+                    }
+                    end.countDown();
+                } catch (InterruptedException e) {
+                    throw new RuntimeException(e);
+                }
+            }).start();
+        }
+
+        begin.countDown();
+        end.await();
+
+        assertThat(numErrors).hasValueGreaterThanOrEqualTo(100);
+    }
+
+    @Path("/")
+    public static class MyResource {
+        @GET
+        public String hello() throws InterruptedException {
+            Thread.sleep(100);
+            return "Hello, world!";
+        }
+    }
+}
diff --git a/extensions/load-shedding/pom.xml b/extensions/load-shedding/pom.xml
@@ -0,0 +1,24 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <artifactId>quarkus-extensions-parent</artifactId>
+        <groupId>io.quarkus</groupId>
+        <version>999-SNAPSHOT</version>
+        <relativePath>../pom.xml</relativePath>
+    </parent>
+
+    <artifactId>quarkus-load-shedding-parent</artifactId>
+    <packaging>pom</packaging>
+
+    <name>Quarkus - Load Shedding</name>
+
+    <modules>
+        <module>deployment</module>
+        <module>runtime</module>
+    </modules>
+
+</project>