Load shedding #40142

Ladicek · 2024-04-18T15:06:04Z

Related to #36543

Ladicek · 2024-04-18T15:07:39Z

Draft because this is very much work in progress. Sharing to maybe get some initial feedback.

If this is considered to be too niche for core Quarkus, I'd be fine with moving to Quarkiverse.

cescoffier

Looks great!

First, I like the overload algorithm you are using. Any reference on it?
Then, we have a connection limiter in Quarkus already, should we deprecate this in favor of the load shedding?

Also, I think it needs to be tested with long-running connections (gRPC streams, SSE, WebSockets).

I'm also thinking that on overload, we may need to adjust the readiness Kube probe. WDYT?

Ladicek · 2024-04-22T07:57:24Z

The algorithm is a simplified version of https://github.com/Netflix/concurrency-limits/blob/master/concurrency-limits-core/src/main/java/com/netflix/concurrency/limits/limit/VegasLimit.java. I think there's a link in the javadoc already.
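To illustrate the general idea of a Vegas-style concurrency limit, here is a minimal sketch. It is not the code from this PR or from Netflix: the class name `VegasLimitSketch`, the fixed `alpha`/`beta` thresholds, and the step size of 1 are all assumptions made for the example. It estimates the queue size from the ratio of the lowest observed latency to the current latency, then grows or shrinks the limit accordingly.

```java
/**
 * Hypothetical, simplified Vegas-style limit. All names and constants
 * are illustrative assumptions, not the implementation in this PR.
 */
public final class VegasLimitSketch {
    private final int alpha;    // grow the limit while the estimated queue is below this
    private final int beta;     // shrink the limit once the estimated queue is above this
    private final int minLimit;
    private final int maxLimit;
    private int limit;
    private long minRttNanos = Long.MAX_VALUE; // lowest latency seen so far ("no load" estimate)

    public VegasLimitSketch(int initialLimit, int minLimit, int maxLimit, int alpha, int beta) {
        this.limit = initialLimit;
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Feeds one request latency sample and returns the updated limit. */
    public synchronized int onSample(long rttNanos) {
        if (rttNanos <= 0) {
            return limit; // ignore invalid samples
        }
        minRttNanos = Math.min(minRttNanos, rttNanos);
        // Estimated number of requests waiting in a queue rather than being served.
        double queue = limit * (1.0 - (double) minRttNanos / rttNanos);
        if (queue < alpha) {
            limit = Math.min(maxLimit, limit + 1); // latency close to the minimum: allow more
        } else if (queue > beta) {
            limit = Math.max(minLimit, limit - 1); // latency inflated: allow less
        }
        return limit;
    }

    public synchronized int limit() {
        return limit;
    }
}
```

A caller would compare the number of in-flight requests against `limit()` and treat the system as overloaded once that number is reached.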

> Then, we have a connection limiter in Quarkus already, should we deprecate this in favor of the load shedding?

Ah do we? I had no idea. Where can I learn more?

> I think it needs to be tested with long-running connection (gRPC streams, SSE, web sockets).

Good point, I didn't try that at all. I'll check, but I doubt I'll see anything meaningful. The present implementation is heavily oriented towards the request/response style of interaction. An occasional stream likely won't affect anything, and a streaming-heavy application would require a more involved implementation.

> I'm also thinking that on overload, we may need to adjust the readiness Kube probe. WDYT?

Hmm, I'm not sure about that. That would be super coarse-grained.

Ladicek · 2024-05-21T14:21:40Z

Just marked as ready for review. I fixed a couple of bugs in the implementation and added some rudimentary documentation.

The feature is not very heavily tested. @franz1981 do you think it would be possible to test this in a perf lab? I don't really know how that works, but I guess we have a test that stresses a Quarkus application above its capacity, where this would help?

Ladicek · 2024-06-03T11:18:56Z

Rebased and fixed the conflict.

Ladicek · 2024-06-05T13:13:55Z

Rebased and fixed a few tiny issues. I believe this is ready now.

cescoffier

I think we should merge it and iterate.

gsmet

I added two small comments. I don't mind you merging and addressing them later so that we avoid some more conflicts.

extensions/load-shedding/runtime/pom.xml

extensions/load-shedding/runtime/src/main/resources/META-INF/quarkus-extension.yaml

The overload detector uses a TCP Vegas-based algorithm, as implemented by Netflix Concurrency Limits. Priority load shedding uses 5 priority levels and 128 cohorts. A simple cubic function is used to determine the threshold that current CPU load has to reach to reject the current request.
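The priority scheme described above can be sketched roughly as follows. This is an illustration only: the class `PriorityShedSketch`, the ranking of requests as `priority * 128 + cohort`, the particular cubic `1 - x^3`, and the hash-based cohort assignment are assumptions chosen for the example, not the exact function used by the extension.

```java
/**
 * Hypothetical sketch of priority-based load shedding with 5 priority
 * levels and 128 cohorts. The cubic and the ranking are assumptions.
 */
public final class PriorityShedSketch {
    static final int PRIORITIES = 5;  // 0 = most important, 4 = least important (assumed ordering)
    static final int COHORTS = 128;   // cohorts within each priority level
    static final int GROUPS = PRIORITIES * COHORTS; // 640 request groups in total

    private PriorityShedSketch() {
    }

    /** Maps an arbitrary client identifier to a cohort in [0, 128). */
    static int cohortOf(String clientId) {
        return Math.floorMod(clientId.hashCode(), COHORTS);
    }

    /**
     * CPU load (0.0 to 1.0) above which a request in the given group is
     * rejected. Less important groups get a lower threshold.
     */
    static double threshold(int priority, int cohort) {
        int rank = priority * COHORTS + cohort;   // 0 .. 639
        double x = (rank + 1) / (double) GROUPS;  // (0, 1]
        return 1.0 - x * x * x;                   // assumed cubic shape
    }

    /** Intended to be consulted only after an overload has been detected. */
    static boolean shouldReject(int priority, int cohort, double cpuLoad) {
        return cpuLoad > threshold(priority, cohort);
    }
}
```

With this shape, the most important groups are rejected only when CPU load is close to 100%, while the least important group is rejected at almost any load once an overload is detected.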

quarkus-bot · 2024-06-06T07:27:41Z

Status for workflow `Quarkus Documentation CI`

This is the status report for running Quarkus Documentation CI on commit 3b9c6ac.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

⚠️ There are other workflow runs running, you probably need to wait for their status before merging.

quarkus-bot · 2024-06-06T12:47:53Z

Status for workflow `Quarkus CI`

This is the status report for running Quarkus CI on commit 3b9c6ac.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Ladicek requested a review from cescoffier April 18, 2024 15:06

quarkus-bot added the area/dependencies and area/vertx labels Apr 18, 2024