From a33a307425d8b106ee09a105d2958c585a5c260f Mon Sep 17 00:00:00 2001
From: "update-envoy[bot]" <135279899+update-envoy[bot]@users.noreply.github.com>
Date: Thu, 19 Dec 2024 02:49:18 +0000
Subject: [PATCH] http ratelimit: option to reduce budget on stream done (#37548)

Commit Message: ratelimit: option to execute action on stream done

Additional Description: This adds a new option `apply_on_stream_done` to the
rate limit policy corresponding to each descriptor. It allows descriptors to be
executed in a response-content-aware way without enforcing the rate limit (in
other words, "fire-and-forget"). Since the addend can already be controlled via
metadata per descriptor, another filter, for example a Lua or Ext Proc filter,
can set that value to reflect the response content.

This use case arises from LLM API services, which usually return usage
statistics in the response body. More specifically, they have "streaming" APIs
whose response is a line-by-line event stream where the very last line contains
the usage statistics. The lazy nature of this action is perfectly fine in these
use cases, since the rate limit effectively means "you are forbidden from the
next time". Besides the LLM-specific case, I've also encountered this use case
in data center resource allocation, where operators want to "block the
computation from the next time since you used this much resources in this
request".

Ref: https://github.com/envoyproxy/gateway/issues/4756

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a

---------

Signed-off-by: Takeshi Yoneda

Mirrored from https://github.com/envoyproxy/envoy @ 857107b72abdf62690b7a1c69f9a3684d57f5f3e
---
 envoy/config/route/v3/route_components.proto | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/envoy/config/route/v3/route_components.proto b/envoy/config/route/v3/route_components.proto
index 909a7305f..a3d4d009f 100644
--- a/envoy/config/route/v3/route_components.proto
+++ b/envoy/config/route/v3/route_components.proto
@@ -1868,7 +1868,7 @@ message VirtualCluster {
 
 // Global rate limiting :ref:`architecture overview `.
 // Also applies to Local rate limiting :ref:`using descriptors `.
-// [#next-free-field: 6]
+// [#next-free-field: 7]
 message RateLimit {
   option (udpa.annotations.versioning).previous_message_type =
       "envoy.api.v2.route.RateLimit";
@@ -2245,6 +2245,23 @@ message RateLimit {
   // :ref:`VirtualHost.typed_per_filter_config` or
   // :ref:`Route.typed_per_filter_config`, etc.
   HitsAddend hits_addend = 5;
+
+  // If true, the rate limit request will be applied when the stream completes. The default value is false.
+  // This is useful when the rate limit budget needs to reflect the response context that is not available
+  // on the request path.
+  //
+  // For example, suppose the upstream service calculates usage statistics and returns them in the response body,
+  // and we want to use these numbers to apply the rate limit action to subsequent requests.
+  // Combined with another filter that can set the desired addend based on the response (e.g. a Lua filter),
+  // this can be used to subtract the usage statistics from the rate limit budget.
+  //
+  // A rate limit applied on stream completion is "fire-and-forget" by nature, and the rate limit is not enforced by this configuration.
+  // In other words, the current request won't be blocked when this is true, but the budget will be updated for subsequent
+  // requests based on the actions with this field set to true. Users should ensure that the rate limit is enforced by the actions
+  // applied on the request path, i.e. the ones with this field set to false.
+  //
+  // Currently, this is only supported by the HTTP global rate limit filter.
+  bool apply_on_stream_done = 6;
 }
 
 // .. attention::
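
For illustration only (not part of the patch above): a minimal sketch of a route-level `rate_limits` configuration that combines the new field with a response-derived addend. The descriptor value, the dynamic metadata namespace/key, and the `hits_addend` substitution-format wiring are assumptions for the sketch; a response-aware filter (e.g. Lua or Ext Proc) is assumed to write the parsed usage number into dynamic metadata.

```yaml
# Sketch: two entries sharing one descriptor (names and metadata keys are illustrative).
rate_limits:
# Entry enforced on the request path (apply_on_stream_done defaults to false),
# so the limit decided by the rate limit service can actually block requests.
- actions:
  - generic_key:
      descriptor_value: token-usage
# Entry applied when the stream completes: "fire-and-forget", it never blocks
# the current request and only reduces the budget for subsequent requests.
- actions:
  - generic_key:
      descriptor_value: token-usage
  # Assumption: the addend is read from dynamic metadata written by a
  # response-aware filter that parses usage statistics out of the response body.
  hits_addend:
    format: "%DYNAMIC_METADATA(llm.usage:total_tokens)%"
  apply_on_stream_done: true
```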