From a33a307425d8b106ee09a105d2958c585a5c260f Mon Sep 17 00:00:00 2001
From: "update-envoy[bot]" <135279899+update-envoy[bot]@users.noreply.github.com>
Date: Thu, 19 Dec 2024 02:49:18 +0000
Subject: [PATCH] http ratelimit: option to reduce budget on stream done (#37548)

Commit Message: ratelimit: option to execute action on stream done

Additional Description: This adds a new option `apply_on_stream_done` to the
rate limit policy corresponding to each descriptor. It allows descriptors to be
executed in a response-content-aware way without enforcing the rate limit (in
other words, "fire-and-forget"). Since the addend can already be controlled via
metadata per descriptor, another filter, for example a Lua or Ext Proc filter,
can set that value to reflect the response content.

This use case arises from LLM API services, which usually return usage
statistics in the response body. More specifically, they have "streaming" APIs
whose response is a line-by-line event stream where the very last line contains
the usage statistics. The lazy nature of this action is perfectly fine in these
use cases, since the rate limit effectively means "you are forbidden from the
next time". Besides the LLM-specific case, I've also encountered this use case
in data center resource allocation, where operators want to "block the
computation from the next time since you used this much resources in this
request".

Ref: https://github.com/envoyproxy/gateway/issues/4756

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a

---------

Signed-off-by: Takeshi Yoneda

Mirrored from https://github.com/envoyproxy/envoy @ 857107b72abdf62690b7a1c69f9a3684d57f5f3e
---
 envoy/config/route/v3/route_components.proto | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/envoy/config/route/v3/route_components.proto b/envoy/config/route/v3/route_components.proto
index 909a7305f..a3d4d009f 100644
--- a/envoy/config/route/v3/route_components.proto
+++ b/envoy/config/route/v3/route_components.proto
@@ -1868,7 +1868,7 @@ message VirtualCluster {
 
 // Global rate limiting :ref:`architecture overview `.
 // Also applies to Local rate limiting :ref:`using descriptors `.
-// [#next-free-field: 6]
+// [#next-free-field: 7]
 message RateLimit {
   option (udpa.annotations.versioning).previous_message_type =
       "envoy.api.v2.route.RateLimit";
@@ -2245,6 +2245,23 @@ message RateLimit {
   // :ref:`VirtualHost.typed_per_filter_config` or
   // :ref:`Route.typed_per_filter_config`, etc.
   HitsAddend hits_addend = 5;
+
+  // If true, the rate limit request will be applied when the stream completes. The default value is false.
+  // This is useful when the rate limit budget needs to reflect the response context that is not available
+  // on the request path.
+  //
+  // For example, suppose the upstream service calculates usage statistics and returns them in the response body,
+  // and we want to use these numbers to apply the rate limit action to subsequent requests.
+  // Combined with another filter that can set the desired addend based on the response (e.g. a Lua filter),
+  // this can be used to subtract the usage statistics from the rate limit budget.
+  //
+  // A rate limit applied on stream completion is "fire-and-forget" by nature, and the rate limit is not enforced by this configuration.
+  // In other words, the current request won't be blocked when this is true, but the budget will be updated for subsequent
+  // requests based on the actions with this field set to true. Users should ensure that the rate limit is enforced by the actions
+  // applied on the request path, i.e. the ones with this field set to false.
+  //
+  // Currently, this is only supported by the HTTP global rate limit filter.
+  bool apply_on_stream_done = 6;
 }
 
 // .. attention::
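
For illustration only (not part of the patch above): a minimal sketch of a route-level `rate_limits` configuration that combines the new field with a response-derived addend. The descriptor value, the dynamic metadata namespace/key, and the `hits_addend` substitution-format wiring are assumptions for the sketch; a response-aware filter (e.g. Lua or Ext Proc) is assumed to write the parsed usage number into dynamic metadata.

```yaml
# Sketch: two entries sharing one descriptor (names and metadata keys are illustrative).
rate_limits:
# Entry enforced on the request path (apply_on_stream_done defaults to false),
# so the limit decided by the rate limit service can actually block requests.
- actions:
  - generic_key:
      descriptor_value: token-usage
# Entry applied when the stream completes: "fire-and-forget", it never blocks
# the current request and only reduces the budget for subsequent requests.
- actions:
  - generic_key:
      descriptor_value: token-usage
  # Assumption: the addend is read from dynamic metadata written by a
  # response-aware filter that parses usage statistics out of the response body.
  hits_addend:
    format: "%DYNAMIC_METADATA(llm.usage:total_tokens)%"
  apply_on_stream_done: true
```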