http ratelimit: option to reduce budget on stream done #37548

Merged: 30 commits into main on Dec 19, 2024

Latest commit: apply review commments (b76e3d7)
CI (Envoy) / Envoy/Publish and verify succeeded Dec 18, 2024 in 46m 45s

Check run finished (success ✔️): Envoy/Publish and verify (pr/37548/main@b76e3d7)

Check started by: Request (pr/37548/main@b76e3d7)

mathetake (@mathetake), commit b76e3d7, #37548, merge main@9d14e5e

http ratelimit: option to reduce budget on stream done

Commit Message: ratelimit: option to execute action on stream done

Additional Description:
This adds a new option, apply_on_stream_done, to the rate limit
policy corresponding to each descriptor. It allows descriptors to be
evaluated in a response-content-aware way without enforcing the rate
limit on the current request (in other words, "fire-and-forget"). Since
the addend can currently be controlled via the envoy.ratelimit.hits_addend
metadata, another filter, for example a Lua or Ext Proc filter, can set
that value to reflect its intent (see the sketches below).
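
A minimal sketch of what this could look like in a route configuration.
apply_on_stream_done is the field this PR adds; the virtual host, route,
cluster, header, and descriptor key are made up for illustration:

```yaml
route_config:
  name: llm_routes
  virtual_hosts:
  - name: llm_api
    domains: ["*"]
    routes:
    - match:
        prefix: "/v1/chat"
      route:
        cluster: llm_backend
        rate_limits:
        - actions:
          - request_headers:
              header_name: "x-user-id"
              descriptor_key: "user"
          # New in this PR: evaluate this descriptor when the stream
          # completes, instead of enforcing it on the incoming request.
          apply_on_stream_done: true
```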

This use case arises from LLM API services, which usually return
usage statistics in the response body. More specifically, they offer
"streaming" APIs whose response is a line-by-line event stream in which
the very last line contains the usage statistics. The lazy nature of
this action is perfectly fine for these use cases, as the rate limit
effectively works as "you are blocked starting from your next request"
(a Lua sketch of wiring this up follows).

Beyond the LLM-specific scenario, I've also encountered this use case
in data center resource allocation, where operators want to "block the
computation from the next time, since you used this much resource in
this request".

Ref: envoyproxy/gateway#4756

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a

Environment

Request variables

Key Value
ref 5aa40f0
sha b76e3d7
pr 37548
base-sha 9d14e5e
actor mathetake @mathetake
message http ratelimit: option to reduce budget on stream done ...
started 1734519597.120959
target-branch main
trusted false
Build image

Container image(s) (as used in this CI run)

Key Value
default envoyproxy/envoy-build-ubuntu:d2be0c198feda0c607fa33209da01bf737ef373f
mobile envoyproxy/envoy-build-ubuntu:mobile-d2be0c198feda0c607fa33209da01bf737ef373f
Version

Envoy version (as used in this CI run)

Key Value
major 1
minor 33
patch 0
dev true