http ratelimit: option to reduce budget on stream done #37548
Envoy/Publish and verify (success)
Check has finished
Check run finished (success ✔️)
The check run can be viewed here:
Envoy/Publish and verify (pr/37548/main@b76e3d7)
Check started by
Request (pr/37548/main@b76e3d7)
@mathetake b76e3d7
#37548 merge
main@9d14e5e
http ratelimit: option to reduce budget on stream done
Commit Message: ratelimit: option to execute action on stream done
Additional Description:
This adds a new option, `apply_on_stream_done`, to the rate limit
policy corresponding to each descriptor. It allows descriptors to be
configured so that they are executed in a response content-aware way
without enforcing the rate limit (in other words, "fire-and-forget").
Since the addend can currently be controlled via the
`envoy.ratelimit.hits_addend` metadata, another filter, for example
Lua or Ext Proc, can set the value there to reflect the intended cost.

This use case arises from LLM API services, which usually return
the usage statistics in the response body. More specifically,
they have "streaming" APIs whose response is a line-by-line event
stream where the very last line of the response contains the
usage statistics. The lazy nature of this action is perfectly fine
because, in these use cases, the rate limit takes effect as "you are
forbidden from the next time".

Besides the LLM-specific case, I've also encountered this use case in the
data center resource allocation case where the operators want to
"block the computation from the next time since you used this much
resources in this request".

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a
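
The "charge on stream done, block from the next time" behavior described above can be sketched as a small simulation. This is not Envoy code or its API: `TokenBudget`, `usage_from_stream`, `handle_request`, and the `data: {...}` event format with a `usage.total_tokens` field are all hypothetical names and shapes chosen for illustration, and the in-memory dictionary stands in for a real rate limit service.

```python
import json


class TokenBudget:
    """Hypothetical in-memory budget per descriptor key (stand-in for a rate limit service)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = {}

    def allow(self, key):
        # Request-time check: enforced, but it consumes nothing up front.
        return self.used.get(key, 0) < self.limit

    def charge(self, key, hits_addend):
        # Stream-done action: fire-and-forget, it never rejects the current response.
        self.used[key] = self.used.get(key, 0) + hits_addend


def usage_from_stream(lines):
    """Read the token count from the last data line of an event stream (assumed format)."""
    data_lines = [line for line in lines if line.startswith("data: ")]
    last_event = json.loads(data_lines[-1][len("data: "):])
    return last_event["usage"]["total_tokens"]


def handle_request(budget, key, stream_lines):
    if not budget.allow(key):
        return 429
    # ... the response would be streamed to the client here ...
    # Only once the stream is done is the real cost known and deducted.
    budget.charge(key, usage_from_stream(stream_lines))
    return 200


stream = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    'data: {"usage": {"total_tokens": 120}}',
]
budget = TokenBudget(limit=100)
first = handle_request(budget, "user-a", stream)   # 200: budget was untouched at request time
second = handle_request(budget, "user-a", stream)  # 429: the first response used 120 > 100
```

The first request is served in full even though it overshoots the limit, because its cost is only known at the end of the stream; the overshoot is what blocks the following request.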
Environment
Request variables
Key | Value |
---|---|
ref | 5aa40f0 |
sha | b76e3d7 |
pr | 37548 |
base-sha | 9d14e5e |
actor | @mathetake |
message | http ratelimit: option to reduce budget on stream done ... |
started | 1734519597.120959 |
target-branch | main |
trusted | false |
Build image
Container image/s (as used in this CI run)
Key | Value |
---|---|
default | envoyproxy/envoy-build-ubuntu:d2be0c198feda0c607fa33209da01bf737ef373f |
mobile | envoyproxy/envoy-build-ubuntu:mobile-d2be0c198feda0c607fa33209da01bf737ef373f |
Version
Envoy version (as used in this CI run)
Key | Value |
---|---|
major | 1 |
minor | 33 |
patch | 0 |
dev | true |