Request Limiter #25093
Conversation
Yes to this as well! That was a change I made while trying to debug some things. I decided to drop it here. We may have to bring it back if we find that we need changes in the gRPC handler, but until then, leaving it out.
This commit introduces two new adaptive concurrency limiters in Vault, which protect the server from overload during periods of untenable request rates. Each limiter adjusts the number of allowable in-flight requests based on latency measurements taken across the request duration. This approach allows us to reject requests outright, before doing any work, and prevents clients from exceeding server capacity.
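The limiter internals aren't reproduced in this description, but the feedback loop it sketches can be illustrated roughly as follows. This is a simplified sketch, not Vault's implementation: the `AdaptiveLimiter` type, its AIMD-style adjustment policy, and the smoothing constants are all assumptions for illustration (Go 1.21+ for the built-in `max`).

```go
// Simplified sketch of an adaptive concurrency limiter driven by latency
// feedback. Type, fields, and policy are illustrative, not Vault's code.
package limiter

import (
	"errors"
	"sync"
	"time"
)

var ErrCapacity = errors.New("limiter: at capacity") // mapped to a 503 by the caller

type AdaptiveLimiter struct {
	mu       sync.Mutex
	limit    int // current allowable in-flight requests
	min, max int // hard bounds on the limit
	inFlight int
	baseline time.Duration // smoothed estimate of "healthy" latency
}

// Acquire rejects immediately when the server is at capacity, so no work is
// done for requests that would exceed it.
func (l *AdaptiveLimiter) Acquire() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight >= l.limit {
		return ErrCapacity
	}
	l.inFlight++
	return nil
}

// Release records the latency observed across the request duration and
// adapts the limit: back off multiplicatively on latency deviation,
// otherwise probe additively for more capacity.
func (l *AdaptiveLimiter) Release(observed time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.inFlight--
	if l.baseline == 0 {
		l.baseline = observed
	}
	l.baseline = (9*l.baseline + observed) / 10 // exponential smoothing
	switch {
	case observed > 2*l.baseline && l.limit > l.min:
		l.limit = max(l.min, l.limit/2)
	case l.limit < l.max:
		l.limit++
	}
}
```

A handler would call Acquire before routing the request, return a 503 on ErrCapacity, and call Release with the measured duration in a defer.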
The limiters intentionally target two separate vectors that have been proven to lead to server over-utilization:
- Back pressure from the storage backend, resulting in bufferbloat in the WAL system (enterprise).
- Back pressure from CPU over-utilization via PKI issue requests (specifically for RSA keys), resulting in failed heartbeats.
Storage constraints can be accounted for by limiting logical requests according to their http.Method. We only limit requests with write-based methods, since these will result in storage Puts and exhibit the aforementioned bufferbloat.
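As a rough sketch of that gate (the helper name and exact method set are assumptions; the text above only specifies "write-based methods"):

```go
package limiter

import "net/http"

// isWriteLimited is an illustrative gate for storage-bound limiting: only
// methods that result in storage writes (Puts/Deletes) go through the write
// limiter. Name and method set are assumptions, not Vault's code.
func isWriteLimited(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true // produces storage writes, contributing to WAL bufferbloat
	default:
		return false // reads bypass the write limiter
	}
}
```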
CPU constraints are accounted for using the same underlying library and technique; however, they require special treatment. The maximum number of concurrent pki/issue requests found in testing (again, specifically for RSA keys) is far lower than the minimum tolerable write request rate. Without separate limiting, we would artificially impose limits on tolerable request rates for non-PKI requests. To specifically target PKI issue requests, we add a new PathsSpecial field, called limited, allowing backends to specify a list of paths which should get special-case request limiting.
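A sketch of what opting in might look like for a backend; the limited field name comes from this description, while the `Limited` capitalization, the path pattern, and the surrounding backend shape are assumptions:

```go
package pki

import (
	"github.com/hashicorp/vault/sdk/framework"
	"github.com/hashicorp/vault/sdk/logical"
)

// Backend opts its issue paths into special-case request limiting via the
// new "limited" PathsSpecial field. Path pattern is illustrative.
func Backend() *framework.Backend {
	return &framework.Backend{
		PathsSpecial: &logical.Paths{
			Limited: []string{"issue/*"}, // e.g. CPU-heavy PKI issuance
		},
	}
}
```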
For the sake of code cleanliness and future extensibility, we introduce the concept of a LimiterRegistry. The registry proposed in this PR has two entries, corresponding to the two vectors above. Each Limiter entry has its own maximum and minimum concurrency, allowing the limiters to react to latency deviations independently and handle high volumes of requests to the targeted bottlenecks (CPU and storage).
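Continuing the hypothetical sketch above, the registry could take a shape like this; the entry keys and concurrency bounds are placeholders, not the defaults proposed in the PR:

```go
// LimiterRegistry holds the two limiters described above, each with
// independent min/max concurrency. Keys and bounds are placeholders.
type LimiterRegistry struct {
	limiters map[string]*AdaptiveLimiter
}

func NewLimiterRegistry() *LimiterRegistry {
	return &LimiterRegistry{
		limiters: map[string]*AdaptiveLimiter{
			// Storage-bound writes: a floor well above the PKI ceiling.
			"write": {min: 100, max: 5000, limit: 100},
			// CPU-bound special-cased paths such as pki/issue.
			"special-path": {min: 5, max: 100, limit: 5},
		},
	}
}

// Get returns the limiter for a key, or nil if requests to it are unlimited.
func (r *LimiterRegistry) Get(key string) *AdaptiveLimiter {
	return r.limiters[key]
}
```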
In both cases, utilization is effectively throttled before Vault reaches a degraded state. The resulting 503 Service Unavailable is a retryable HTTP response code: clients should handle it by retrying with jitter and exponential backoff, which is what Vault's own API does via the go-retryablehttp library.
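For clients outside Vault's API package, the same behavior is straightforward to reproduce with go-retryablehttp directly. A minimal, self-contained example (the URL and retry settings are placeholders):

```go
package main

import (
	"fmt"
	"time"

	retryablehttp "github.com/hashicorp/go-retryablehttp"
)

func main() {
	client := retryablehttp.NewClient()
	client.RetryMax = 4
	client.RetryWaitMin = 250 * time.Millisecond
	client.RetryWaitMax = 5 * time.Second
	// The default retry policy already treats 503 responses as retryable;
	// LinearJitterBackoff adds jitter to the wait between attempts.
	client.Backoff = retryablehttp.LinearJitterBackoff

	// Placeholder URL: any Vault endpoint subject to the limiter.
	resp, err := client.Get("https://vault.example.com/v1/secret/data/app")
	if err != nil {
		fmt.Println("gave up after retries:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```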
Limiter testing was performed via benchmarks of mixed workloads and across a deployment of agent pods with great success.