Overhaul concept docs #3087

Merged: 2 commits, Dec 28, 2023
4 changes: 2 additions & 2 deletions api/aperture/policy/language/v1/flowcontrol.proto
@@ -281,7 +281,7 @@ message RateLimiter {
//
// :::info
//
// See also [_Load Scheduler_ overview](/concepts/scheduler/load-scheduler.md).
// See also [_Load Scheduler_ overview](/concepts/request-prioritization/load-scheduler.md).
//
// :::
//
@@ -416,7 +416,7 @@ message Scheduler {
// :::info
//
// See also [workload definition in the concepts
// section](/concepts/scheduler/scheduler.md#workload).
// section](/concepts/scheduler.md#workload).
//
// :::
repeated Workload workloads = 1; // @gotags: validate:"dive"
4 changes: 2 additions & 2 deletions api/buf.lock
@@ -9,8 +9,8 @@ deps:
- remote: buf.build
owner: envoyproxy
repository: envoy
commit: 53333dc2a8944f15a2e47288aebb54c0
digest: shake256:c0423eef8c867a4a1f4c1f20d6f3b5a80aba61d610e66568ec90f3bd06a096541189e9de2a1da7bf6489eaed40015cbe760823d09882d7b12aecf896b7250c8d
commit: c941fabf06ef4a1c83625afcaa4b34e1
digest: shake256:41c78a1709e5ee9fa85bd40d954f333b1a04880f10aaf5591e5a3af3c1eccbb43840e4cd96684619fada66d9a32e40a57efc34888c9ac7afe798df206f41102a
- remote: buf.build
owner: envoyproxy
repository: protoc-gen-validate

4 changes: 2 additions & 2 deletions blueprints/gen/jsonschema/_definitions.json
@@ -2280,7 +2280,7 @@
"additionalProperties": false
},
"LoadScheduler": {
"description": ":::info\n\nSee also [_Load Scheduler_ overview](/concepts/scheduler/load-scheduler.md).\n\n:::\n\nTo make scheduling decisions the Flows are mapped into Workloads by providing match rules.\nA workload determines the priority and cost of admitting each Flow that belongs to it.\nScheduling of Flows is based on Weighted Fair Queuing principles.\n\nThe signal at port `load_multiplier` determines the fraction of incoming tokens that get admitted. The signals gets acted on once every 10 seconds.",
"description": ":::info\n\nSee also [_Load Scheduler_ overview](/concepts/request-prioritization/load-scheduler.md).\n\n:::\n\nTo make scheduling decisions the Flows are mapped into Workloads by providing match rules.\nA workload determines the priority and cost of admitting each Flow that belongs to it.\nScheduling of Flows is based on Weighted Fair Queuing principles.\n\nThe signal at port `load_multiplier` determines the fraction of incoming tokens that get admitted. The signals gets acted on once every 10 seconds.",
"properties": {
"dry_run": {
"description": "Decides whether to run the load scheduler in dry-run mode. In dry run mode the scheduler acts as pass through to all flow and does not queue flows.\nIt is useful for observing the behavior of load scheduler without disrupting any real traffic.",
@@ -3608,7 +3608,7 @@
"type": "string"
},
"workloads": {
"description": "List of workloads to be used in scheduler.\n\nCategorizing flows into workloads\nallows for load throttling to be \"intelligent\" instead of queueing flows in an arbitrary order.\nThere are two aspects of this \"intelligence\":\n* Scheduler can more precisely calculate concurrency if it understands\n that flows belonging to different classes have different weights (for example, insert queries compared to select queries).\n* Setting different priorities to different workloads lets the scheduler\n avoid dropping important traffic during overload.\n\nEach workload in this list specifies also a matcher that is used to\ndetermine which flow will be categorized into which workload.\nIn case of multiple matching workloads, the first matching one will be used.\nIf none of workloads match, `default_workload` will be used.\n\n:::info\n\nSee also [workload definition in the concepts\nsection](/concepts/scheduler/scheduler.md#workload).\n\n:::\n\n",
"description": "List of workloads to be used in scheduler.\n\nCategorizing flows into workloads\nallows for load throttling to be \"intelligent\" instead of queueing flows in an arbitrary order.\nThere are two aspects of this \"intelligence\":\n* Scheduler can more precisely calculate concurrency if it understands\n that flows belonging to different classes have different weights (for example, insert queries compared to select queries).\n* Setting different priorities to different workloads lets the scheduler\n avoid dropping important traffic during overload.\n\nEach workload in this list specifies also a matcher that is used to\ndetermine which flow will be categorized into which workload.\nIn case of multiple matching workloads, the first matching one will be used.\nIf none of workloads match, `default_workload` will be used.\n\n:::info\n\nSee also [workload definition in the concepts\nsection](/concepts/scheduler.md#workload).\n\n:::\n\n",
"items": {
"$ref": "#/definitions/SchedulerWorkload",
"type": "object"
@@ -2484,7 +2484,7 @@ definitions:
description: |-
:::info

See also [_Load Scheduler_ overview](/concepts/scheduler/load-scheduler.md).
See also [_Load Scheduler_ overview](/concepts/request-prioritization/load-scheduler.md).

:::

@@ -3927,7 +3927,7 @@ definitions:
:::info

See also [workload definition in the concepts
section](/concepts/scheduler/scheduler.md#workload).
section](/concepts/scheduler.md#workload).

:::

4 changes: 2 additions & 2 deletions docs/content/assets/openapiv2/aperture.swagger.yaml
@@ -3174,7 +3174,7 @@ definitions:
description: |-
:::info

See also [_Load Scheduler_ overview](/concepts/scheduler/load-scheduler.md).
See also [_Load Scheduler_ overview](/concepts/request-prioritization/load-scheduler.md).

:::

@@ -4765,7 +4765,7 @@ definitions:
:::info

See also [workload definition in the concepts
section](/concepts/scheduler/scheduler.md#workload).
section](/concepts/scheduler.md#workload).

:::

2 changes: 1 addition & 1 deletion docs/content/concepts/advanced/advanced.md
@@ -1,6 +1,6 @@
---
title: Advanced
sidebar_position: 8
sidebar_position: 10
---

```mdx-code-block
3 changes: 2 additions & 1 deletion docs/content/concepts/advanced/agent-group.md
@@ -45,7 +45,8 @@ details.
synchronization. Agents within the same group form a peer-to-peer network to
synchronize fine-grained per label counters. These counters are crucial for
[rate-limiting](/concepts/rate-limiter.md) and for implementing global token
buckets used in [quota scheduling](/concepts/scheduler/quota-scheduler.md).
buckets used in
[quota scheduling](/concepts/request-prioritization/quota-scheduler.md).
Additionally, all Agents within an agent group instantiate the same set of
flow control components as defined in the
[policies](/concepts/advanced/policy.md) running at the Controller. This
2 changes: 1 addition & 1 deletion docs/content/concepts/cache.md
@@ -1,6 +1,6 @@
---
title: Cache
sidebar_position: 7
sidebar_position: 9
---

Aperture's _Cache_ can be used to reduce the load on a service by caching the
@@ -14,7 +14,7 @@

1. Create an instance of Aperture `Client`.
2. Instantiate a `Flow` by calling the `StartFlow` method with `resultCacheKey`
parameter set to your desired value. The first call will let Aperture
initialize a cache entry for the flow, uniquely identified by the
`ControlPoint` and `ResultCacheKey` values. Subsequent calls will return the
cached value as part of the response object.
@@ -48,7 +48,7 @@
method with `key` parameter on the `Flow` object. It returns the same object as
the `ResultCache` method.

Similar to `ResultCache`, `Set` and `Delete` methods can be used to set and
delete entries in the _Global Cache_.

[skds]: /sdk/sdk.md
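
The result-cache flow described above can be sketched as a small model: the first lookup for a `(ControlPoint, ResultCacheKey)` pair misses, the caller computes and stores the result, and subsequent lookups return the cached value. This is an illustrative model only; the class and method names are assumptions, not the actual SDK surface.

```typescript
// Illustrative model of the result cache keyed by control point and
// result-cache key. Names are hypothetical, not the aperture-js API.
type CacheEntry = { hit: boolean; value?: string };

class ResultCacheModel {
  private store = new Map<string, string>();

  // Look up the entry for this flow; a miss means the caller should
  // compute the result and store it with set().
  startFlow(controlPoint: string, resultCacheKey: string): CacheEntry {
    const key = `${controlPoint}/${resultCacheKey}`;
    const value = this.store.get(key);
    return value === undefined ? { hit: false } : { hit: true, value };
  }

  set(controlPoint: string, resultCacheKey: string, value: string): void {
    this.store.set(`${controlPoint}/${resultCacheKey}`, value);
  }

  delete(controlPoint: string, resultCacheKey: string): void {
    this.store.delete(`${controlPoint}/${resultCacheKey}`);
  }
}

const cache = new ResultCacheModel();
const first = cache.startFlow("checkout", "user-42"); // miss: compute result
cache.set("checkout", "user-42", "expensive-result");
const second = cache.startFlow("checkout", "user-42"); // hit: cached value
```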
53 changes: 53 additions & 0 deletions docs/content/concepts/concurrency-limiter.md
@@ -0,0 +1,53 @@
---
title: Concurrency Limiter
sidebar_position: 6
---

:::info See also

[_Concurrency Limiter_ reference][reference]

:::

The _Concurrency Limiter_ component enforces in-flight request quotas to
prevent overloads. It can also enforce limits per entity, such as a user, to
ensure fair access across users, providing an added layer of protection in
addition to per-user rate limits.

_Concurrency Limiter_ can limit the number of concurrent requests to a control
point or certain labels that match within the control point. It achieves this by
maintaining a ledger of in-flight requests. If the number of in-flight requests
exceeds the configured limit, the _Concurrency Limiter_ rejects new requests
until the number of in-flight requests drops below the limit. The in-flight
requests are maintained by the Agents based on the flow start and end calls made
from the SDKs. Alternatively, for proxy integrations, the flow end is inferred
as the access log stream is received from the underlying middleware or proxy.

## Distributed Request Ledgers {#distributed-request-ledgers}

For each configured [_Concurrency Limiter Component_][reference], every matching
Aperture Agent instantiates a copy of the _Concurrency Limiter_. Although each
agent has its own copy of the component, they all share the in-flight request
ledger through a distributed cache. This means that they work together as a
single _Concurrency Limiter_, providing seamless coordination and control across
Agents. The Agents within an [agent group][agent-group] constantly share state
and detect failures using a gossip protocol.

## Lifecycle of a Request {#lifecycle-of-a-request}

The _Concurrency Limiter_ maintains a ledger of in-flight requests, updated as
described above: through the flow start and end calls made from the SDKs or,
for proxy integrations, inferred from the access log stream received from the
underlying middleware or proxy.

### Max In-flight Duration {#max-in-flight-duration}

In case of failures at the SDK or middleware/proxy, the flow end call might not
be made. To prevent stale entries in the ledger, the _Concurrency Limiter_
allows the definition of a maximum in-flight duration. This can be set according
to the maximum time a request is expected to take. If the request exceeds the
configured duration, it is automatically removed from the ledger by the
_Concurrency Limiter_.
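
The ledger behavior, including stale-entry eviction, can be sketched as follows. This is a minimal single-process model under stated assumptions; the real ledger is shared across Agents through a distributed cache, and the names here are illustrative.

```typescript
// Minimal sketch of an in-flight request ledger with a max in-flight
// duration that evicts stale entries when a flow-end call never arrives.
class InFlightLedger {
  private inflight = new Map<string, number>(); // flow id -> start time (ms)

  constructor(
    private maxConcurrency: number,
    private maxInFlightMs: number,
  ) {}

  // Flow start: admit only if the ledger is below the limit.
  start(flowId: string, nowMs: number): boolean {
    this.evictStale(nowMs);
    if (this.inflight.size >= this.maxConcurrency) return false;
    this.inflight.set(flowId, nowMs);
    return true;
  }

  // Flow end: remove the entry from the ledger.
  end(flowId: string): void {
    this.inflight.delete(flowId);
  }

  // Drop entries older than the configured max in-flight duration.
  private evictStale(nowMs: number): void {
    for (const [id, startedAt] of this.inflight) {
      if (nowMs - startedAt > this.maxInFlightMs) this.inflight.delete(id);
    }
  }
}

const ledger = new InFlightLedger(2, 5000);
const a = ledger.start("a", 0);    // admitted
const b = ledger.start("b", 10);   // admitted
const c = ledger.start("c", 20);   // rejected: limit reached
const d = ledger.start("d", 6000); // admitted: "a" and "b" evicted as stale
```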

[reference]: /reference/configuration/spec.md#concurrency-limiter
[agent-group]: /concepts/selector.md#agent-group
34 changes: 17 additions & 17 deletions docs/content/concepts/control-point.md
@@ -24,8 +24,23 @@ or configured when integrating with API Gateways or Service Meshes.

To empower Aperture to act at any of the control points, integrations need to be
installed to be able to interact with the Aperture Agent. Here are the two
primary types of control points: HTTP/gRPC control points and Feature Control
Points.
primary types of control points: Feature control points and HTTP/gRPC control
points.

### Feature Control Points

Feature control points are facilitated by the [Aperture SDKs](/sdk/sdk.md),
which are available for various popular programming languages. These SDKs allow
any function call or code snippet within the service code to be wrapped as a
feature control point. In Aperture's context, every execution of the feature is
seen as a flow.

The SDK offers an API to initiate a flow, which corresponds to a
[`flowcontrol.v1.Check`][flowcontrol-proto] call into the Agent. The response
from this call comprises a decision on whether to accept or reject the flow. The
execution of a feature might be gated based on this decision. There is also an
API to end a flow, which creates an OpenTelemetry span representing the flow and
dispatches it to the Agent.
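
The start-gate-end pattern described above can be sketched as a small wrapper. The decision function here stands in for the `flowcontrol.v1.Check` call to the Agent, and the end callback stands in for the span dispatch; all names are illustrative, not the actual SDK API.

```typescript
// Sketch of wrapping a code snippet as a feature control point: start a
// flow, gate execution on the accept/reject decision, and end the flow.
type Decision = "accepted" | "rejected";

function runAsFeature<T>(
  check: () => Decision,        // placeholder for the Agent Check call
  feature: () => T,             // the gated code snippet
  onEnd: (d: Decision) => void, // placeholder for the flow-end span dispatch
): T | undefined {
  const decision = check();
  try {
    // Execute the feature only when the flow is accepted.
    return decision === "accepted" ? feature() : undefined;
  } finally {
    onEnd(decision); // always report flow end, accepted or not
  }
}

let ended: Decision | null = null;
const result = runAsFeature(() => "accepted", () => 42, (d) => { ended = d; });
const rejected = runAsFeature(() => "rejected", () => 42, () => {});
```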

### HTTP/gRPC Control Points

@@ -45,21 +45,6 @@ identified by
[PatchContext](https://istio.io/latest/docs/reference/config/networking/envoy-filter/#EnvoyFilter-PatchContext)
of Istio's EnvoyFilter CRD.

### Feature Control Points

Feature control points are facilitated by the [Aperture SDKs](/sdk/sdk.md),
which are available for a variety of popular programming languages. These SDKs
allow any function call or code snippet within the service code to be wrapped as
a feature control point. In Aperture's context, every execution of the feature
is seen as a flow.

The SDK offers an API to initiate a flow, which corresponds to a
[`flowcontrol.v1.Check`][flowcontrol-proto] call into the Agent. The response
from this call comprises a decision on whether to accept or reject the flow. The
execution of a feature might be gated based on this decision. There is also an
API to end a flow, which creates an OpenTelemetry span representing the flow and
dispatches it to the Agent.

## Understanding Control Points

<Zoom>
4 changes: 2 additions & 2 deletions docs/content/concepts/flow-label.md
@@ -219,9 +219,9 @@ For _Classifier_ created labels, you can disable this behavior by setting

[selectors]: ./selector.md
[classifier]: ./advanced/classifier.md
[workload]: ./scheduler/scheduler.md#workload
[workload]: ./scheduler.md#workload
[ratelimiter]: ./rate-limiter.md
[quota-scheduler]: ./scheduler/quota-scheduler.md
[quota-scheduler]: ./request-prioritization/quota-scheduler.md
[flux-meter]: ./advanced/flux-meter.md
[baggage]: https://www.w3.org/TR/baggage/#baggage-http-header-format
[traces]:
24 changes: 11 additions & 13 deletions docs/content/concepts/flow-lifecycle.md
@@ -13,8 +13,6 @@ keywords:
import Zoom from 'react-medium-image-zoom';
```

## Flow Lifecycle

The lifecycle of a flow begins when a service initiates it, requesting a
decision from the Aperture Agent. As the flow enters the Aperture Agent, it
embarks on a journey through multiple stages before a final decision is made.
@@ -62,19 +60,19 @@ components for that stage.
regulating excessive requests in accordance with per-label limits.
- **Caches** reduce the cost of operations and alleviate the load on constrained
services by preventing duplicate requests to pay-per-use services.
- [**Schedulers**](./scheduler/scheduler.md) offer on-demand queuing based on a
token bucket algorithm, and prioritize requests using weighted fair queuing.
- [**Schedulers**](./scheduler.md) offer on-demand queuing based on a token
bucket algorithm, and prioritize requests using weighted fair queuing.
Multiple matching schedulers can evaluate concurrently, with each having the
power to drop a flow. There are two variants:
- The [**Load Scheduler**](./scheduler/load-scheduler.md) oversees the current
token rate in relation to the past token rate, adjusting as required based
on health signals from a service. This scheduler type facilitates active
service protection.
- The [**Quota Scheduler**](./scheduler/quota-scheduler.md) uses a global
token bucket as a ledger, managing the token distribution across all Agents.
It proves especially effective in environments with strict global rate
limits, as it allows for strategic prioritization of requests when reaching
quota limits.
- The [**Load Scheduler**](./request-prioritization/load-scheduler.md)
oversees the current token rate in relation to the past token rate,
adjusting as required based on health signals from a service. This scheduler
type facilitates active service protection.
- The [**Quota Scheduler**](./request-prioritization/quota-scheduler.md) uses
a global token bucket as a ledger, managing the token distribution across
all Agents. It proves especially effective in environments with strict
global rate limits, as it allows for strategic prioritization of requests
when reaching quota limits.

After traversing these stages, the flow's decision is sent back to the
initiating service.
17 changes: 10 additions & 7 deletions docs/content/concepts/rate-limiter.md
@@ -9,9 +9,10 @@ sidebar_position: 5

:::

The _Rate Limiter_ component can be used to prevent recurring overloads by
proactively regulating heavy-hitters. It achieves this by accepting or rejecting
incoming flows based on per-label limits, which are configured using the
The _Rate Limiter_ component can be used to ensure fair access and manage costs
by regulating the number of requests made by an entity over time. It achieves
this by accepting or rejecting incoming requests based on per-label limits,
which are configured using the
[token bucket algorithm](https://en.wikipedia.org/wiki/Token_bucket).
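
The per-label token bucket described above can be sketched as follows: each label value (for example, a user id) gets its own bucket that refills at a fixed rate up to a burst capacity. This is an illustrative single-process model, not Aperture's distributed implementation.

```typescript
// Minimal token-bucket sketch for per-label rate limits.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private capacity: number, // burst size
    private refillPerSec: number,
    nowMs = 0,
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Accept the request if a token is available, else reject it.
  tryAccept(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per rate-limiting label value.
const buckets = new Map<string, TokenBucket>();
function accept(user: string, nowMs: number): boolean {
  let b = buckets.get(user);
  if (!b) buckets.set(user, (b = new TokenBucket(2, 1, nowMs)));
  return b.tryAccept(nowMs);
}

const r1 = accept("alice", 0);    // true: bucket starts full
const r2 = accept("alice", 0);    // true: second burst token
const r3 = accept("alice", 0);    // false: bucket empty
const r4 = accept("alice", 1000); // true: one token refilled after 1s
```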

The _Rate Limiter_ is a component of Aperture's [policy][policies] system, and
@@ -84,17 +85,19 @@ inaccuracy within a (small) time window (sync interval).
The _Rate Limiter_ component accepts or rejects incoming flows based on
per-label limits, configured as the maximum number of requests per a given
period of time. The rate-limiting label is chosen from the
[flow-label][flow-label] with a specific key, enabling distinct limits per user
[flow-label][flow-label] with a specific key, enabling distinct limits per-user
as identified by unique values of the label.

:::tip
:::info

The limit value is provided as a signal within the circuit. It can be set
dynamically based on the circuit's logic.
Refer to the [Per-user Rate Limiting guide][guide] for more information on how
to use the _Rate Limiter_ using [aperture-js][aperture-js] SDK.

:::

[reference]: /reference/configuration/spec.md#rate-limiter
[agent-group]: /concepts/selector.md#agent-group
[policies]: /concepts/advanced/policy.md
[flow-label]: /concepts/flow-label.md
[guide]: /guides/per-user-rate-limiting.md
[aperture-js]: https://github.com/fluxninja/aperture-js
@@ -0,0 +1,44 @@
---
title: Concurrency Scheduler
keywords:
- scheduler
- concurrency
- queuing
sidebar_position: 2
---

:::info See Also

Concurrency Scheduler
[Reference](/reference/configuration/spec.md#concurrency-scheduler)

:::

The _Concurrency Scheduler_ is used to schedule requests based on importance
while ensuring that the application adheres to concurrency limits.

The _Concurrency Scheduler_ can be thought of as a combination of a
[_Scheduler_](../scheduler.md) and a
[_Concurrency Limiter_](../concurrency-limiter.md). It essentially provides
scheduling capabilities atop a _Concurrency Limiter_. Similar to the
_Concurrency Limiter_, this component takes `max_concurrency` as an input port
which determines the maximum number of in-flight requests in the global request
ledger.

The global request ledger is shared among Agents in an
[agent group](../advanced/agent-group.md). This ledger records the total number
of in-flight requests across the Agents. If the ledger exceeds the configured
`max_concurrency`, new requests are queued until the number of in-flight
requests drops below the limit or
[until timeout](../scheduler.md#queue-timeout).

When the maximum concurrency is known upfront, the _Concurrency Scheduler_ is
particularly useful for enforcing concurrency limits on a per-service basis.

The _Concurrency Scheduler_ also supports
[workloads](../scheduler.md#workload), a property of the scheduler, which
enable strategic prioritization of requests when faced with concurrency
constraints. As a result, the _Concurrency Scheduler_ ensures adherence to the
concurrency limits while offering a mechanism to prioritize requests based on
their importance.
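
Queuing atop a concurrency limit can be sketched as follows: requests beyond `max_concurrency` wait in a queue, and when a slot frees up the highest-priority queued request is admitted first. This is an illustrative single-process model; the real ledger is global across Agents, and the names here are assumptions.

```typescript
// Sketch of a scheduler layered on a concurrency limit.
type Request = { id: string; priority: number };

class ConcurrencySchedulerModel {
  private inflight = new Set<string>();
  private queue: Request[] = [];

  constructor(private maxConcurrency: number) {}

  // Admit immediately if below the limit, otherwise queue the request.
  submit(req: Request): "admitted" | "queued" {
    if (this.inflight.size < this.maxConcurrency) {
      this.inflight.add(req.id);
      return "admitted";
    }
    this.queue.push(req);
    return "queued";
  }

  // On flow end, hand the freed slot to the highest-priority queued request.
  complete(id: string): Request | undefined {
    this.inflight.delete(id);
    if (this.queue.length === 0) return undefined;
    this.queue.sort((a, b) => b.priority - a.priority);
    const next = this.queue.shift()!;
    this.inflight.add(next.id);
    return next;
  }
}

const sched = new ConcurrencySchedulerModel(1);
const s1 = sched.submit({ id: "background", priority: 1 }); // admitted
const s2 = sched.submit({ id: "batch", priority: 1 });      // queued
const s3 = sched.submit({ id: "checkout", priority: 10 });  // queued
const next = sched.complete("background");                  // checkout admitted first
```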