diff --git a/README.md b/README.md
index 1573f34a5c..2874db597f 100644
--- a/README.md
+++ b/README.md
@@ -46,44 +46,43 @@ cloud application:
 ## ⚙️ Load management capabilities
 
 - ⏱️
-  [**Global Rate-Limiting**](https://docs.fluxninja.com/concepts/rate-limiter):
+  [**Global Rate and Concurrency Limiting**](https://docs.fluxninja.com/concepts/rate-limiter):
   Safeguard APIs and features against excessive usage with Aperture's
   high-performance, distributed rate limiter. Identify individual users or
   entities by fine-grained labels. Create precise rate limiters controlling
-  burst-capacity and fill-rate tailored to business-specific labels. Refer to
-  the [Rate Limiting](https://docs.fluxninja.com/guides/per-user-rate-limiting)
-  guide for more details.
+  burst-capacity and fill-rate tailored to business-specific labels. Limit per
+  user or global concurrency of in-flight requests. Refer to the
+  [Rate Limiting](https://docs.fluxninja.com/guides/per-user-rate-limiting) and
+  [Concurrency Limiting](https://docs.fluxninja.com/guides/per-user-concurrency-limiting)
+  guides for more details.
 - 📊
-  [**API Quota Management**](https://docs.fluxninja.com/concepts/scheduler/quota-scheduler):
+  [**API Quota Management**](https://docs.fluxninja.com/concepts/request-prioritization/quota-scheduler):
   Maintain compliance with external API quotas with a global token bucket and
   smart request queuing. This feature regulates requests aimed at external
   services, ensuring that the usage remains within prescribed rate limits and
   avoids penalties or additional costs. Refer to the
   [API Quota Management](https://docs.fluxninja.com/guides/api-quota-management/)
   guide for more details.
-- 🛡️
-  [**Adaptive Queuing**](https://docs.fluxninja.com/concepts/scheduler/load-scheduler):
-  Enhance resource utilization and safeguard against abrupt service overloads
-  with an intelligent queue at the entry point of services. This queue
-  dynamically adjusts the rate of requests based on live service health, thereby
-  mitigating potential service disruptions and ensuring optimal performance
-  under all load conditions. Refer to the
-  [Service Load Management](https://docs.fluxninja.com/guides/service-load-management/)
-  and
-  [Database Load Management](https://docs.fluxninja.com/guides/database-load-management/)
-  guides for more details.
+- 🚦
+  [**Concurrency Control and Prioritization**](https://docs.fluxninja.com/concepts/request-prioritization/concurrency-scheduler):
+  Safeguard against abrupt service overloads by limiting the number of
+  concurrent in-flight requests. Any requests beyond this limit are queued and
+  let in based on their priority as capacity becomes available. Refer to the
+  [Concurrency Control and Prioritization](https://docs.fluxninja.com/development/guides/concurrency-control-and-prioritization/)
+  guide for more details.
 - 🎯
   [**Workload Prioritization**](https://docs.fluxninja.com/concepts/scheduler/):
   Safeguard crucial user experience pathways and ensure prioritized access to
   external APIs by strategically prioritizing workloads. With
   [weighted fair queuing](https://en.wikipedia.org/wiki/Weighted_fair_queueing),
   Aperture aligns resource distribution with business value and urgency of
-  requests. Workload prioritization applies to API Quota Management and Adaptive
-  Queuing use cases.
+  requests. Workload prioritization applies to API Quota Management and
+  Concurrency Control and Prioritization use cases.
 - 💾 [**Caching**](https://docs.fluxninja.com/concepts/cache): Boost application
   performance and reduce costs by caching costly operations, preventing
   duplicate requests to pay-per-use services, and easing the load on constrained
-  services.
+  services. Refer to the [Caching](https://docs.fluxninja.com/guides/caching)
+  guide for more details.
 
 ## 🏗️ Architecture
 
diff --git a/docs/content/concepts/flow-lifecycle.md b/docs/content/concepts/flow-lifecycle.md
index 3eb2565ef8..fe20cc1b26 100644
--- a/docs/content/concepts/flow-lifecycle.md
+++ b/docs/content/concepts/flow-lifecycle.md
@@ -41,6 +41,8 @@ components for that stage.
 
 :::
 
+### Selection, Classification, and Telemetry
+
 - [**Selectors**](./selector.md) are the criteria used to determine the
   components that will be applied to a flow in the subsequent stages.
 - [**Classifiers**](./advanced/classifier.md/) perform the task of assigning
@@ -52,27 +54,45 @@ components for that stage.
   telemetry based on access logs. They transform request flux that matches
   certain criteria into Prometheus histograms, enabling enhanced observability
   and control.
+
+### Rate limiting (fast rejection)
+
 - [**Samplers**](./advanced/load-ramp.md#sampler) manage load by permitting a
   portion of flows to be accepted, while immediately dropping the remainder with
   a forbidden status code. They are particularly useful in scenarios such as
   feature rollouts.
 - [**Rate-Limiters**](./rate-limiter.md) proactively guard against abuse by
   regulating excessive requests in accordance with per-label limits.
-- **Caches** reduce the cost of operations and alleviate the load on constrained
-  services by preventing duplicate requests to pay-per-use services.
-- [**Schedulers**](./scheduler.md) offer on-demand queuing based on a token
-  bucket algorithm, and prioritize requests using weighted fair queuing.
-  Multiple matching schedulers can evaluate concurrently, with each having the
-  power to drop a flow. There are two variants:
-  - The [**Load Scheduler**](./request-prioritization/load-scheduler.md)
-    oversees the current token rate in relation to the past token rate,
-    adjusting as required based on health signals from a service. This scheduler
-    type facilitates active service protection.
-  - The [**Quota Scheduler**](./request-prioritization/quota-scheduler.md) uses
-    a global token bucket as a ledger, managing the token distribution across
-    all Agents. It proves especially effective in environments with strict
-    global rate limits, as it allows for strategic prioritization of requests
-    when reaching quota limits.
+- [**Concurrency-Limiters**](./concurrency-limiter.md) enforce in-flight request
+  quotas to prevent overloads. They can also be used to enforce limits per
+  entity such as a user to ensure fair access across users.
+
+### Request Prioritization and Cache Lookup
+
+[**Schedulers**](./scheduler.md) offer on-demand queuing based on a limit
+enforced through a token bucket or a concurrency counter, and prioritize
+requests using weighted fair queuing. Multiple matching schedulers can evaluate
+concurrently, with each having the power to drop a flow. There are three
+variants running at various stages of the flow lifecycle:
+
+- The
+  [**Concurrency Scheduler**](./request-prioritization/concurrency-scheduler.md)
+  uses a global concurrency counter as a ledger, managing the concurrency across
+  all Agents. It proves especially effective in environments with strict global
+  concurrency limits, as it allows for strategic prioritization of requests when
+  reaching concurrency limits.
+- [**Caches**](./cache.md) Look of response and global caches occur at this
+  stage. If a response cache hit occurs, the flow is not sent to the Concurrency
+  and Load Scheduling stages, resulting in an early acceptance.
+- The [**Quota Scheduler**](./request-prioritization/quota-scheduler.md) uses a
+  global token bucket as a ledger, managing the token distribution across all
+  Agents. It proves especially effective in environments with strict global rate
+  limits, as it allows for strategic prioritization of requests when reaching
+  quota limits.
+- The [**Load Scheduler**](./request-prioritization/load-scheduler.md) oversees
+  the current token rate in relation to the past token rate, adjusting as
+  required based on health signals from a service. This scheduler type
+  facilitates active service protection.
 
 After traversing these stages, the flow's decision is sent back to the
 initiating service.
diff --git a/docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling-test.png b/docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling-test.png
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling-test.png
rename to docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling-test.png
diff --git a/docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd b/docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd
rename to docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd
diff --git a/docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd.md5sum b/docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd.md5sum
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd.md5sum
rename to docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd.md5sum
diff --git a/docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd.svg b/docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd.svg
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/concurrency-scheduling.mmd.svg
rename to docs/content/guides/assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd.svg
diff --git a/docs/content/guides/assets/concurrency-quota-management/graph.mmd b/docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/graph.mmd
rename to docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd
diff --git a/docs/content/guides/assets/concurrency-quota-management/graph.mmd.md5sum b/docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd.md5sum
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/graph.mmd.md5sum
rename to docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd.md5sum
diff --git a/docs/content/guides/assets/concurrency-quota-management/graph.mmd.svg b/docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd.svg
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/graph.mmd.svg
rename to docs/content/guides/assets/concurrency-control-and-prioritization/graph.mmd.svg
diff --git a/docs/content/guides/assets/concurrency-quota-management/policy.yaml b/docs/content/guides/assets/concurrency-control-and-prioritization/policy.yaml
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/policy.yaml
rename to docs/content/guides/assets/concurrency-control-and-prioritization/policy.yaml
diff --git a/docs/content/guides/assets/concurrency-quota-management/queue.png b/docs/content/guides/assets/concurrency-control-and-prioritization/queue.png
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/queue.png
rename to docs/content/guides/assets/concurrency-control-and-prioritization/queue.png
diff --git a/docs/content/guides/assets/concurrency-quota-management/request-metrics.png b/docs/content/guides/assets/concurrency-control-and-prioritization/request-metrics.png
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/request-metrics.png
rename to docs/content/guides/assets/concurrency-control-and-prioritization/request-metrics.png
diff --git a/docs/content/guides/assets/concurrency-quota-management/validate.sh b/docs/content/guides/assets/concurrency-control-and-prioritization/validate.sh
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/validate.sh
rename to docs/content/guides/assets/concurrency-control-and-prioritization/validate.sh
diff --git a/docs/content/guides/assets/concurrency-quota-management/values.yaml b/docs/content/guides/assets/concurrency-control-and-prioritization/values.yaml
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/values.yaml
rename to docs/content/guides/assets/concurrency-control-and-prioritization/values.yaml
diff --git a/docs/content/guides/assets/concurrency-quota-management/workloads.png b/docs/content/guides/assets/concurrency-control-and-prioritization/workloads.png
similarity index 100%
rename from docs/content/guides/assets/concurrency-quota-management/workloads.png
rename to docs/content/guides/assets/concurrency-control-and-prioritization/workloads.png
diff --git a/docs/content/guides/concurrency-quota-management.md b/docs/content/guides/concurrency-control-and-prioritization.md
similarity index 92%
rename from docs/content/guides/concurrency-quota-management.md
rename to docs/content/guides/concurrency-control-and-prioritization.md
index 80e21baeea..5a6731aa7c 100644
--- a/docs/content/guides/concurrency-quota-management.md
+++ b/docs/content/guides/concurrency-control-and-prioritization.md
@@ -1,12 +1,11 @@
 ---
-title: Concurrency Quota Management
+title: Concurrency Control and Prioritization
 sidebar_position: 5
 keywords:
-  - concurrency scheduling
-  - concurrency quota management
-  - guides
-  - external API
+  - concurrency limiting
   - prioritization
+  - guides
+  - expensive API
 ---
 
 ```mdx-code-block
@@ -30,10 +29,9 @@ blueprint.
 
 ## Overview
 
-Concurrency quota management, also called concurrency scheduling, is a
-sophisticated technique that allows effective management of concurrent requests.
-With this technique services can limit the number of concurrent API calls to
-alleviate the load on the system.
+Concurrency control and prioritization, is a sophisticated technique that allows
+effective management of concurrent requests. With this technique, services can
+limit the number of concurrent API calls to alleviate the load on the system.
 
 When service limits are reached, Aperture Cloud can queue incoming requests and
 serve them according to their priority, which is determined by business-critical
@@ -42,7 +40,7 @@ labels set in the policy and passed via the SDK.
 ```mermaid
-{@include: ./assets/concurrency-quota-management/concurrency-scheduling.mmd}
+{@include: ./assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd}
 ```
@@ -169,7 +167,7 @@ with these specific values:
 8. `Control point`: It can be a particular feature or execution block within a
    service. We'll use `concurrency-scheduling-feature` as an example.
 
-![Concurrency Scheduling Policy](./assets/concurrency-quota-management/concurrency-scheduling-test.png)
+![Concurrency Scheduling Policy](./assets/concurrency-control-and-prioritization/concurrency-scheduling-test.png)
 
 Once you've completed these fields, click `Continue` and then `Apply Policy` to
 finalize the policy setup.
@@ -213,7 +211,7 @@ scheduling policy:
 Here is how the complete values file would look:
 
 ```yaml
-{@include: ./assets/concurrency-quota-management/values.yaml}
+{@include: ./assets/concurrency-control-and-prioritization/values.yaml}
 ```
 
 The last step is to apply the policy using the following command:
@@ -258,18 +256,18 @@ in the Aperture Cloud UI. Navigate to the Aperture Cloud UI, and click the
 
 Once you've clicked on the policy, you will see the following dashboard:
 
-![Workload](./assets/concurrency-quota-management/workloads.png)
+![Workload](./assets/concurrency-control-and-prioritization/workloads.png)
 
 The two panels above provide insights into how the policy is performing by
 monitoring the number of accepted and rejected requests along with the
 acceptance percentage.
 
-![Request](./assets/concurrency-quota-management/request-metrics.png)
+![Request](./assets/concurrency-control-and-prioritization/request-metrics.png)
 
 The panels above offer insights into the request details, including their
 latency.
 
-![Queue](./assets/concurrency-quota-management/queue.png)
+![Queue](./assets/concurrency-control-and-prioritization/queue.png)
 
 These panels display insights into queue duration for `workload` requests and
 highlight the average of prioritized requests that moved ahead in the queue.
diff --git a/docs/content/introduction.md b/docs/content/introduction.md
index 753f123c37..94170a284d 100644
--- a/docs/content/introduction.md
+++ b/docs/content/introduction.md
@@ -57,16 +57,13 @@ To sign-up to Aperture Cloud, [click here][sign-up].
   services, ensuring that the usage remains within prescribed rate limits and
   avoids penalties or additional costs. Refer to the
   [API Quota Management](guides/api-quota-management.md) guide for more details.
-- 🛡️ [**Adaptive Queuing**](concepts/request-prioritization/load-scheduler.md):
-  Enhance resource utilization and safeguard against abrupt service overloads
-  with an intelligent queue at the entry point of services. This queue
-  dynamically adjusts the rate of requests based on live service health, thereby
-  mitigating potential service disruptions and ensuring optimal performance
-  under all load conditions. Refer to the
-  [Service Load Management](aperture-for-infra/guides/service-load-management/service-load-management.md)
-  and
-  [Database Load Management](aperture-for-infra/guides/database-load-management/database-load-management.md)
-  guides for more details.
+- 🚦
+  [**Concurrency Control and Prioritization**](concepts/request-prioritization/concurrency-scheduler.md):
+  Safeguard against abrupt service overloads by limiting the number of
+  concurrent in-flight requests. Any requests beyond this limit are queued and
+  let in based on their priority as capacity becomes available. Refer to the
+  [Concurrency Control and Prioritization](guides/concurrency-control-and-prioritization.md)
+  guide for more details.
 - 🎯 [**Workload Prioritization**](concepts/scheduler.md): Safeguard crucial
   user experience pathways and ensure prioritized access to external APIs by
   strategically prioritizing workloads. With
@@ -74,9 +71,10 @@ To sign-up to Aperture Cloud, [click here][sign-up].
   Aperture aligns resource distribution with business value and urgency of
   requests. Workload prioritization applies to API Quota Management and Adaptive
   Queuing use cases.
-- 💾 **Caching**: Boost application performance and reduce costs by caching
-  costly operations, preventing duplicate requests to pay-per-use services, and
-  easing the load on constrained services.
+- 💾 [**Caching**](concepts/cache.md): Boost application performance and reduce
+  costs by caching costly operations, preventing duplicate requests to
+  pay-per-use services, and easing the load on constrained services. Refer to
+  the [Caching](guides/caching.md) guide for more details.
 
 ## ✨ Get started {#get-started}
 
diff --git a/docs/content/reference/aperture-cli/configure-cli.md b/docs/content/reference/aperture-cli/configure-cli.md
index 3740ea238b..21206aaa2c 100644
--- a/docs/content/reference/aperture-cli/configure-cli.md
+++ b/docs/content/reference/aperture-cli/configure-cli.md
@@ -19,7 +19,7 @@ Replace `ORGANIZATION_NAME` with the Aperture Cloud organization name and
 `PERSONAL_ACCESS_TOKEN` with the Personal Access Token linked to the user. If a
 Personal Access Token has not been created, generate a new one through the
 Aperture Cloud UI. Refer to [Personal Access Tokens][access-tokens] for
-additional information.
+step-by-step instructions.
 
 :::info
 
diff --git a/docs/content/reference/cloud-ui/personal-access-tokens.md b/docs/content/reference/cloud-ui/personal-access-tokens.md
index dee17fb8dd..ccaf11f260 100644
--- a/docs/content/reference/cloud-ui/personal-access-tokens.md
+++ b/docs/content/reference/cloud-ui/personal-access-tokens.md
@@ -11,8 +11,8 @@ import Zoom from 'react-medium-image-zoom';
 ```
 
 Aperture Cloud uses Personal Access Tokens to authenticate requests coming from
-[aperturectl][configure aperturectl]. You can create Personal Access Tokens for
-your user in the Aperture Cloud UI.
+[aperturectl][aperturectl]. You can create Personal Access Tokens for your user
+in the Aperture Cloud UI.
 
 ## Pre-requisites
 
@@ -36,5 +36,9 @@ You have [signed up][sign-up] on Aperture Cloud and created an organization.
 
 ![New Personal Access Token](./assets/personal-access-keys/new-personal-access-token.png)
 
-[configure aperturectl]: /reference/aperture-cli/aperture-cli.md
+5. Refer to the [aperturectl configuration][configure aperturectl] to learn how
+   to use the Access Token.
+
+[aperturectl]: /reference/aperture-cli/aperture-cli.md
+[configure aperturectl]: /reference/aperture-cli/configure-cli.md
 [sign-up]: /reference/cloud-ui/sign-up.md
diff --git a/playground/README.md b/playground/README.md
index a9fd01f619..52491101ca 100644
--- a/playground/README.md
+++ b/playground/README.md
@@ -134,14 +134,14 @@ The load generator is configured to generate the following traffic pattern for
 - Hold at `5` concurrent users for `2m`.
 
 Once the traffic is running, you can visualize the decisions made by Aperture in
-Grafana. Navigate to [localhost:3000](http://localhost:3000) on your browser to
+Grafana. Navigate to [localhost:3333](http://localhost:3333) on your browser to
 reach Grafana. You can open the `FluxNinja` dashboard under `aperture-system`
 folder to a bunch of useful panels.
 
 ![Grafana Dashboard](./assets/dashboard.png)
 
 > 📍 Grafana's dashboard browser address is
-> [localhost:3000/dashboards](http://localhost:3000/dashboards)
+> [localhost:3333/dashboards](http://localhost:3333/dashboards)
 
 To stop the traffic at any point of time, press the `Stop Wavepool Generator`
 button in the `DemoApplications` resource.
diff --git a/playground/Tiltfile b/playground/Tiltfile
index 950819d09c..70ceaf6ef9 100644
--- a/playground/Tiltfile
+++ b/playground/Tiltfile
@@ -1207,7 +1207,7 @@ def declare_resources(resources, dep_tree, inv_dep_tree, race_arg, cloud_extensi
         labels=["ApertureController"],
         resource_deps=["grafana"],
         service="aperture-grafana",
-        local_port=3000,
+        local_port=3333,
         remote_port=3000,
         extra_env={
             "PERIOD": "1",