Merge branch 'main' into ksohi/add-mistral-guide
hdkshingala committed Jan 8, 2024
2 parents 2043ba9 + 2c0e61d commit 6be1ce8
Showing 21 changed files with 88 additions and 69 deletions.
37 changes: 18 additions & 19 deletions README.md
@@ -46,44 +46,43 @@ cloud application:
## ⚙️ Load management capabilities

- ⏱️
[**Global Rate-Limiting**](https://docs.fluxninja.com/concepts/rate-limiter):
[**Global Rate and Concurrency Limiting**](https://docs.fluxninja.com/concepts/rate-limiter):
Safeguard APIs and features against excessive usage with Aperture's
high-performance, distributed rate limiter. Identify individual users or
entities by fine-grained labels. Create precise rate limiters controlling
burst-capacity and fill-rate tailored to business-specific labels. Refer to
the [Rate Limiting](https://docs.fluxninja.com/guides/per-user-rate-limiting)
guide for more details.
burst-capacity and fill-rate tailored to business-specific labels. Limit per
user or global concurrency of in-flight requests. Refer to the
[Rate Limiting](https://docs.fluxninja.com/guides/per-user-rate-limiting) and
[Concurrency Limiting](https://docs.fluxninja.com/guides/per-user-concurrency-limiting)
guides for more details.
- 📊
[**API Quota Management**](https://docs.fluxninja.com/concepts/scheduler/quota-scheduler):
[**API Quota Management**](https://docs.fluxninja.com/concepts/request-prioritization/quota-scheduler):
Maintain compliance with external API quotas with a global token bucket and
smart request queuing. This feature regulates requests aimed at external
services, ensuring that the usage remains within prescribed rate limits and
avoids penalties or additional costs. Refer to the
[API Quota Management](https://docs.fluxninja.com/guides/api-quota-management/)
guide for more details.
- 🛡️
[**Adaptive Queuing**](https://docs.fluxninja.com/concepts/scheduler/load-scheduler):
Enhance resource utilization and safeguard against abrupt service overloads
with an intelligent queue at the entry point of services. This queue
dynamically adjusts the rate of requests based on live service health, thereby
mitigating potential service disruptions and ensuring optimal performance
under all load conditions. Refer to the
[Service Load Management](https://docs.fluxninja.com/guides/service-load-management/)
and
[Database Load Management](https://docs.fluxninja.com/guides/database-load-management/)
guides for more details.
- 🚦
[**Concurrency Control and Prioritization**](https://docs.fluxninja.com/concepts/request-prioritization/concurrency-scheduler):
Safeguard against abrupt service overloads by limiting the number of
concurrent in-flight requests. Any requests beyond this limit are queued and
let in based on their priority as capacity becomes available. Refer to the
[Concurrency Control and Prioritization](https://docs.fluxninja.com/development/guides/concurrency-control-and-prioritization/)
guide for more details.
- 🎯
[**Workload Prioritization**](https://docs.fluxninja.com/concepts/scheduler/):
Safeguard crucial user experience pathways and ensure prioritized access to
external APIs by strategically prioritizing workloads. With
[weighted fair queuing](https://en.wikipedia.org/wiki/Weighted_fair_queueing),
Aperture aligns resource distribution with business value and urgency of
requests. Workload prioritization applies to API Quota Management and Adaptive
Queuing use cases.
requests. Workload prioritization applies to API Quota Management and
Concurrency Control and Prioritization use cases.
- 💾 [**Caching**](https://docs.fluxninja.com/concepts/cache): Boost application
performance and reduce costs by caching costly operations, preventing
duplicate requests to pay-per-use services, and easing the load on constrained
services.
services. Refer to the [Caching](https://docs.fluxninja.com/guides/caching)
guide for more details.
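As a rough illustration of the rate-limiting model above (burst capacity plus a steady fill rate, keyed by a business-specific label), here is a minimal token bucket sketch in Python. The class and function names are hypothetical, and this is not Aperture's implementation:

```python
import time

class TokenBucket:
    """Illustrative token bucket: `burst` caps stored tokens,
    `fill_rate` adds tokens per second."""

    def __init__(self, burst: float, fill_rate: float, now=time.monotonic):
        self.burst = burst
        self.fill_rate = fill_rate
        self.tokens = burst          # start full, allowing an initial burst
        self.now = now               # injectable clock for testing
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.fill_rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per fine-grained label, for example per user.
buckets = {}

def allow_request(user: str) -> bool:
    bucket = buckets.setdefault(user, TokenBucket(burst=3, fill_rate=1.0))
    return bucket.allow()
```

In a real deployment the bucket state is shared across Agents; this sketch only shows the per-label burst-capacity and fill-rate mechanics.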

## 🏗️ Architecture

50 changes: 35 additions & 15 deletions docs/content/concepts/flow-lifecycle.md
@@ -41,6 +41,8 @@ components for that stage.

:::

### Selection, Classification, and Telemetry

- [**Selectors**](./selector.md) are the criteria used to determine the
components that will be applied to a flow in the subsequent stages.
- [**Classifiers**](./advanced/classifier.md/) perform the task of assigning
@@ -52,27 +54,45 @@ components for that stage.
telemetry based on access logs. They transform request flux that matches
certain criteria into Prometheus histograms, enabling enhanced observability
and control.

### Rate Limiting (Fast Rejection)

- [**Samplers**](./advanced/load-ramp.md#sampler) manage load by permitting a
portion of flows to be accepted, while immediately dropping the remainder with
a forbidden status code. They are particularly useful in scenarios such as
feature rollouts.
- [**Rate-Limiters**](./rate-limiter.md) proactively guard against abuse by
regulating excessive requests in accordance with per-label limits.
- **Caches** reduce the cost of operations and alleviate the load on constrained
services by preventing duplicate requests to pay-per-use services.
- [**Schedulers**](./scheduler.md) offer on-demand queuing based on a token
bucket algorithm, and prioritize requests using weighted fair queuing.
Multiple matching schedulers can evaluate concurrently, with each having the
power to drop a flow. There are two variants:
- The [**Load Scheduler**](./request-prioritization/load-scheduler.md)
oversees the current token rate in relation to the past token rate,
adjusting as required based on health signals from a service. This scheduler
type facilitates active service protection.
- The [**Quota Scheduler**](./request-prioritization/quota-scheduler.md) uses
a global token bucket as a ledger, managing the token distribution across
all Agents. It proves especially effective in environments with strict
global rate limits, as it allows for strategic prioritization of requests
when reaching quota limits.
- [**Concurrency-Limiters**](./concurrency-limiter.md) enforce in-flight request
  quotas to prevent overloads. They can also enforce limits per entity, such as
  per user, to ensure fair access across users.
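The in-flight quota idea behind Concurrency-Limiters can be pictured as a counter guarded by a lock. This is an illustrative sketch with hypothetical names; Aperture enforces these limits across a fleet of Agents, not in a single process:

```python
import threading

class ConcurrencyLimiter:
    """Rejects new flows once `limit` requests are already in flight."""

    def __init__(self, limit: int):
        self.limit = limit
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            if self.in_flight >= self.limit:
                return False  # fast rejection of the excess flow
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self.lock:
            self.in_flight -= 1

# Per-entity fairness: one limiter per user label.
limiters = {}

def acquire_for(user: str, per_user_limit: int = 2) -> bool:
    limiter = limiters.setdefault(user, ConcurrencyLimiter(per_user_limit))
    return limiter.try_acquire()
```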

### Request Prioritization and Cache Lookup

[**Schedulers**](./scheduler.md) offer on-demand queuing based on a limit
enforced through a token bucket or a concurrency counter, and prioritize
requests using weighted fair queuing. Multiple matching schedulers can evaluate
concurrently, with each having the power to drop a flow. There are three
variants running at various stages of the flow lifecycle:

- The
[**Concurrency Scheduler**](./request-prioritization/concurrency-scheduler.md)
uses a global concurrency counter as a ledger, managing the concurrency across
all Agents. It proves especially effective in environments with strict global
concurrency limits, as it allows for strategic prioritization of requests when
reaching concurrency limits.
- [**Caches**](./cache.md): Lookups of the response and global caches occur at
  this stage. If a response cache hit occurs, the flow is not sent to the
  Concurrency and Load Scheduling stages, resulting in an early acceptance.
- The [**Quota Scheduler**](./request-prioritization/quota-scheduler.md) uses a
global token bucket as a ledger, managing the token distribution across all
Agents. It proves especially effective in environments with strict global rate
limits, as it allows for strategic prioritization of requests when reaching
quota limits.
- The [**Load Scheduler**](./request-prioritization/load-scheduler.md) oversees
the current token rate in relation to the past token rate, adjusting as
required based on health signals from a service. This scheduler type
facilitates active service protection.
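The weighted fair queuing shared by all three scheduler variants can be sketched with virtual finish times: each workload's next request finishes `cost / weight` units of virtual time after its previous one, so higher-weight workloads are dequeued more often. This is an illustrative sketch only; the real schedulers also track tokens, timeouts, and distributed state:

```python
import heapq

class WFQ:
    """Orders queued flows so service is shared in proportion to weight."""

    def __init__(self):
        self.virtual_time = 0.0
        self.finish = {}   # last virtual finish time per workload
        self.heap = []     # (finish_time, seq, workload)
        self.seq = 0       # tie-breaker for equal finish times

    def enqueue(self, workload: str, weight: float, cost: float = 1.0) -> None:
        # Higher weight means a smaller virtual-time increment,
        # so the workload is scheduled more frequently.
        start = max(self.virtual_time, self.finish.get(workload, 0.0))
        self.finish[workload] = start + cost / weight
        heapq.heappush(self.heap, (self.finish[workload], self.seq, workload))
        self.seq += 1

    def dequeue(self) -> str:
        finish, _, workload = heapq.heappop(self.heap)
        self.virtual_time = finish
        return workload

q = WFQ()
for _ in range(3):
    q.enqueue("checkout", weight=2.0)  # business-critical workload
    q.enqueue("crawler", weight=1.0)   # best-effort workload
order = [q.dequeue() for _ in range(6)]
# "checkout" is served roughly twice as often early in the order
```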

After traversing these stages, the flow's decision is sent back to the
initiating service.
@@ -1,12 +1,11 @@
---
title: Concurrency Quota Management
title: Concurrency Control and Prioritization
sidebar_position: 5
keywords:
- concurrency scheduling
- concurrency quota management
- guides
- external API
- concurrency limiting
- prioritization
- guides
- expensive API
---

```mdx-code-block
@@ -30,10 +29,9 @@ blueprint.

## Overview

Concurrency quota management, also called concurrency scheduling, is a
sophisticated technique that allows effective management of concurrent requests.
With this technique services can limit the number of concurrent API calls to
alleviate the load on the system.
Concurrency control and prioritization is a technique for effectively managing
concurrent requests. With this technique, services can limit the number of
concurrent API calls to alleviate the load on the system.

When service limits are reached, Aperture Cloud can queue incoming requests and
serve them according to their priority, which is determined by business-critical
labels set in the policy and passed via the SDK.
<Zoom>

```mermaid
{@include: ./assets/concurrency-quota-management/concurrency-scheduling.mmd}
{@include: ./assets/concurrency-control-and-prioritization/concurrency-scheduling.mmd}
```

</Zoom>
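One way to picture the queue-and-prioritize behavior described above: admit requests up to a limit, park the excess in a priority queue, and admit the highest-priority waiter whenever capacity frees up. The names here are hypothetical sketches; real Aperture policies express this declaratively:

```python
import heapq
import itertools
from typing import Optional

class PriorityAdmission:
    """Admits up to `limit` concurrent requests; excess requests wait in a
    priority queue and are admitted highest-priority-first."""

    def __init__(self, limit: int):
        self.limit = limit
        self.in_flight = 0
        self.waiting = []             # (-priority, seq, request_id)
        self.seq = itertools.count()  # FIFO tie-breaker within a priority

    def request(self, request_id: str, priority: int) -> bool:
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True               # admitted immediately
        heapq.heappush(self.waiting, (-priority, next(self.seq), request_id))
        return False                  # queued until capacity frees up

    def complete(self) -> Optional[str]:
        """Called when an in-flight request finishes; admits the best waiter."""
        self.in_flight -= 1
        if self.waiting:
            _, _, request_id = heapq.heappop(self.waiting)
            self.in_flight += 1
            return request_id
        return None

sched = PriorityAdmission(limit=1)
sched.request("r1", priority=1)    # admitted
sched.request("r2", priority=1)    # queued
sched.request("r3", priority=10)   # queued with higher priority
next_admitted = sched.complete()   # "r3" jumps ahead of "r2"
```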
@@ -169,7 +167,7 @@ with these specific values:
8. `Control point`: It can be a particular feature or execution block within a
service. We'll use `concurrency-scheduling-feature` as an example.

![Concurrency Scheduling Policy](./assets/concurrency-quota-management/concurrency-scheduling-test.png)
![Concurrency Scheduling Policy](./assets/concurrency-control-and-prioritization/concurrency-scheduling-test.png)

Once you've completed these fields, click `Continue` and then `Apply Policy` to
finalize the policy setup.
@@ -213,7 +211,7 @@ scheduling policy:
Here is how the complete values file would look:

```yaml
{@include: ./assets/concurrency-quota-management/values.yaml}
{@include: ./assets/concurrency-control-and-prioritization/values.yaml}
```

The last step is to apply the policy using the following command:
@@ -258,18 +256,18 @@ in the Aperture Cloud UI. Navigate to the Aperture Cloud UI, and click the

Once you've clicked on the policy, you will see the following dashboard:

![Workload](./assets/concurrency-quota-management/workloads.png)
![Workload](./assets/concurrency-control-and-prioritization/workloads.png)

The two panels above provide insights into how the policy is performing by
monitoring the number of accepted and rejected requests along with the
acceptance percentage.

![Request](./assets/concurrency-quota-management/request-metrics.png)
![Request](./assets/concurrency-control-and-prioritization/request-metrics.png)

The panels above offer insights into the request details, including their
latency.

![Queue](./assets/concurrency-quota-management/queue.png)
![Queue](./assets/concurrency-control-and-prioritization/queue.png)

These panels display insights into queue duration for `workload` requests and
highlight the average of prioritized requests that moved ahead in the queue.
24 changes: 11 additions & 13 deletions docs/content/introduction.md
@@ -57,26 +57,24 @@ To sign-up to Aperture Cloud, [click here][sign-up].
services, ensuring that the usage remains within prescribed rate limits and
avoids penalties or additional costs. Refer to the
[API Quota Management](guides/api-quota-management.md) guide for more details.
- 🛡️ [**Adaptive Queuing**](concepts/request-prioritization/load-scheduler.md):
Enhance resource utilization and safeguard against abrupt service overloads
with an intelligent queue at the entry point of services. This queue
dynamically adjusts the rate of requests based on live service health, thereby
mitigating potential service disruptions and ensuring optimal performance
under all load conditions. Refer to the
[Service Load Management](aperture-for-infra/guides/service-load-management/service-load-management.md)
and
[Database Load Management](aperture-for-infra/guides/database-load-management/database-load-management.md)
guides for more details.
- 🚦
[**Concurrency Control and Prioritization**](concepts/request-prioritization/concurrency-scheduler.md):
Safeguard against abrupt service overloads by limiting the number of
concurrent in-flight requests. Any requests beyond this limit are queued and
let in based on their priority as capacity becomes available. Refer to the
[Concurrency Control and Prioritization](guides/concurrency-control-and-prioritization.md)
guide for more details.
- 🎯 [**Workload Prioritization**](concepts/scheduler.md): Safeguard crucial
user experience pathways and ensure prioritized access to external APIs by
strategically prioritizing workloads. With
[weighted fair queuing](https://en.wikipedia.org/wiki/Weighted_fair_queueing),
Aperture aligns resource distribution with business value and urgency of
requests. Workload prioritization applies to API Quota Management and Adaptive
Queuing use cases.
- 💾 **Caching**: Boost application performance and reduce costs by caching
costly operations, preventing duplicate requests to pay-per-use services, and
easing the load on constrained services.
- 💾 [**Caching**](concepts/cache.md): Boost application performance and reduce
costs by caching costly operations, preventing duplicate requests to
pay-per-use services, and easing the load on constrained services. Refer to
the [Caching](guides/caching.md) guide for more details.
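The duplicate-suppression benefit of caching described above can be pictured as memoization with a time-to-live. This is an illustrative sketch with hypothetical names; Aperture's cache is a managed, distributed service:

```python
import time

class TTLCache:
    """Caches results of a costly operation for `ttl` seconds so repeated
    identical requests skip the pay-per-use upstream call."""

    def __init__(self, ttl: float, now=time.monotonic):
        self.ttl = ttl
        self.now = now               # injectable clock for testing
        self.store = {}              # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        if entry and entry[0] > self.now():
            return entry[1]          # cache hit: no upstream call
        value = compute()            # cache miss: pay for the call once
        self.store[key] = (self.now() + self.ttl, value)
        return value

calls = 0

def fetch_profile() -> str:
    """Stand-in for an expensive, pay-per-use API call."""
    global calls
    calls += 1
    return "profile:u1"

cache = TTLCache(ttl=60.0)
a = cache.get_or_compute("u1", fetch_profile)
b = cache.get_or_compute("u1", fetch_profile)  # served from the cache
```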

## ✨ Get started {#get-started}

2 changes: 1 addition & 1 deletion docs/content/reference/aperture-cli/configure-cli.md
@@ -19,7 +19,7 @@ Replace `ORGANIZATION_NAME` with the Aperture Cloud organization name and
`PERSONAL_ACCESS_TOKEN` with the Personal Access Token linked to the user. If a
Personal Access Token has not been created, generate a new one through the
Aperture Cloud UI. Refer to [Personal Access Tokens][access-tokens] for
additional information.
step-by-step instructions.

:::info

10 changes: 7 additions & 3 deletions docs/content/reference/cloud-ui/personal-access-tokens.md
@@ -11,8 +11,8 @@ import Zoom from 'react-medium-image-zoom';
```

Aperture Cloud uses Personal Access Tokens to authenticate requests coming from
[aperturectl][configure aperturectl]. You can create Personal Access Tokens for
your user in the Aperture Cloud UI.
[aperturectl][aperturectl]. You can create Personal Access Tokens for your user
in the Aperture Cloud UI.

## Pre-requisites

@@ -36,5 +36,9 @@ You have [signed up][sign-up] on Aperture Cloud and created an organization.

![New Personal Access Token](./assets/personal-access-keys/new-personal-access-token.png)

[configure aperturectl]: /reference/aperture-cli/aperture-cli.md
5. Refer to the [aperturectl configuration][configure aperturectl] to learn how
to use the Access Token.

[aperturectl]: /reference/aperture-cli/aperture-cli.md
[configure aperturectl]: /reference/aperture-cli/configure-cli.md
[sign-up]: /reference/cloud-ui/sign-up.md
4 changes: 2 additions & 2 deletions playground/README.md
@@ -134,14 +134,14 @@ The load generator is configured to generate the following traffic pattern for
- Hold at `5` concurrent users for `2m`.

Once the traffic is running, you can visualize the decisions made by Aperture in
Grafana. Navigate to [localhost:3000](http://localhost:3000) on your browser to
Grafana. Navigate to [localhost:3333](http://localhost:3333) on your browser to
reach Grafana. You can open the `FluxNinja` dashboard under the
`aperture-system` folder to see a number of useful panels.

![Grafana Dashboard](./assets/dashboard.png)

> 📍 Grafana's dashboard browser address is
> [localhost:3000/dashboards](http://localhost:3000/dashboards)
> [localhost:3333/dashboards](http://localhost:3333/dashboards)

To stop the traffic at any point in time, press the `Stop Wavepool Generator`
button in the `DemoApplications` resource.
2 changes: 1 addition & 1 deletion playground/Tiltfile
@@ -1207,7 +1207,7 @@ def declare_resources(resources, dep_tree, inv_dep_tree, race_arg, cloud_extensi
labels=["ApertureController"],
resource_deps=["grafana"],
service="aperture-grafana",
local_port=3000,
local_port=3333,
remote_port=3000,
extra_env={
"PERIOD": "1",
