Commit 2043ba9: fix vale issues

karansohi committed Jan 6, 2024
1 parent 25fe3cd

Showing 5 changed files with 32 additions and 23 deletions.
(Three of the changed files cannot be displayed in the diff view.)
2 changes: 1 addition & 1 deletion docs/content/guides/concurrency-quota-management.md
@@ -249,7 +249,7 @@ and API Key. In the Aperture Cloud UI, select the Aperture tab from the sidebar
menu. Copy and enter both your Organization address and API Key to establish a
connection between the SDK and Aperture Cloud.

## Monitoring concurrency Scheduling Policy
## Monitoring Concurrency Scheduling Policy

After running the example for a few minutes, you can review the telemetry data
in the Aperture Cloud UI. Navigate to the Aperture Cloud UI, and click the
53 changes: 31 additions & 22 deletions docs/content/guides/mistral.md
@@ -34,22 +34,22 @@ Such constraints, combined with an increase in demand, often result in slower
response times to prompts, which leads to degradation of user experience during
peak loads.

Aperture handles peak loads and preserves the user experience with Concurrency
Scheduling feature by efficiently scheduling in-flight requests directed to
Mistral. This guide will provide detailed instructions on how to use the
Aperture SDK when interfacing with Mistral, and define a concurrency scheduling
policy using Aperture Cloud
Aperture handles peak loads and preserves the user experience with the
Concurrency Scheduling feature by efficiently scheduling in-flight requests
directed to Mistral. This guide will provide detailed instructions on how to use
the Aperture SDK when interfacing with Mistral, and define a concurrency
scheduling policy using Aperture Cloud.

## Schedule Requests in Mistral with Aperture

Aperture can help scheudle in-flight requests and improve user experience by
Aperture can help schedule in-flight requests and improve user experience by
queuing and prioritizing requests before sending them to Mistral. Aperture
offers a blueprint for
[concurrency scheduling](https://docs.fluxninja.com/reference/blueprints/concurrency-scheduling/base),
consisting of two main components:

- Concurrency Limiter: It allows to set the max number of concurrenct requests
  that can be processed. This paratemeter can be set according to the an
- Concurrency Limiter: It allows setting the max number of concurrenct requests
  that can be processed. This paratemeter can be set according to an
  application's ability to set to handle the maximum number of concurrent
  requests at a given time.

[vale] mistral.md line 51, Vale.Spelling (error) / RedHat.Spelling (warning): Did you really mean 'concurrenct'?
[vale] mistral.md line 52, Vale.Spelling (error) / RedHat.Spelling (warning): Did you really mean 'paratemeter'?
- Scheduler: Aperture has a
@@ -73,7 +73,7 @@ Requests coming into the system are categorized into different workloads, each
of which is defined by its priority and weight. This classification is crucial
for the request scheduling process.

The scheduler priorities request admission based the priority and weight
The scheduler priorities request admission based on the priority and weight
assigned to the corresponding workload. This mechanism ensures that
high-priority requests are handled appropriately even under high load.
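The admission behavior described above can be sketched as a small simulation. This is illustrative only, not the Aperture scheduler's actual implementation: requests carry a workload priority, and once the concurrency limit is saturated, higher-priority requests are admitted from the queue first.

```python
import heapq
from itertools import count

class WorkloadScheduler:
    """Toy model of priority-based admission under a concurrency limit."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.in_flight = 0
        self._queue = []     # min-heap of (-priority, seq, name)
        self._seq = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, name, priority):
        """Queue a request; return any requests admitted as a result."""
        heapq.heappush(self._queue, (-priority, next(self._seq), name))
        return self._try_admit()

    def release(self):
        """Call when an in-flight request finishes; admit from the queue."""
        self.in_flight -= 1
        return self._try_admit()

    def _try_admit(self):
        admitted = []
        while self._queue and self.in_flight < self.max_concurrency:
            _, _, name = heapq.heappop(self._queue)
            self.in_flight += 1
            admitted.append(name)
        return admitted

sched = WorkloadScheduler(max_concurrency=2)
sched.submit("paid-1", priority=200)  # admitted immediately
sched.submit("free-1", priority=50)   # admitted (capacity left)
sched.submit("free-2", priority=50)   # queued: limit reached
sched.submit("paid-2", priority=200)  # queued, but ahead of free-2
assert sched.release() == ["paid-2"]  # paid workload jumps the queue
assert sched.release() == ["free-2"]
```

The priority numbers (200 for paid, 50 for free) are arbitrary example weights; what matters is only their relative order.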

@@ -136,7 +136,7 @@ milliseconds, for example, indicates that the request can be queued for a
maximum of 2 minutes. After this interval, the request will be rejected.

Once the `startFlow` call is made, we send the prompt to Mistral and await for
it's response. Excess requests are automatically queued by Aperture, eliminating
its response. Excess requests are automatically queued by Aperture, eliminating
the need to check if a flow `shouldRun` or not.
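The lifecycle above can be sketched with stand-in classes. This is a minimal sketch under stated assumptions: `FakeApertureClient`, `FakeFlow`, and the parameter names are hypothetical placeholders modeled on the pattern described, not the actual Aperture SDK or Mistral API.

```python
# Illustrative sketch only: stand-ins for the Aperture SDK client and the
# Mistral call. Real names and signatures differ.
class FakeFlow:
    def __init__(self):
        self.ended = False

    def end(self):
        # Reporting flow end keeps concurrency accounting and telemetry
        # accurate for subsequent requests.
        self.ended = True

class FakeApertureClient:
    def start_flow(self, control_point, grpc_timeout_ms):
        # In the real SDK this call may block while the request waits in the
        # queue, for at most grpc_timeout_ms (120000 ms, about 2 minutes)
        # before the request is rejected.
        self.last_flow = FakeFlow()
        return self.last_flow

def answer_prompt(client, prompt):
    flow = client.start_flow("mistral-prompt", grpc_timeout_ms=120_000)
    try:
        # Placeholder for the actual prompt sent to Mistral.
        return f"response to: {prompt}"
    finally:
        flow.end()  # always end the flow, even if the prompt call fails
```

The `try`/`finally` shape is the point of the sketch: the flow end is reported on every path, so a failed Mistral call never leaks a concurrency slot.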

```mdx-code-block
@@ -194,17 +194,17 @@ visibility for each flow.
```

Navigate to the `Policies` tab on the sidebar menu, and select `Create Policy`
in the upper right corner. Next, choose the Rate Limiting blueprint and complete
the form with these specific values:
in the upper right corner. Next, choose the Rate Limiting blueprint, select
Concurrency and complete the form with these specific values:

1. `Policy name`: Unique for each policy, this field can be used to define
policies tailored for different use cases. Set the policy name to
`concurrency-scheduling-test`.
2. `Limit by label key`: Determines the specific label key used for concurrency
   limits. This paratemeter becomes essential for more granular concurrency
   limiting use cases such as per user limiting where a parameter like the
   `user_id` can be passed. For now, since we want to do a global concurrency
   limiting, we will leave the label as it is.
   `user_id` can be passed. For now, we will test global concurrency limiting,
   we will leave the label as it is.

[vale] mistral.md line 204, Vale.Spelling (error) / RedHat.Spelling (warning): Did you really mean 'paratemeter'?
[vale] mistral.md line 205, RedHat.TermsErrors (error): Use 'according to', 'as', or 'as in' rather than 'as per'. RedHat.TermsWarnings (warning): Consider using 'such as' rather than 'like' unless updating existing content that uses the term.
3. `Max inflight duration`: Configures the time duration after which flow is
assumed to have ended in case the end call gets missed. We'll set it to `60s`
as an example.
@@ -219,6 +219,11 @@ the form with these specific values:
8. `Control point`: It can be a particular feature or execution block within a
service. We'll use `mistral-prompt` as an example.
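Collected together, the form values above might be represented as follows. This is a plain mapping for illustration only; the keys mirror the UI labels, not the blueprint's actual schema.

```python
# Form values from the steps above, gathered as a plain mapping.
# Key names are illustrative, not the blueprint's real field names.
policy_values = {
    "policy_name": "concurrency-scheduling-test",
    "limit_by_label_key": "",         # empty: global concurrency limiting
    "max_inflight_duration": "60s",   # flow assumed ended if end call is missed
    "control_point": "mistral-prompt",
}
assert policy_values["limit_by_label_key"] == ""
```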

![Concurrency Scheduling Policy](./assets/mistral/mistral-policy.png)

Once you've completed these fields, click `Continue` and then `Apply Policy` to
finalize the policy setup.

```mdx-code-block
</TabItem>
<TabItem value="aperturectl">
@@ -289,22 +294,26 @@ and `endFlow` functions. To mimic real-world usage, we generated lists of
prompts for paid and open source users, which were sent concurrently to Mistral.
With around 50 users simultaneously requesting responses from Mistral, we
observed significant latency differences. Without Aperture, the response time
for generative AI workloads spiked up to 10 minutes. In contrast, with
Aperture's concurrency scheduling policy in place, not only was the latency
reduced to as low as 20 seconds, but our paying users also experienced much
faster responses compared to those using the open-source version due to paid
users having a high priority.
for generative AI workloads spiked up to 5 minutes. In contrast, with Aperture's
concurrency scheduling policy in place, not only was the latency reduced to as
low as 50 seconds, but our paying users also experienced much faster responses
compared to those using the open-source version due to paid users having a high
priority.

[vale] mistral.md line 300, RedHat.Hyphens (warning): Use 'open source' rather than 'open-source'.

Here is a comparison of the latencies before and after Aperture.

Before Aperture:

[Before Aperture](./assets/mistral/before-aperture.png)
![Before Aperture](./assets/mistral/before-aperture.png)

After Aperture:

Here is the queueing of requests when the max concurrency is met, and how it
bumps up paid requests up in the queue.
![After Aperture](./assets/mistral/after-aperture.png)

Here is the queueing and prioritization of requests when the max concurrency is
met, and how it bumps up paid requests up in the queue.

![Dashboard](./assets/mistral/mistral-queue.png)

In summary, whether you're operating Mistral as a service or employing its API
for app development, managing the demands of generative AI workloads remains a

0 comments on commit 2043ba9

Please sign in to comment.