Add Mistral Guide and Assets (#3127)
- **Documentation**
  - Updated the summary of `mistral.md` to include specific details about
    the new functionality.
  - Modified section headers in `concurrency-quota-management.md` and
    `concurrency-control-and-prioritization.md`.
  - Made minor text modifications for clarity and consistency in various
    documentation files.
karansohi authored Jan 8, 2024
1 parent 2c0e61d commit 9159bc6
Showing 17 changed files with 404 additions and 12 deletions.
1 change: 1 addition & 0 deletions .github/styles/Vocab/FluxNinja/accept.txt
@@ -123,3 +123,4 @@ Duolingo
APIKey
[Aa]uthz
exat
+productizing
Original file line number Diff line number Diff line change
@@ -42,11 +42,11 @@ Requests coming into the system are categorized into different workloads, each
of which is defined by its priority and weight. This classification is crucial
for the scheduling process within each agent.

-Inside every agent, there is a scheduler that priorities request admission based
-on two factors: the priority and weight assigned to the corresponding workload,
-and the availability of tokens from the global token bucket. This mechanism
-ensures that high-priority requests are handled appropriately even under high
-load or when the request rate is close to the rate limit.
+Inside every agent, there is a scheduler that prioritizes request admission
+based on two factors: the priority and weight assigned to the corresponding
+workload, and the availability of tokens from the global token bucket. This
+mechanism ensures that high-priority requests are handled appropriately even
+under high load or when the request rate is close to the rate limit.
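
As an illustration of the paragraph above, here is a minimal Python sketch of priority-ordered admission gated by a token bucket. It is not Aperture's scheduler: the names `TokenBucket`, `PriorityScheduler`, `enqueue`, and `admit` are hypothetical, and workload weight is left out to keep the example short.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket: holds at most `capacity` tokens."""
    capacity: float
    fill_rate: float  # tokens added per second
    tokens: float = 0.0

    def refill(self, elapsed_seconds: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + self.fill_rate * elapsed_seconds)

    def try_take(self, amount: float) -> bool:
        # Take tokens only if enough are available.
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


class PriorityScheduler:
    """Admits queued requests highest priority first while tokens last."""

    def __init__(self, bucket: TokenBucket) -> None:
        self.bucket = bucket
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def enqueue(self, name: str, priority: int, tokens: float = 1.0) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._queue, (-priority, next(self._seq), name, tokens))

    def admit(self) -> list:
        admitted = []
        while self._queue:
            _, _, name, tokens = self._queue[0]
            if not self.bucket.try_take(tokens):
                break  # not enough tokens: the request stays queued
            heapq.heappop(self._queue)
            admitted.append(name)
        return admitted
```

With a bucket holding two tokens and three queued requests of priority 1, 10, and 5, `admit()` releases the priority-10 and priority-5 requests and leaves the priority-1 request queued until the bucket refills.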

## Example Scenario

4 changes: 2 additions & 2 deletions docs/content/guides/api-quota-management.md
@@ -55,7 +55,7 @@ Requests coming into the system are categorized into different workloads, each
of which is defined by its priority and weight. This classification is crucial
for the request scheduling process.

-The scheduler priorities request admission based on two factors: the priority
+The scheduler prioritizes request admission based on two factors: the priority
and weight assigned to the corresponding workload, and the availability of
tokens from the token bucket. This mechanism ensures that high-priority requests
are handled appropriately even under high load or when the request rate is close
@@ -148,7 +148,7 @@ flow.
```

Navigate to the `Policies` tab on the sidebar menu, and select `Create Policy`
-in the upper right corner. Next, choose the Request Prioritization blueprint,
+in the upper-right corner. Next, choose the Request Prioritization blueprint,
and from the drop-down options select Quota based. Now, complete the form with
these specific values:

21 changes: 21 additions & 0 deletions docs/content/guides/assets/mistral/mistral.mmd
@@ -0,0 +1,21 @@
flowchart LR
classDef Orange fill:#F8773D,stroke:#000000,stroke-width:2px;
classDef Green fill:#56AE89,stroke:#000000,stroke-width:2px;
classDef Red fill:#F13C15,stroke:#000000,stroke-width:1px;
classDef Pink fill:#ffb6c1,stroke:#000000,stroke-width:1px;

TC[\Token Counter/]
class TC Orange

Scheduler
class Scheduler Orange

SDK
class SDK Green

subgraph Aperture_Cloud ["Aperture Cloud"]
Scheduler -- "Counting" --> TC
end
class Aperture_Cloud Green

SDK -- "Schedule Request" --> Scheduler
1 change: 1 addition & 0 deletions docs/content/guides/assets/mistral/mistral.mmd.md5sum
@@ -0,0 +1 @@
5df63f64375615e3ec9761cb92b987af
5 changes: 5 additions & 0 deletions docs/content/guides/assets/mistral/mistral.mmd.svg
1 change: 0 additions & 1 deletion docs/content/guides/assets/mistral/policy.yaml
@@ -10,7 +10,6 @@ spec:
- flow_control:
concurrency_scheduler:
concurrency_limiter:
-limit_by_label_key: limit_by_label_key
max_inflight_duration: 60s
in_ports:
max_concurrency:
22 changes: 22 additions & 0 deletions docs/content/guides/assets/mistral/validate.sh
@@ -0,0 +1,22 @@
#!/usr/bin/env bash

set -e

git_root=$(git rev-parse --show-toplevel)

# shellcheck disable=SC1091
source "$git_root"/docs/tools/aperturectl/validate_common.sh

generate_from_values \
  values.yaml \
  tmp

# copy the generated policy and graph to this (assets) directory so that they can be used in the docs
cp tmp/policies/mistral-concurrency-scheduling-cr.yaml policy.yaml
cp tmp/graphs/mistral-concurrency-scheduling-cr.mmd graph.mmd

# git add the generated policy and graph
"$git_root"/scripts/git_add_safely.sh policy.yaml graph.mmd

# remove the tmp directory
rm -rf tmp
21 changes: 21 additions & 0 deletions docs/content/guides/assets/mistral/values.yaml
@@ -0,0 +1,21 @@
# yaml-language-server: $schema=../../../../../blueprints/concurrency-scheduling/base/gen/definitions.json
blueprint: concurrency-scheduling/base
uri: ../../../../../blueprints
policy:
  policy_name: "mistral-concurrency-scheduling"
  components: []
  concurrency_scheduler:
    alerter:
      alert_name: "Too many inflight requests"
    concurrency_limiter:
      max_inflight_duration: "60s"
    max_concurrency: 2
    scheduler:
      priority_label_key: "priority"
      tokens_label_key: "tokens"
      workload_label_key: "workload"
    selectors:
      - control_point: "mistral-prompt"
  resources:
    flow_control:
      classifiers: []
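
To make the effect of `max_concurrency: 2` concrete, the following Python sketch caps in-flight work with an `asyncio.Semaphore`. It only illustrates the general idea of a concurrency limit; it is not how Aperture enforces the policy, and `run_with_limit` and the simulated request are hypothetical.

```python
import asyncio

MAX_CONCURRENCY = 2  # mirrors max_concurrency in values.yaml


async def run_with_limit(n_requests: int,
                         max_concurrency: int = MAX_CONCURRENCY) -> int:
    """Run n_requests simulated calls and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def handle(_request_id: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits while max_concurrency calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real request
            in_flight -= 1

    await asyncio.gather(*(handle(i) for i in range(n_requests)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_with_limit(6)))
```

Six requests are submitted at once, but no more than two ever run at the same time; the rest wait for a slot, which is the behavior the policy asks the scheduler to enforce at the `mistral-prompt` control point.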
4 changes: 2 additions & 2 deletions docs/content/guides/concurrency-control-and-prioritization.md
@@ -144,7 +144,7 @@ visibility for each flow.
```

Navigate to the `Policies` tab on the sidebar menu, and select `Create Policy`
-in the upper right corner. Next, choose the Request Prioritization blueprint,
+in the upper-right corner. Next, choose the Request Prioritization blueprint,
and from the drop-down options select Concurrency based. Now, complete the form
with these specific values:

@@ -247,7 +247,7 @@ and API Key. In the Aperture Cloud UI, select the Aperture tab from the sidebar
menu. Copy and enter both your Organization address and API Key to establish a
connection between the SDK and Aperture Cloud.

-## Monitoring concurrency Scheduling Policy
+## Monitoring Concurrency Scheduling Policy

After running the example for a few minutes, you can review the telemetry data
in the Aperture Cloud UI. Navigate to the Aperture Cloud UI, and click the