Commit
experiment DS with different context window (#5159)
## Context

1. Adds a context window experiment for the DeepSeek model.
2. The primary goal is to try different context lengths and observe the change in the following metrics: `CAR`, `latency`, `# Num Suggestions`, and the latency breakdown across stages such as `Client -> CG` and `CG -> Fireworks`. This should identify the inference bottleneck as the context window grows.
3. The default context window of the DeepSeek model is `128k`; offline experiments show no issues when increasing the context window up to `32k`.
4. Offline metrics show the following latency increases (tested from a GCP VM):
   - from `2048` to `4096`: 100ms
   - from `2048` to `8196`: 250ms
   - from `2048` to `16k`: 450ms
   - from `2048` to `32k`: 650ms
5. Initially run the experiment with 5% of traffic per variant, and scale to more traffic if the metrics look stable and stay within an acceptable range.

## Test plan

1. Add the user to the experiment override flag.
2. Manually check that each variant uses the correct context window and the `deepseek-v2` model.
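The experiment design above (one variant per context length, 5% of traffic each, plus an override flag for manual testing) can be sketched as follows. This is a minimal illustration, not the code in this commit: the variant names, the salted-hash bucketing, the `request_params` helper, and the assumption that users outside the variants keep a `2048`-token window on `deepseek-v2` are all hypothetical. Only the context sizes, the 5% allocation, and the model name come from the description.

```python
import hashlib
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: control keeps the 2048-token window that the latency numbers are
# measured against, and also uses deepseek-v2. All identifiers are hypothetical.
MODEL = "deepseek-v2"
CONTROL_CONTEXT_WINDOW = 2048


@dataclass(frozen=True)
class Variant:
    name: str            # hypothetical variant name
    context_window: int  # max context tokens sent to the model
    traffic_pct: float   # share of users assigned to this variant, in percent


# Sizes and the 5% per-variant split come from the PR description;
# "16k"/"32k" are interpreted as 16 * 1024 and 32 * 1024 (an assumption).
VARIANTS = (
    Variant("ctx_4096", 4096, 5.0),
    Variant("ctx_8196", 8196, 5.0),
    Variant("ctx_16k", 16 * 1024, 5.0),
    Variant("ctx_32k", 32 * 1024, 5.0),
)
_BY_NAME = {v.name: v for v in VARIANTS}


def bucket(user_id: str, salt: str = "ds-context-window") -> float:
    """Deterministically map a user to a point in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100.0


def assign_variant(user_id: str, overrides: Mapping[str, str]) -> Optional[Variant]:
    """Return the user's variant, or None for the control group."""
    # Override flag (test plan step 1): force a specific variant for a user.
    if user_id in overrides:
        return _BY_NAME[overrides[user_id]]
    point = bucket(user_id)
    cumulative = 0.0
    for variant in VARIANTS:
        cumulative += variant.traffic_pct
        if point < cumulative:
            return variant
    return None  # remaining traffic stays on the control window


def request_params(user_id: str, overrides: Mapping[str, str]) -> dict:
    """Build hypothetical model parameters for one completion request."""
    variant = assign_variant(user_id, overrides)
    window = variant.context_window if variant else CONTROL_CONTEXT_WINDOW
    return {"model": MODEL, "context_window": window}
```

For example, `request_params("alice", {"alice": "ctx_32k"})` returns `{"model": "deepseek-v2", "context_window": 32768}`, which is the kind of check that step 2 of the test plan performs manually. Hashing the user ID with a fixed salt keeps each user in the same variant across requests, so per-variant metrics such as `CAR` are not mixed across context sizes.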
1 parent 43b6a25, commit 3a0ca9c
3 changed files with 49 additions and 45 deletions.