
🛠️ #46 inited the least latency routing #70

Merged
merged 25 commits into develop from 46-routing-least-latency-strategy
Jan 14, 2024

Conversation

roma-glushko
Member

@roma-glushko roma-glushko commented Jan 11, 2024

Adding a new routing strategy that picks the least-latency model. Also adding simple test coverage for some of the config-building logic.
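To make the idea concrete, here is a minimal sketch of what "pick the least-latency model" means; it is not the PR's actual code, and the LangModel struct, its fields, and the Pick method are hypothetical names used only for illustration:

package routing

import (
	"errors"
	"time"
)

// LangModel is a hypothetical view of a pool entry that tracks a moving
// average of its recent response latency (names are illustrative only).
type LangModel struct {
	ID         string
	avgLatency time.Duration // updated after every response
	healthy    bool
}

// LeastLatencyRouting picks the healthy model with the lowest observed latency.
type LeastLatencyRouting struct {
	models []*LangModel
}

var ErrNoHealthyModels = errors.New("no healthy models available")

// Pick scans the pool and returns the healthy model with the smallest
// average latency, or an error if every model is unhealthy.
func (r *LeastLatencyRouting) Pick() (*LangModel, error) {
	var best *LangModel
	for _, m := range r.models {
		if !m.healthy {
			continue
		}
		if best == nil || m.avgLatency < best.avgLatency {
			best = m
		}
	}
	if best == nil {
		return nil, ErrNoHealthyModels
	}
	return best, nil
}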

@roma-glushko roma-glushko marked this pull request as draft January 13, 2024 18:58

codecov bot commented Jan 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparing base (2957360, 63.87%) to head (c694cb2, 71.74%).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #70      +/-   ##
===========================================
+ Coverage    63.87%   71.74%   +7.86%     
===========================================
  Files           27       30       +3     
  Lines         1182     1313     +131     
===========================================
+ Hits           755      942     +187     
+ Misses         381      317      -64     
- Partials        46       54       +8     


@roma-glushko roma-glushko changed the title #46 inited the least latency routing 🛠️ #46 inited the least latency routing Jan 14, 2024
@roma-glushko roma-glushko marked this pull request as ready for review January 14, 2024 18:25
s.expireAt = time.Now().Add(*s.model.LatencyUpdateInterval())
}

// LeastLatencyRouting routes requests to the model that responds the fastest
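The excerpted line sets an expiry on the latency sample so that stale stats get refreshed after the configured update interval. A rough sketch of that mechanism, assuming a hypothetical latencySchedule type (the field and method names below are illustrative, not the PR's actual types):

package routing

import "time"

// latencySchedule is an illustrative holder for a model's latency sample
// together with the time at which it should be considered stale.
type latencySchedule struct {
	latency  time.Duration
	expireAt time.Time
	interval time.Duration
}

// Expired reports whether the stored latency sample is stale and should
// be refreshed before it is used for routing decisions.
func (s *latencySchedule) Expired() bool {
	return time.Now().After(s.expireAt)
}

// Update records a fresh latency sample and pushes the expiry forward,
// mirroring the `s.expireAt = time.Now().Add(...)` line in the diff above.
func (s *latencySchedule) Update(latency time.Duration) {
	s.latency = latency
	s.expireAt = time.Now().Add(s.interval)
}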
Contributor
I wonder how we can normalize this to the token count of the response, since the bottleneck seems to be that token generation takes a long time. Might be as simple as counting the generated tokens in the response and dividing that by the response time.

Member Author

There are basically two options I see:

  • use the time-to-first-byte metric (requires instrumenting the clients to get that info)
  • or use the approach you described, in which case we are essentially calculating the token generation velocity of each model (see the sketch below)

I need to play with the OpenAI API, for example, to see which approach makes sense here.
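A minimal sketch of the second option, ranking models by token generation velocity; the tokenVelocity helper is hypothetical, and the completion token count is assumed to come from the provider's usage stats (e.g. the usage block in an OpenAI-style response):

package routing

import "time"

// tokenVelocity returns generated tokens per second for a single response.
// Ranking models by this value (higher is better) normalizes raw latency
// by the length of the output, as discussed above.
func tokenVelocity(completionTokens int, elapsed time.Duration) float64 {
	if completionTokens <= 0 || elapsed <= 0 {
		return 0
	}
	return float64(completionTokens) / elapsed.Seconds()
}

A caller would time the upstream request (e.g. time.Since(start) around the provider call) and read the completion token count from the response before feeding both into this helper.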

Member Author
Made a ticket not to forget about this: #78

@roma-glushko roma-glushko merged commit 6aec59f into develop Jan 14, 2024
7 checks passed
@roma-glushko roma-glushko deleted the 46-routing-least-latency-strategy branch January 14, 2024 19:46
Development

Successfully merging this pull request may close these issues.

Implement Least Latency Load Balancing
2 participants