Support slow Start mode in Envoy #13176
Planning to use a callback mechanism so that the EDF load balancer is aware of which hosts are in slow start mode.
Signed-off-by: Kateryna Nezdolii <[email protected]>
8516d7e to 4173b08
@nezdolik lmk when you want a first pass on this! /wait
// Configuration for slow start mode.
// [#next-free-field: 3]
message SlowStartConfig {
  google.protobuf.UInt32Value slow_start_window = 1;
Please add comment to fields
Thanks for the review @htuch, I will fix the API and docs once the PR is in a more mature state.
@@ -508,6 +508,18 @@ message Cluster {
  google.protobuf.UInt32Value hash_balance_factor = 2 [(validate.rules).uint32 = {gte: 100}];
}

enum EndpointWarmingPolicy {
  NO_WAIT = 0;
Please add comment to enum values.
  WAIT_FOR_FIRST_PASSING_HC = 1;
}

// Configuration for slow start mode.
Can you write some Envoy docs for this and link from here? I'd suggest translating the design doc into RST and then cleaning that up a bit for end users.
I'm interested in this, would |
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@sschepens currently this is the logic for
This may not be the final version, but currently it checks host health flags. If passive HC keeps those flags up to date, it should work.
@mattklein123 @snowp I need your initial thoughts on the suggested approach for tracking hosts in slow start. The code is still WIP and plenty of things will be reworked (e.g. TODOs, duplicated code, formatting fixes).
Thanks for working on this. The shape of this LGTM but I would definitely be interested in hearing from @snowp @antoniovicente @tonya11en if they have other impl ideas. Thank you!
/wait
endpoint_warming_policy;
const uint32_t slow_start_window;
TimeSource& time_source_;
absl::node_hash_set<HostSharedPtr> hosts_in_slow_start_;
You should be able to use a flat_hash_set here.
It's using absl::btree_set now.
// If all hosts are out of the window, we no longer need to track them and therefore we erase
// tracked hosts set.
if (current_time - latest_host_added_time > slow_start_window_ms) {
  hosts_in_slow_start_.erase(hosts_in_slow_start_.begin(), hosts_in_slow_start_.end());
Just use clear()
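The suggestion above can be sketched as follows. This is only an illustration, assuming the tracked set is a plain `std::set<std::string>` instead of the PR's set of `HostSharedPtr`; `HostSet`, `resetIfWindowElapsed`, and the parameter names are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-in for the tracked-host set in the snippet above.
using HostSet = std::set<std::string>;

// Empties the set once the most recently added host has left the slow start
// window. clear() has the same effect as erase(begin(), end()) but states the
// intent directly. Returns true if the set was cleared.
bool resetIfWindowElapsed(HostSet& hosts_in_slow_start, uint64_t current_time_ms,
                          uint64_t latest_host_added_time_ms,
                          uint64_t slow_start_window_ms) {
  // Assumption: timestamps come from a monotonic clock, so "now" is never
  // earlier than the time a host was added.
  assert(current_time_ms >= latest_host_added_time_ms);
  if (current_time_ms - latest_host_added_time_ms > slow_start_window_ms) {
    hosts_in_slow_start.clear();
    return true;
  }
  return false;
}
```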
Just realised that storing only the time of the latest added host will not work, for example when a host is added to the cluster and then immediately removed. It needs a more complex data structure that supports ordering by time, querying for the latest time, and lookups by host.
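One possible shape for such a structure, as a rough sketch and not the code that landed in this PR: an ordered set of (time, host) pairs for time ordering, paired with a hash map for lookups by host. Hosts are plain strings and timestamps are caller-supplied milliseconds; `SlowStartTracker` and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical tracker, not Envoy code.
class SlowStartTracker {
public:
  explicit SlowStartTracker(uint64_t window_ms) : window_ms_(window_ms) {}

  // Records the time at which a host entered slow start.
  void add(const std::string& host, uint64_t added_ms) {
    remove(host); // Re-adding a host refreshes its timestamp.
    by_time_.emplace(added_ms, host);
    by_host_[host] = added_ms;
  }

  // Removes a host, e.g. when it leaves the cluster. Returns false if unknown.
  bool remove(const std::string& host) {
    auto it = by_host_.find(host);
    if (it == by_host_.end()) return false;
    by_time_.erase(std::make_pair(it->second, host));
    by_host_.erase(it);
    return true;
  }

  // Lookup by host: true while the host is tracked and inside the window.
  bool inSlowStart(const std::string& host, uint64_t now_ms) const {
    auto it = by_host_.find(host);
    return it != by_host_.end() && !elapsed(it->second, now_ms);
  }

  bool empty() const { return by_host_.empty(); }
  std::size_t size() const { return by_host_.size(); }

  // Time of the most recently added host that is still tracked.
  uint64_t latestAddedMs() const {
    assert(!empty());
    return by_time_.rbegin()->first;
  }

  // Drops hosts whose window has elapsed; the oldest entries sort first.
  void expire(uint64_t now_ms) {
    while (!by_time_.empty() && elapsed(by_time_.begin()->first, now_ms)) {
      by_host_.erase(by_time_.begin()->second);
      by_time_.erase(by_time_.begin());
    }
  }

private:
  bool elapsed(uint64_t added_ms, uint64_t now_ms) const {
    return now_ms > added_ms && now_ms - added_ms > window_ms_;
  }

  uint64_t window_ms_;
  std::set<std::pair<uint64_t, std::string>> by_time_; // ordered by time
  std::unordered_map<std::string, uint64_t> by_host_;  // lookup by host
};
```

With this layout, removing the most recently added host makes `latestAddedMs()` fall back to the previous host's time, which is the case a single stored timestamp cannot handle.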
Assuming we stick with the high level approach, I think you could probably use |
Signed-off-by: Kateryna Nezdolii <[email protected]>
Applied the latest review comments, fingers crossed that all checks will pass.
Signed-off-by: Kateryna Nezdolii <[email protected]>
@mattklein123 am not sure why |
Awesome, thanks. Just one small comment and then let's ship!
/wait
// 2021/08/15 17290 40349 add all host map to priority set for fast host
// searching
Is this a merge issue? I don't think this should be deleted?
fixed
Signed-off-by: Kateryna Nezdolii <[email protected]>
Thanks, awesome work!
🎉 🎉 🎉 |
🎉 🎉 🎉 It's great. |
Woohoo! |
🎉 🎉 🎉 |
🎉 🎉 🎉 Coooooooool |
Support slow Start mode in Envoy (envoyproxy#13176)
* feat: cherry pick commit 6c56dd8 Support slow Start mode in Envoy (envoyproxy#13176)
* fix: fix compile error
* fix: run proto_format.sh
* fix: include file error fix
* fix: remove undefined fields
* fix: assign lb_round_robin_config_ in ClusterInfoImpl constructor
* fix: change applySlowStartFactor to use static 0.3 * host_weight as the new weight, because the formula-based approach yields a new weight so small that EDF waits a long time before choosing the slow start endpoint
Co-authored-by: jst <[email protected]>
@nezdolik one question on this: I understand that for new deployments, when all the pods are in slow start mode, all of them receive a similar share of traffic based on their host weights, so slow start mode is essentially not useful in that case and mostly makes sense when new pods are added, as in the HPA case. Is that correct?
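The last fix in the commit message above replaces a computed factor with a constant one. A minimal sketch of that idea, assuming weights are plain doubles; `kStaticSlowStartFactor` and `effectiveWeight` are hypothetical names, and this is not the signature of `applySlowStartFactor` in Envoy or in the fork.

```cpp
#include <cassert>

// Factor quoted in the commit message above, treated here as a fixed constant.
constexpr double kStaticSlowStartFactor = 0.3;

// Hypothetical helper: returns the weight a scheduler would use for a host.
// A host in slow start gets a fixed fraction of its configured weight;
// any other host keeps its full weight.
double effectiveWeight(double host_weight, bool in_slow_start) {
  assert(host_weight > 0.0);
  return in_slow_start ? kStaticSlowStartFactor * host_weight : host_weight;
}
```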
Correct @ramaraochavali |
Thank you |
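The point confirmed in this exchange can be checked with a little arithmetic: scaling every weight by the same factor leaves each host's share of traffic unchanged. The sketch below assumes traffic is split in proportion to effective weight; `trafficShares` is a hypothetical helper and the factor 0.25 is an arbitrary illustration value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction of traffic each host receives when host i has
// effective weight weights[i] * factors[i] and selection is proportional to
// effective weight.
std::vector<double> trafficShares(const std::vector<double>& weights,
                                  const std::vector<double>& factors) {
  assert(weights.size() == factors.size());
  double total = 0.0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    total += weights[i] * factors[i];
  }
  std::vector<double> shares;
  shares.reserve(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i) {
    shares.push_back(total > 0.0 ? weights[i] * factors[i] / total : 0.0);
  }
  return shares;
}
```

With weights {1, 1, 2} and a factor of 0.25 on every host, the shares come out as 0.25, 0.25, and 0.5, exactly what the unscaled weights give. With weights {1, 1, 1} and only the third host at 0.25, that host receives a smaller share than the other two, which is the scale-up case where slow start has an effect.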
Signed-off-by: Kateryna Nezdolii [email protected]
Support progressive traffic increase in Envoy, implementation is according to design doc: https://docs.google.com/document/d/1NiG1X0gbfFChjl1aL-EE1hdfYxKErjJ2688wJZaj5a0/edit
Additional Description: Please refer to RFC
Risk Level: Medium
Testing: Done
Docs Changes: Done
Release Notes: Done
Fixes #11050