Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support slow Start mode in Envoy #13176

Merged
merged 75 commits into from
Sep 30, 2021
Merged

Conversation

nezdolik
Copy link
Member

@nezdolik nezdolik commented Sep 18, 2020

Signed-off-by: Kateryna Nezdolii [email protected]

Support progressive traffic increase in Envoy, implementation is according to design doc: https://docs.google.com/document/d/1NiG1X0gbfFChjl1aL-EE1hdfYxKErjJ2688wJZaj5a0/edit

Additional Description: Please refer to RFC
Risk Level: Medium
Testing: Done
Docs Changes: Done
Release Notes: Done
Fixes #11050

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13176 was opened by nezdolik.

see: more, trace.

@nezdolik
Copy link
Member Author

Planning to use callback mechanism for edf loadbalncer to be aware of which hosts are in slow start mode.

Signed-off-by: Kateryna Nezdolii <[email protected]>
@mattklein123
Copy link
Member

@nezdolik lmk when you want a first pass on this!

/wait

// Configuration for slow start mode.
// [#next-free-field: 3]
message SlowStartConfig {
google.protobuf.UInt32Value slow_start_window = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment to fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for review @htuch, i will fix api+docs once PR is in more mature state.

@@ -508,6 +508,18 @@ message Cluster {
google.protobuf.UInt32Value hash_balance_factor = 2 [(validate.rules).uint32 = {gte: 100}];
}

enum EndpointWarmingPolicy {
NO_WAIT = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment to enum values.

WAIT_FOR_FIRST_PASSING_HC = 1;
}

// Configuration for slow start mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some Envoy docs for this and link from here? I'd suggest translating the design doc into RST and then cleaning that up a bit for end users.

@sschepens
Copy link
Contributor

I'm interested in this, would WAIT_FOR_FIRST_PASSING_HC apply when endpoint health is provided via EDS and no active HC is configured?

Kateryna Nezdolii added 2 commits September 29, 2020 11:32
Signed-off-by: Kateryna Nezdolii <[email protected]>
Kateryna Nezdolii added 2 commits October 1, 2020 12:44
Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik
Copy link
Member Author

nezdolik commented Oct 5, 2020

@sschepens currently this is the logic for WAIT_FOR_FIRST_PASSING_HC:

if (host->health() == Upstream::Host::Health::Healthy &&
            !host->healthFlagGet(Host::HealthFlag::PENDING_ACTIVE_HC))

This may be not he final version, but currently it checks host health flags. If passive HC keeps those flags up to date it should work.

@nezdolik
Copy link
Member Author

nezdolik commented Oct 5, 2020

@mattklein123 @snowp need your initial thoughts on suggested approach for tracking hosts in slow start. The code is still wip and plenty of things will be reworked (eg todos, duplicated code, fix format etc).
Upon creation of edf lb base, we install 2 callbacks that recalculate hosts in slow start on each priority update/cluster membership update. To track hosts in slow start, added collection (set of host ptrs) and time of latest added hosts in slow start. Keeping the time of latest added host in slow start allows to do fast check if any hosts are in slow start.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. The shape of this LGTM but I would definitely be interested in hearing from @snowp @antoniovicente @tonya11en if they have other impl ideas. Thank you!

/wait

include/envoy/upstream/host_description.h Outdated Show resolved Hide resolved
endpoint_warming_policy;
const uint32_t slow_start_window;
TimeSource& time_source_;
absl::node_hash_set<HostSharedPtr> hosts_in_slow_start_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use a flat_hash_set here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's using absl::btree_set now

// If all hosts are out of the window, we no longer need to track them and therefore we erase
// tracked hosts set.
if (current_time - latest_host_added_time > slow_start_window_ms) {
hosts_in_slow_start_.erase(hosts_in_slow_start_.begin(), hosts_in_slow_start_.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use clear()

source/common/upstream/load_balancer_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.cc Show resolved Hide resolved
@nezdolik
Copy link
Member Author

nezdolik commented Oct 6, 2020

Just realised that storing only time of latest added host will not work, for example in case host is added to the cluster and then immediately removed. It needs to be more complex data structure, that supports ordering by time, querying for latest time and lookups by host.

@mattklein123
Copy link
Member

Just realised that storing only time of latest added host will not work, for example in case host is added to the cluster and then immediately removed. It needs to be more complex data structure, that supports ordering by time, querying for latest time and lookups by host.

Assuming we stick with the high level approach, I think you could probably use absl::btree_set with a heterogeneous lookup implementation that allows storing a host and sorting on last addition time.

Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik
Copy link
Member Author

Applied the latest review comments, fingers crossed that all checks will pass

Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik
Copy link
Member Author

@mattklein123 am not sure why envoy-presubmit failed (looking at the logs it says operation was cancelled)

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks. Just one small comment and then let's ship!

/wait

Comment on lines 271 to 272
// 2021/08/15 17290 40349 add all host map to priority set for fast host
// searching
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a merge issue? I don't think this should be deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Kateryna Nezdolii <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, awesome work!

@mattklein123 mattklein123 merged commit 6c56dd8 into envoyproxy:main Sep 30, 2021
@tonya11en
Copy link
Member

🎉 🎉 🎉

@wbpcode
Copy link
Member

wbpcode commented Oct 1, 2021

🎉 🎉 🎉 It's great.

@costimuraru
Copy link

Woohoo!

@du2016
Copy link

du2016 commented Oct 8, 2021

🎉 🎉 🎉

@ZPerling
Copy link

🎉 🎉 🎉 Coooooooool

jiangshantao-dbg pushed a commit to istio-mt/envoy that referenced this pull request Jan 4, 2022
Support slow Start mode in Envoy (envoyproxy#13176)
jiangshantao-dbg added a commit to istio-mt/envoy that referenced this pull request Jan 14, 2022
* feat: cherry pick commit 6c56dd8

Support slow Start mode in Envoy (envoyproxy#13176)

* fix: fix compile error

* fix: run proto_format.sh

* fix: include file error fix

* fix: remove undefined fields

* fix: assign lb_round_robin_config_ in ClusterInfoImpl constructor

* fix: change applySlowStartFactor to use static 0.3 * host_weight as new weight

because formula way will result too small new weight which cause the edf will wait a long time to
choose the slow start endpoint

Co-authored-by: jst <[email protected]>
@ramaraochavali
Copy link
Contributor

@nezdolik one question on this - I understand that for new deployments, when all the pods are in slow start mode, all of them receive similar amount based on their host weights - so slow start mode is essentially not useful in that case and mostly would make sense if new pods come in HPA case. Is that correct?

@nezdolik
Copy link
Member Author

@nezdolik one question on this - I understand that for new deployments, when all the pods are in slow start mode, all of them receive similar amount based on their host weights - so slow start mode is essentially not useful in that case and mostly would make sense if new pods come in HPA case. Is that correct?

Correct @ramaraochavali

@ramaraochavali
Copy link
Contributor

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps Approval required for changes to Envoy's external dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Progressive traffic increase for new Pods (slow start mode)