Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix least request lb not fair #29873

Merged
merged 14 commits into from
Oct 26, 2023

Conversation

barroca
Copy link
Contributor

@barroca barroca commented Sep 29, 2023

Commit Message: Fix Least requests LB when doing a random pick so it removes already chosen hosts from the random function to remove the chance of selecting the same host again when dealing with a small amount of hosts. Also, when the number of choices is smaller or equal the number of hosts, use full scan of least used instead.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
This Release changes the default behaviour of Least Request Load Balancer doing a full scan when the number of choices is more than equal the size of hosts and also adds a new option on the envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest configuration to always do a full scan. Allowing a full scan instead of random making choices reduces the chance of selecting a host that doesn't have least requests when the number of hosts is smaller.

Platform Specific Features:
[Optional Runtime guard:]
Fixes #11004
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

barroca and others added 3 commits September 29, 2023 14:11
@repokitteh-read-only
Copy link

Hi @barroca, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #29873 was opened by barroca.

see: more, trace.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #29873 was opened by barroca.

see: more, trace.

@barroca barroca force-pushed the ldamata/fix-least-request-LB-not-fair branch from 0c4f25f to 7711181 Compare September 29, 2023 20:44
@wbpcode
Copy link
Member

wbpcode commented Sep 30, 2023

/assign

Copy link
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. And some comments are added.

And at least a release note is necessary to tell what this PR changed or fixed. You can add a new change log entry in this file. https://github.com/envoyproxy/envoy/blob/main/changelogs/current.yaml

api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.h Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.h Outdated Show resolved Hide resolved
source/common/upstream/load_balancer_impl.h Outdated Show resolved Hide resolved
Comment on lines 2790 to 2803
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

// Host weight is 100.
{
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

HostVector empty;
{
hostSet().runCallbacks(empty, empty);
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get why these change is necessary 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make sense to change to 9999, but the number of calls for random have changed since we will call full scan instead.

Comment on lines 27 to 28
// The number of random healthy hosts from which the host with the fewest active requests will
// be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR change the behavior of the LB. It's necessary to add some more comment to tell that if the choice_count is larger than or equal to the hosts size, a full scan will be used.

Copy link
Member

@wbpcode wbpcode Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior change should be harmless to end users and maybe make this LB's behavior closer to it's name least request.
So, I think a change log should enough rather than a runtime guard.

But still need a check from @envoyproxy/runtime-guard-changes

@wbpcode
Copy link
Member

wbpcode commented Oct 12, 2023

/wait

…number of choices is larger than the size.

Signed-off-by: Leonardo da Mata <[email protected]>
@lizan
Copy link
Member

lizan commented Oct 17, 2023

/lgtm api

defer to @wbpcode

@lizan lizan removed their assignment Oct 18, 2023
Comment on lines 697 to 699
: absl::nullopt) {
: absl::nullopt),
enable_full_scan_(
PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config.ref(), enable_full_scan, false)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure how to do this on the code, when removing the field from here I was having problems compiling the load_balancer_impl.* files. I would appreciate help on this.

Just keep this enable_full_scan_ always be false if legacy API is used. Then I think this will resolve the compiling problem.

/wait

Copy link
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM overall. Please check the CI and add a release note, thanks.

@@ -749,6 +752,7 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
double active_request_bias_{};

const absl::optional<Runtime::Double> active_request_bias_runtime_;
const bool enable_full_scan_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool enable_full_scan_{};

@wbpcode
Copy link
Member

wbpcode commented Oct 20, 2023

/wait

Signed-off-by: Leonardo da Mata <[email protected]>
wbpcode
wbpcode previously approved these changes Oct 24, 2023
Copy link
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@barroca
Copy link
Contributor Author

barroca commented Oct 25, 2023

Not sure If I need someone else to approve since "envoyproxy/api-shepherds must approve for any API change" is pending.

@barroca
Copy link
Contributor Author

barroca commented Oct 26, 2023

perhaps @lizan needs to take a look again?

@wbpcode wbpcode enabled auto-merge (squash) October 26, 2023 13:05
Copy link
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@wbpcode wbpcode merged commit 3ea2bc4 into envoyproxy:main Oct 26, 2023
103 of 104 checks passed
@barroca barroca deleted the ldamata/fix-least-request-LB-not-fair branch October 26, 2023 14:01
kyessenov added a commit to kyessenov/envoy that referenced this pull request Nov 9, 2023
…citly forleast request lb (envoyproxy#30794)"

This reverts commit e93e556.

Revert "Fix least request lb not fair (envoyproxy#29873)"

This reverts commit 3ea2bc4.

restore api

Signed-off-by: Kuat Yessenov <[email protected]>

fix merge

Signed-off-by: Kuat Yessenov <[email protected]>
@kyessenov
Copy link
Contributor

kyessenov commented Nov 9, 2023

We are reverting this change since we noticed that there is a significant impact on clusters with a small number of hosts caused by the deterministic loop which makes every client pick the same hosts. I think we need a better understanding of the impact with a load simulation for this change to be re-applied. Please consider adding some independent sampling to the algorithm and we can do a more thorough review again.

@barroca
Copy link
Contributor Author

barroca commented Nov 10, 2023

I understand that changing the behaviour would have impacted people, but It is not clear to me that adding an option to enable full scan would have a bad impact since it is explicitly selecting the host with least requests. Wondering if we can add it back?

Thanks @kyessenov and @wbpcode for fixing. I wasn't aware that I had breaking issues . :)

@wbpcode
Copy link
Member

wbpcode commented Nov 10, 2023

The API will be kept and new implementation is still welcome. And @tonya11en is working a simulated system to ensure we can get a more reasonable implementation in the future.

Thanks for your contribution and so sorry for that we need to revert it. We know it taken you lots of time. :(

Hope we can bring it back soon.

@tonya11en
Copy link
Member

tonya11en commented Nov 10, 2023 via email

@jkirschner-hashicorp
Copy link
Contributor

I noticed the "revert PR" hasn't been merged yet.

Instead of reverting, is it sufficient to instead add a patch that starts the full scan at a random index (as suggested by @ggreenway and @tomwans here)? Or are there additional concerns that need to be addressed regarding full scan mode?

@barroca
Copy link
Contributor Author

barroca commented Dec 2, 2023

Hello, I have this PR open that changes the behaviour for a random index: #31146

kyessenov added a commit that referenced this pull request Dec 9, 2023
Commit Message: Reverts #29873 and #30794

Multiple concerns about the effect of a full scan on LEAST_REQUEST have been raised.
See previous discussions in #11004 and #11006.

Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
jkirschner-hashicorp pushed a commit to jkirschner-hashicorp/envoy that referenced this pull request Dec 21, 2023
* Add new idea for selecting hosts among those not selected yet.

Signed-off-by: Leonardo da Mata <[email protected]>

* Change how we choose full table scan

Signed-off-by: Leonardo da Mata <[email protected]>

* Remove cout

Signed-off-by: Leonardo da Mata <[email protected]>

* Fix Tests for load_balancer_impl_test

Signed-off-by: Leonardo da Mata <[email protected]>

* Fix format and make sure full scan happens only when selected or the number of choices is larger than the size.

Signed-off-by: Leonardo da Mata <[email protected]>

* Enable new option on extesions api only

Signed-off-by: Leonardo da Mata <[email protected]>

* Fix Integration tests.

Signed-off-by: Leonardo da Mata <[email protected]>

* Add release notes for full scan in least request LB.

Signed-off-by: Leonardo da Mata <[email protected]>

* Fix ref for release note.

Signed-off-by: Leonardo da Mata <[email protected]>

* Fix release notes

Signed-off-by: Leonardo da Mata <[email protected]>

* Update release note

Signed-off-by: Leonardo da Mata <[email protected]>

---------

Signed-off-by: Leonardo da Mata <[email protected]>
Signed-off-by: Leonardo da Mata <[email protected]>
Co-authored-by: Leonardo da Mata <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Least requests LB is not fair at edge cases
6 participants