Fix least request lb not fair #29873

Merged
merged 14 commits on Oct 26, 2023
@@ -22,6 +22,7 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;
// This configuration allows the built-in LEAST_REQUEST LB policy to be configured via the LB policy
// extension point. See the :ref:`load balancing architecture overview
// <arch_overview_load_balancing_types>` for more information.
// [#next-free-field: 6]
message LeastRequest {
// The number of random healthy hosts from which the host with the fewest active requests will
// be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
Comment on lines 27 to 28
Member:
This PR changes the behavior of the LB. It's necessary to add some more comments to explain that if the choice_count is larger than or equal to the host count, a full scan will be used.

@wbpcode (Member), Oct 3, 2023:
This behavior change should be harmless to end users and may make this LB's behavior closer to its name, least request.
So I think a changelog entry should be enough rather than a runtime guard.

But it still needs a check from @envoyproxy/runtime-guard-changes.

@@ -58,4 +59,9 @@ message LeastRequest {

// Configuration for local zone aware load balancing or locality weighted load balancing.
common.v3.LocalityLbConfig locality_lb_config = 4;

// Configuration for performing a full scan on the list of hosts.
// If this configuration is set, when selecting a host, a full scan of the host list will be
// used to find the one with the least requests instead of using random choices.
google.protobuf.BoolValue enable_full_scan = 5;
}
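For context (an illustration added for this writeup, not part of the diff): the new field is a wrapped bool, so it is set the same way the tests below set it. A minimal sketch, assuming the conventional generated-header path for this proto package:

```cpp
#include "envoy/extensions/load_balancing_policies/least_request/v3/least_request.pb.h"

int main() {
  envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest config;
  // choice_count no longer affects host selection once full scan is enabled.
  config.mutable_choice_count()->set_value(2);
  config.mutable_enable_full_scan()->set_value(true);
}
```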
14 changes: 14 additions & 0 deletions changelogs/current.yaml
@@ -11,6 +11,12 @@ behavior_changes:
issue https://github.com/envoyproxy/envoy/issues/29461 for more specific detail and examples.

minor_behavior_changes:
- area: upstream
change: |
Fixed a reported issue (https://github.com/envoyproxy/envoy/issues/11004) that caused the Least
Request load balancer policy to be unfair when the number of hosts is very small. When the number
of hosts is smaller than or equal to the choice_count, instead of randomly selecting hosts from
the list, we perform a full scan on it to choose the host with the least requests.
# *Changes that may cause incompatibilities for some users, but should not for most*
- area: local_rate_limit
change: |
@@ -56,10 +62,18 @@ removed_config_or_runtime:
runtime flag and legacy code path.

new_features:
- area: upstream
change: |
Added :ref:`enable_full_scan <envoy_v3_api_msg_extensions.load_balancing_policies.least_request.v3.LeastRequest>`
option to the least request load balancer. If set to true, Envoy will perform a full scan on the list of hosts
instead of using :ref:`choice_count
<envoy_v3_api_msg_extensions.load_balancing_policies.least_request.v3.LeastRequest>`
to select the hosts.
- area: stats
change: |
Added :ref:`per_endpoint_stats <envoy_v3_api_field_config.cluster.v3.TrackClusterStats.per_endpoint_stats>` to get some metrics
for each endpoint in a cluster.

- area: jwt
change: |
The jwt filter can now serialize non-primitive custom claims when mapping claims to headers.
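To make the minor_behavior_changes entry above concrete, here is a small standalone C++ simulation (written for this writeup, not code from the PR; the FakeHost type and request counts are invented). With two hosts, P2C samples with replacement, so roughly a quarter of the time both samples land on the overloaded host and it gets picked anyway; a full scan never picks it:

```cpp
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// Hypothetical stand-in for Envoy's host: just an active-request counter.
struct FakeHost {
  uint64_t active_requests;
};

int main() {
  // Host 1 is far more loaded; a fair least-request pick should avoid it.
  std::vector<FakeHost> hosts = {{1}, {100}};
  std::mt19937 rng(42);
  std::uniform_int_distribution<size_t> pick(0, hosts.size() - 1);

  int p2c_overloaded = 0, scan_overloaded = 0;
  const int trials = 100000;
  for (int i = 0; i < trials; ++i) {
    // P2C with replacement: both samples can land on the same host.
    const size_t a = pick(rng);
    const size_t b = pick(rng);
    const size_t p2c = hosts[b].active_requests < hosts[a].active_requests ? b : a;
    if (p2c == 1) ++p2c_overloaded;

    // Full scan: always finds the true least-loaded host.
    size_t best = 0;
    for (size_t h = 1; h < hosts.size(); ++h) {
      if (hosts[h].active_requests < hosts[best].active_requests) best = h;
    }
    if (best == 1) ++scan_overloaded;
  }
  // Expect roughly 25% for P2C (both samples hit host 1) and 0% for the scan.
  std::cout << "P2C picked overloaded host: " << 100.0 * p2c_overloaded / trials << "%\n"
            << "Full scan picked overloaded host: " << 100.0 * scan_overloaded / trials << "%\n";
}
```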
@@ -38,7 +38,9 @@ same or different weights.
approach is nearly as good as an O(N) full scan). This is also known as P2C (power of two
choices). The P2C load balancer has the property that a host with the highest number of active
requests in the cluster will never receive new requests. It will be allowed to drain until it is
less than or equal to all of the other hosts.
less than or equal to all of the other hosts. The number of hosts chosen can be changed by setting
``choice_count``.

* *all weights not equal*: If two or more hosts in the cluster have different load balancing
weights, the load balancer shifts into a mode where it uses a weighted round robin schedule in
which weights are dynamically adjusted based on the host's request load at the time of selection.
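The P2C paragraph above, combined with the new full-scan behavior, boils down to the following selection rule. A minimal standalone sketch with simplified types (not Envoy's actual classes; the real implementation is in the load_balancer_impl.cc hunk below):

```cpp
#include <cstdint>
#include <random>
#include <vector>

struct FakeHost {
  uint64_t active_requests;
};

// Sketch: pick the least-loaded host. Falls back to a full scan when the
// host list is no larger than choice_count, or when full scan is forced.
const FakeHost* pickHost(const std::vector<FakeHost>& hosts, uint32_t choice_count,
                         bool enable_full_scan, std::mt19937& rng) {
  if (hosts.empty()) {
    return nullptr;
  }
  const FakeHost* candidate = nullptr;
  if (hosts.size() <= choice_count || enable_full_scan) {
    for (const auto& host : hosts) {
      if (candidate == nullptr || host.active_requests < candidate->active_requests) {
        candidate = &host;
      }
    }
    return candidate;
  }
  // P2C-style sampling: choice_count independent draws, duplicates allowed.
  std::uniform_int_distribution<size_t> pick(0, hosts.size() - 1);
  for (uint32_t i = 0; i < choice_count; ++i) {
    const FakeHost& sampled = hosts[pick(rng)];
    if (candidate == nullptr || sampled.active_requests < candidate->active_requests) {
      candidate = &sampled;
    }
  }
  return candidate;
}
```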
19 changes: 19 additions & 0 deletions source/common/upstream/load_balancer_impl.cc
@@ -1299,6 +1299,25 @@ HostConstSharedPtr LeastRequestLoadBalancer::unweightedHostPick(const HostVector
const HostsSource&) {
HostSharedPtr candidate_host = nullptr;

// Do a full scan if it's explicitly required or the number of choices is equal to or larger
// than the host count.
if ((hosts_to_use.size() <= choice_count_) || enable_full_scan_) {
for (const auto& sampled_host : hosts_to_use) {
if (candidate_host == nullptr) {
// Make a first choice to start the comparisons.
candidate_host = sampled_host;
continue;
}

const auto candidate_active_rq = candidate_host->stats().rq_active_.value();
const auto sampled_active_rq = sampled_host->stats().rq_active_.value();
if (sampled_active_rq < candidate_active_rq) {
candidate_host = sampled_host;
}
}
return candidate_host;
}

for (uint32_t choice_idx = 0; choice_idx < choice_count_; ++choice_idx) {
const int rand_idx = random_.random() % hosts_to_use.size();
const HostSharedPtr& sampled_host = hosts_to_use[rand_idx];
5 changes: 4 additions & 1 deletion source/common/upstream/load_balancer_impl.h
@@ -710,7 +710,9 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
least_request_config.has_active_request_bias()
? absl::optional<Runtime::Double>(
{least_request_config.active_request_bias(), runtime})
: absl::nullopt) {
: absl::nullopt),
enable_full_scan_(
PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config, enable_full_scan, false)) {
initialize();
}

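A note on PROTOBUF_GET_WRAPPED_OR_DEFAULT for readers outside the codebase: judging from its usage here, it reads the wrapped value when the field is set and falls back to the supplied default otherwise. Conceptually (a sketch of the effect, not the macro's actual definition):

```cpp
// Rough hand-written equivalent of the enable_full_scan_ initializer above:
//   enable_full_scan_ = least_request_config.has_enable_full_scan()
//                           ? least_request_config.enable_full_scan().value()
//                           : false;
```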
@@ -746,6 +748,7 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
double active_request_bias_{};

const absl::optional<Runtime::Double> active_request_bias_runtime_;
const bool enable_full_scan_{};
};

/**
101 changes: 83 additions & 18 deletions test/common/upstream/load_balancer_impl_test.cc
@@ -2787,20 +2787,20 @@ TEST_P(LeastRequestLoadBalancerTest, SingleHost) {

// Host weight is 1.
{
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

// Host weight is 100.
{
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

HostVector empty;
{
hostSet().runCallbacks(empty, empty);
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

@@ -2823,37 +2823,44 @@ TEST_P(LeastRequestLoadBalancerTest, Normal) {

hostSet().healthy_hosts_[0]->stats().rq_active_.set(1);
hostSet().healthy_hosts_[1]->stats().rq_active_.set(2);
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));

hostSet().healthy_hosts_[0]->stats().rq_active_.set(2);
hostSet().healthy_hosts_[1]->stats().rq_active_.set(1);
EXPECT_CALL(random_, random()).WillOnce(Return(0)).WillOnce(Return(2)).WillOnce(Return(3));
EXPECT_CALL(random_, random()).WillOnce(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
}

TEST_P(LeastRequestLoadBalancerTest, PNC) {
hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:81", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:82", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:83", simTime())};
makeTestHost(info_, "tcp://127.0.0.1:83", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:84", simTime())};
hostSet().hosts_ = hostSet().healthy_hosts_;
hostSet().runCallbacks({}, {}); // Trigger callbacks. The added/removed lists are not relevant.

hostSet().healthy_hosts_[0]->stats().rq_active_.set(4);
hostSet().healthy_hosts_[1]->stats().rq_active_.set(3);
hostSet().healthy_hosts_[2]->stats().rq_active_.set(2);
hostSet().healthy_hosts_[3]->stats().rq_active_.set(1);
hostSet().healthy_hosts_[4]->stats().rq_active_.set(5);

// Creating various load balancer objects with different choice configs.
envoy::config::cluster::v3::Cluster::LeastRequestLbConfig lr_lb_config;
lr_lb_config.mutable_choice_count()->set_value(2);
LeastRequestLoadBalancer lb_2{priority_set_, nullptr, stats_, runtime_,
random_, common_config_, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(5);
LeastRequestLoadBalancer lb_5{priority_set_, nullptr, stats_, runtime_,
lr_lb_config.mutable_choice_count()->set_value(3);
LeastRequestLoadBalancer lb_3{priority_set_, nullptr, stats_, runtime_,
random_, common_config_, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(4);
LeastRequestLoadBalancer lb_4{priority_set_, nullptr, stats_, runtime_,
random_, common_config_, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(6);
LeastRequestLoadBalancer lb_6{priority_set_, nullptr, stats_, runtime_,
random_, common_config_, lr_lb_config, simTime()};

// Verify correct number of choices.

// 0 choices configured should default to P2C.
@@ -2864,20 +2871,78 @@
EXPECT_CALL(random_, random()).Times(3).WillRepeatedly(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_2.chooseHost(nullptr));

// 5 choices configured results in P5C.
EXPECT_CALL(random_, random()).Times(6).WillRepeatedly(Return(0));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_5.chooseHost(nullptr));

// Verify correct host chosen in P5C scenario.
// Verify correct host chosen in P3C scenario.
EXPECT_CALL(random_, random())
.Times(6)
.Times(4)
.WillOnce(Return(0))
.WillOnce(Return(3))
.WillOnce(Return(1))
.WillOnce(Return(2));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_3.chooseHost(nullptr));

// Verify correct host chosen in P4C scenario.
EXPECT_CALL(random_, random())
.Times(5)
.WillOnce(Return(0))
.WillOnce(Return(3))
.WillOnce(Return(2))
.WillOnce(Return(1));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_5.chooseHost(nullptr));
.WillOnce(Return(1))
.WillOnce(Return(1))
.WillOnce(Return(2));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_4.chooseHost(nullptr));

// When the number of hosts is smaller than or equal to the number of choices we don't call
// random() to select hosts, since we do a full table scan instead.
EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_6.chooseHost(nullptr));
}

TEST_P(LeastRequestLoadBalancerTest, FullScan) {
hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:81", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:82", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:83", simTime()),
makeTestHost(info_, "tcp://127.0.0.1:84", simTime())};
hostSet().hosts_ = hostSet().healthy_hosts_;
hostSet().runCallbacks({}, {}); // Trigger callbacks. The added/removed lists are not relevant.

hostSet().healthy_hosts_[0]->stats().rq_active_.set(4);
hostSet().healthy_hosts_[1]->stats().rq_active_.set(3);
hostSet().healthy_hosts_[2]->stats().rq_active_.set(2);
hostSet().healthy_hosts_[3]->stats().rq_active_.set(1);
hostSet().healthy_hosts_[4]->stats().rq_active_.set(5);

// Creating various load balancer objects with different choice configs.
envoy::extensions::load_balancing_policies::least_request::v3::LeastRequest lr_lb_config;
lr_lb_config.mutable_choice_count()->set_value(2);
// Enable full table scan on hosts
lr_lb_config.mutable_enable_full_scan()->set_value(true);
common_config_.mutable_healthy_panic_threshold()->set_value(0);

LeastRequestLoadBalancer lb_2{priority_set_, nullptr, stats_, runtime_,
random_, 1, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(3);
LeastRequestLoadBalancer lb_3{priority_set_, nullptr, stats_, runtime_,
random_, 1, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(4);
LeastRequestLoadBalancer lb_4{priority_set_, nullptr, stats_, runtime_,
random_, 1, lr_lb_config, simTime()};
lr_lb_config.mutable_choice_count()->set_value(6);
LeastRequestLoadBalancer lb_6{priority_set_, nullptr, stats_, runtime_,
random_, 1, lr_lb_config, simTime()};

// random() is called only once per pick and is not used to select the host.

EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_2.chooseHost(nullptr));

EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_3.chooseHost(nullptr));

EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_4.chooseHost(nullptr));

EXPECT_CALL(random_, random()).WillOnce(Return(9999));
EXPECT_EQ(hostSet().healthy_hosts_[3], lb_6.chooseHost(nullptr));
}

TEST_P(LeastRequestLoadBalancerTest, WeightImbalance) {
5 changes: 4 additions & 1 deletion test/integration/http_subset_lb_integration_test.cc
@@ -176,7 +176,10 @@ class HttpSubsetLbIntegrationTest
}
}

if (is_hash_lb_) {
// The default number of choices for the LEAST_REQUEST policy is 2 hosts. If the number of hosts
// is equal to the number of choices, a full scan happens instead; this means that the same host
// will be chosen.
if (is_hash_lb_ || (GetParam() == envoy::config::cluster::v3::Cluster::LEAST_REQUEST)) {
Contributor Author:
If there were a way of checking the number of hosts of the cluster, so we could make sure the test has the correct behaviour, that would be great. For now I added a comment on why we need this check in the test.

EXPECT_EQ(hosts.size(), 1) << "Expected a single unique host to be selected for "
<< envoy::config::cluster::v3::Cluster::LbPolicy_Name(GetParam());
} else {