Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable LoadBalancing Extensions #15827

Closed

Conversation

cgetzen
Copy link

@cgetzen cgetzen commented Apr 2, 2021

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: Enable LoadBalancing Extensions
Additional Description:
Risk Level: Medium
Testing: TBD
Docs Changes: TBD
Release Notes: TBD
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

This has been pulled out of the Shuffle Sharding LB PR (#15375)

Signed-off-by: Charlie Getzen <[email protected]>
Signed-off-by: Charlie Getzen <[email protected]>
@repokitteh-read-only
Copy link

Hi @cgetzen, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #15827 was opened by cgetzen.

see: more, trace.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #15827 was opened by cgetzen.

see: more, trace.

Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this will fix #5598.

@@ -109,7 +109,6 @@ message Cluster {
// this option or not.
CLUSTER_PROVIDED = 6;

// [#not-implemented-hide:] Use the new :ref:`load_balancing_policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove the [#not-implemented-hide:] annotation on the load_balancing_policy field (line 971) and the LoadBalancingPolicy message (line 1036).

public:
~TypedLoadBalancerFactory() override = default;

virtual LoadBalancerPtr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we actually implement this, I think we should spend some time thinking about what the LB policy API will actually look like, since this is an area where we need to consider semantic consistency between gRPC and Envoy, at least in terms of what gets expressed in the xDS API.

There are some significant differences between the existing LB policy APIs in Envoy and gRPC. In particular, Envoy's LB policy is given a set of hosts that are already health-checked, and it picks a host synchronously without any knowledge of what connection(s) may or may not already exist in the individual hosts. In contrast, gRPC's LB policy is given a list of addresses, and it is responsible for creating and managing connections to those addresses; its API allows it to tell the channel to queue a request until the LB policy has a connection capable of sending it on.

CC @pianiststickman @htuch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a useful exercise to try understand how the existing LBs in Envoy coudl be migrated to extensions if we wanted to do this in the future. That would tell us whether this interface is sufficient to be able to express what LB policies might want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worthwhile to think about the future of the LB interface, but I would prefer to not over-complicate this PR, as it is addressing a longstanding issue of not being able to extend the existing Envoy definition of what a load balancer is. Given that the configuration for this is opaque, I don't think we need to conflate any future load balancing changes with this PR which is really just allowing an extension point on top of an existing interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. I just wouldn't consider #5598 fully resolved until we can at least state how we could migrate the static load balancers across to this new interface. Having looked at the ClusterEntry constructor, I think it might be sufficient here, but probably worth some comments in the PR or issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wouldn't consider #5598 fully resolved until we can at least state how we could migrate the static load balancers across to this new interface. Having looked at the ClusterEntry constructor, I think it might be sufficient here, but probably worth some comments in the PR or issue.

I think there are 2 different issues here:

  1. @markdroth concerns about the future of load balancing. I think this is a very important convo but IMO is out of scope of this change.
  2. @htuch concern about whether this interface should be sufficient to migrate all existing LBs. IMO this is required for this change. Not to actually do the migration but please check all existing LB constructors and make sure we are passing sufficient info into the create function such that they all would work. (Bonus points for actually doing the migration.)

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the @envoyproxy/api-shepherds convo on the future of LB, I left some initial comments on the actual change. As part of this change I would like to see an integration test with a fake/trivial extension LB which demonstrates the extension mechanism end to end. Thank you!

/wait

@@ -1348,6 +1348,31 @@ ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::ClusterEntry(
parent.parent_.random_, cluster->lbConfig());
break;
}
case LoadBalancerType::LoadBalancingPolicyConfig: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This occurs off of the main thread, so this won't work since we need to verify the correctness of the configuration prior to when we get here. The way I would recommend doing this is processing the configuration on the main thread next to the other LBs, and storing a factory that we know will work, and then using that factory in the appropriate process in worker context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is a lb_factory_. I noticed this is used for the consistent LBs that are built on the main thread, I wonder if this pattern somehow can be reused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit (5e8ce8c) attempts to mimic lb_factory_ more closely: getFactory is now pulled in to the main thread. I had a hard time coming up with a scenario where I could trigger a config error in the non-main threads, and I can't find where this verification comes from, so this is not much more than a shot in the dark.

I'll continue investigating this, and any extra pointers may speed me up!

Comment on lines 1360 to 1362
ENVOY_LOG(warn, fmt::format("Didn't find a registered implementation for name: '{}'",
policy.name()));
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we fix the above comment you can use the throwing version of the factory lookup functions.

Comment on lines +1370 to +1373
if (lb_ == nullptr) {
ENVOY_LOG(critical, "Didn't find any registered implementation for load_balancing_policy.");
ASSERT(lb_ != nullptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just make this RELEASE_ASSERT as this shouldn't happen.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label May 16, 2021
@github-actions
Copy link

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this May 24, 2021
pianiststickman added a commit to pianiststickman/envoy that referenced this pull request Jul 19, 2021
Enables LOAD_BALANCING_POLICY_CONFIG enum value in LbPolicy and supports
typed load balancers specified in load_balancing_policy. Continues work
done by Charlie Getzen <[email protected]> in PR envoyproxy#15827.

Custom load balancers specified by load_balancing_policy are created as
implementations of ThreadAwareLoadBalancer. Thread-local load balancers
can be implemented as thread-aware load balancers that contain no logic
at the thread-aware level, i.e. the purpose of the thread-aware LB is
solely to contain the factory used to instantiate the thread-local LBs.
(In the future it might be appropriate to provide a construct that
abstracts away thread-aware aspects of ThreadAwareLoadBalancer for LBs
that don't need to be thread-aware.)

A cluster that uses LOAD_BALANCING_POLICY_CONFIG may not also set a
subset LB configuration. If the load balancer type makes use of
subsetting, it should include a subset configuration in its own
configuration message. Future work on load balancing extensions should
include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses LOAD_BALANCING_POLICY_CONFIG may not set
the CommonLbConfig, and it is not passed into load balancer creation
(mostly owing to its dubious applicability as a top level configuration
message to hierarchical load balancing policy). If the load balancer type
makes use of the CommonLbConfig, it should include a CommonLbConfig in
the configuration message for the load balancing policy.

Considerations for migration of existing load balancers:

- pieces of the ThreadAwareLoadBalancerBase implementation are specific
to the built-in hashing load balancers and should be moved into a base
class specifically for hashing load balancers. As it stands, custom load
balancing policies are required to implement a createLoadBalancer()
method even if the architecture of the LB policy does not require a
hashing load balancer. I think we would also benefit from disentangling
ThreadAwareLoadBalancerBase from LoadBalancerBase, as the former never
actually does any host picking.

- as we convert existing thread-local load balancers to thread-aware
load balancers, new local LBs will be re-created upon membership
changes. We should provide a mechanism allowing load balancers to
control whether this rebuild should occur, e.g. a callback that calls
create() for thread-aware load balancers by default, which can be
overridden to do nothing for thread-local LBs.

Signed-off-by: Eugene Chan <[email protected]>
lizan pushed a commit that referenced this pull request Aug 14, 2021
Enables `LOAD_BALANCING_POLICY_CONFIG` enum value in `LbPolicy` and supports typed load balancers specified in `load_balancing_policy`. Continues work done by Charlie Getzen <[email protected]> in #15827.

Custom load balancers specified by `load_balancing_policy` are created as implementations of `ThreadAwareLoadBalancer`. Thread-local load balancers can be implemented as thread-aware load balancers that contain no logic at the thread-aware level, i.e. the purpose of the thread-aware LB is solely to contain the factory used to instantiate the thread-local LBs. (In the future it might be appropriate to provide a construct that abstracts away thread-aware aspects of `ThreadAwareLoadBalancer` for LBs that don't need to be thread-aware.)

A cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not also set a subset LB configuration. If the load balancer type makes use of subsetting, it should include a subset configuration in its own configuration message. Future work on load balancing extensions should include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not set the `CommonLbConfig`, and it is not passed into load balancer creation (mostly owing to its dubious applicability as a top level configuration message to hierarchical load balancing policy). If the load balancer type makes use of the `CommonLbConfig`, it should include a `CommonLbConfig` in the configuration message for the load balancing policy.

Considerations for migration of existing load balancers:

- pieces of the `ThreadAwareLoadBalancerBase` implementation are specific to the built-in hashing load balancers and should be moved into a base class specifically for hashing load balancers. As it stands, custom load balancing policies are required to implement a `createLoadBalancer()` method even if the architecture of the LB policy does not require a hashing load balancer. I think we would also benefit from disentangling `ThreadAwareLoadBalancerBase` from `LoadBalancerBase`, as the former never actually does any host picking.

- as we convert existing thread-local load balancers to thread-aware load balancers, new local LBs will be re-created upon membership changes. We should provide a mechanism allowing load balancers to control whether this rebuild should occur, e.g. a callback that calls `create()` for thread-aware LBs by default, which can be overridden to do nothing for thread-local LBs.

Risk Level: low
Testing: brought up a cluster with a custom load balancer specified by `load_balancing_policy`; new unit tests included
Docs Changes: n/a
Release Notes: Enable load balancing policy extensions
Platform Specific Features: n/a
Fixes #5598

Signed-off-by: Eugene Chan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants