Enable LoadBalancing Extensions #15827
Conversation
Signed-off-by: Charlie Getzen <[email protected]>
Hi @cgetzen, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
It seems like this will fix #5598.
@@ -109,7 +109,6 @@ message Cluster {
// this option or not.
CLUSTER_PROVIDED = 6;

// [#not-implemented-hide:] Use the new :ref:`load_balancing_policy
Please also remove the [#not-implemented-hide:] annotation on the load_balancing_policy field (line 971) and the LoadBalancingPolicy message (line 1036).
public:
~TypedLoadBalancerFactory() override = default;

virtual LoadBalancerPtr
Before we actually implement this, I think we should spend some time thinking about what the LB policy API will actually look like, since this is an area where we need to consider semantic consistency between gRPC and Envoy, at least in terms of what gets expressed in the xDS API.
There are some significant differences between the existing LB policy APIs in Envoy and gRPC. In particular, Envoy's LB policy is given a set of hosts that are already health-checked, and it picks a host synchronously without any knowledge of what connection(s) may or may not already exist in the individual hosts. In contrast, gRPC's LB policy is given a list of addresses, and it is responsible for creating and managing connections to those addresses; its API allows it to tell the channel to queue a request until the LB policy has a connection capable of sending it on.
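The Envoy side of this contrast can be sketched as follows: the policy receives an already health-checked host set and must return a host synchronously, with no ability to queue the request until a usable connection exists. This is a heavily simplified illustration, not the real Envoy interface; `Host`, `LoadBalancer`, and `chooseHost` here are hypothetical stand-ins for the actual types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of Envoy's synchronous LB contract.
struct Host {
  std::string address;
};

class LoadBalancer {
public:
  virtual ~LoadBalancer() = default;
  // Synchronous pick over a pre-health-checked host set: the policy has no
  // knowledge of per-host connection state and no way to ask the caller to
  // queue the request (unlike gRPC's picker model).
  virtual const Host* chooseHost(const std::vector<Host>& healthy_hosts) = 0;
};

class RoundRobinLoadBalancer : public LoadBalancer {
public:
  const Host* chooseHost(const std::vector<Host>& healthy_hosts) override {
    if (healthy_hosts.empty()) {
      return nullptr; // Envoy signals "no host available" rather than queuing.
    }
    return &healthy_hosts[index_++ % healthy_hosts.size()];
  }

private:
  std::size_t index_{0};
};
```

A gRPC policy, by contrast, would own the connections behind those addresses and could return a "queue this request" result instead of a host.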
I think it would be a useful exercise to try to understand how the existing LBs in Envoy could be migrated to extensions if we wanted to do this in the future. That would tell us whether this interface is sufficient to express what LB policies might want.
I think it's worthwhile to think about the future of the LB interface, but I would prefer not to over-complicate this PR, as it is addressing a longstanding issue of not being able to extend the existing Envoy definition of what a load balancer is. Given that the configuration for this is opaque, I don't think we need to conflate any future load balancing changes with this PR, which is really just allowing an extension point on top of an existing interface.
That's fine. I just wouldn't consider #5598 fully resolved until we can at least state how we could migrate the static load balancers across to this new interface. Having looked at the ClusterEntry constructor, I think it might be sufficient here, but probably worth some comments in the PR or issue.
I just wouldn't consider #5598 fully resolved until we can at least state how we could migrate the static load balancers across to this new interface. Having looked at the ClusterEntry constructor, I think it might be sufficient here, but probably worth some comments in the PR or issue.
I think there are 2 different issues here:
- @markdroth concerns about the future of load balancing. I think this is a very important convo but IMO is out of scope of this change.
- @htuch concern about whether this interface should be sufficient to migrate all existing LBs. IMO this is required for this change. Not to actually do the migration but please check all existing LB constructors and make sure we are passing sufficient info into the create function such that they all would work. (Bonus points for actually doing the migration.)
Apart from the @envoyproxy/api-shepherds convo on the future of LB, I left some initial comments on the actual change. As part of this change I would like to see an integration test with a fake/trivial extension LB which demonstrates the extension mechanism end to end. Thank you!
/wait
@@ -1348,6 +1348,31 @@ ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::ClusterEntry(
parent.parent_.random_, cluster->lbConfig());
break;
}
case LoadBalancerType::LoadBalancingPolicyConfig: {
This occurs off of the main thread, so this won't work since we need to verify the correctness of the configuration prior to when we get here. The way I would recommend doing this is processing the configuration on the main thread next to the other LBs, and storing a factory that we know will work, and then using that factory in the appropriate process in worker context.
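A minimal sketch of the pattern described above, under the assumption that worker threads only ever see a factory that was already resolved and validated on the main thread. All names here are hypothetical; this is not the actual Envoy code.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Illustrative sketch: resolve the extension factory once on the main thread,
// store it, and let workers only call the already-validated factory.
struct LoadBalancer {
  virtual ~LoadBalancer() = default;
};
struct ExampleLb : LoadBalancer {};

using LbFactoryFn = std::function<std::unique_ptr<LoadBalancer>()>;

// Main-thread step: look up the policy by name and fail fast if it is
// missing, so a bad config surfaces during cluster initialization rather
// than later on a worker thread.
inline LbFactoryFn resolveFactoryOnMainThread(
    const std::map<std::string, LbFactoryFn>& registry,
    const std::string& name) {
  auto it = registry.find(name);
  if (it == registry.end()) {
    throw std::runtime_error("unknown load balancing policy: " + name);
  }
  return it->second; // Stored; workers call this without further validation.
}
```

The worker-thread side then only invokes the stored factory, which is known to succeed.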
There already is a lb_factory_. I noticed this is used for the consistent LBs that are built on the main thread; I wonder if this pattern can somehow be reused.
This commit (5e8ce8c) attempts to mimic lb_factory_ more closely: getFactory is now pulled into the main thread. I had a hard time coming up with a scenario where I could trigger a config error on the non-main threads, and I can't find where this verification comes from, so this is not much more than a shot in the dark.
I'll continue investigating this, and any extra pointers may speed me up!
ENVOY_LOG(warn, fmt::format("Didn't find a registered implementation for name: '{}'",
                            policy.name()));
continue;
Once we fix the above comment you can use the throwing version of the factory lookup functions.
if (lb_ == nullptr) {
  ENVOY_LOG(critical, "Didn't find any registered implementation for load_balancing_policy.");
  ASSERT(lb_ != nullptr);
}
I would just make this RELEASE_ASSERT as this shouldn't happen.
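Applied to the snippet under review, the suggestion would collapse the log-plus-debug-only-ASSERT into one check that is enforced in release builds too. The sketch below uses a simplified stand-in for Envoy's RELEASE_ASSERT macro; the real macro and surrounding code differ.

```cpp
#include <cstdio>
#include <cstdlib>
#include <memory>

// Simplified stand-in for Envoy's RELEASE_ASSERT: unlike ASSERT, the
// condition is checked in release builds as well, and failure aborts.
#define RELEASE_ASSERT(condition, message)                                   \
  do {                                                                       \
    if (!(condition)) {                                                      \
      std::fprintf(stderr, "assert failure: %s\n", message);                 \
      std::abort();                                                          \
    }                                                                        \
  } while (false)

struct LoadBalancer {};

// The reviewer's suggestion applied to the snippet under review: a single
// unconditional check replaces the if + critical log + debug-only ASSERT.
inline bool validateLb(const std::unique_ptr<LoadBalancer>& lb) {
  RELEASE_ASSERT(lb != nullptr,
                 "no registered implementation found for load_balancing_policy");
  return true;
}
```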
Signed-off-by: Charlie Getzen <[email protected]>
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Enables `LOAD_BALANCING_POLICY_CONFIG` enum value in `LbPolicy` and supports typed load balancers specified in `load_balancing_policy`. Continues work done by Charlie Getzen <[email protected]> in #15827.

Custom load balancers specified by `load_balancing_policy` are created as implementations of `ThreadAwareLoadBalancer`. Thread-local load balancers can be implemented as thread-aware load balancers that contain no logic at the thread-aware level, i.e. the purpose of the thread-aware LB is solely to contain the factory used to instantiate the thread-local LBs. (In the future it might be appropriate to provide a construct that abstracts away thread-aware aspects of `ThreadAwareLoadBalancer` for LBs that don't need to be thread-aware.)

A cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not also set a subset LB configuration. If the load balancer type makes use of subsetting, it should include a subset configuration in its own configuration message. Future work on load balancing extensions should include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not set the `CommonLbConfig`, and it is not passed into load balancer creation (mostly owing to its dubious applicability as a top-level configuration message to hierarchical load balancing policy). If the load balancer type makes use of the `CommonLbConfig`, it should include a `CommonLbConfig` in the configuration message for the load balancing policy.

Considerations for migration of existing load balancers:
- Pieces of the `ThreadAwareLoadBalancerBase` implementation are specific to the built-in hashing load balancers and should be moved into a base class specifically for hashing load balancers. As it stands, custom load balancing policies are required to implement a `createLoadBalancer()` method even if the architecture of the LB policy does not require a hashing load balancer. I think we would also benefit from disentangling `ThreadAwareLoadBalancerBase` from `LoadBalancerBase`, as the former never actually does any host picking.
- As we convert existing thread-local load balancers to thread-aware load balancers, new local LBs will be re-created upon membership changes. We should provide a mechanism allowing load balancers to control whether this rebuild should occur, e.g. a callback that calls `create()` for thread-aware LBs by default, which can be overridden to do nothing for thread-local LBs.

Risk Level: low
Testing: brought up a cluster with a custom load balancer specified by `load_balancing_policy`; new unit tests included
Docs Changes: n/a
Release Notes: Enable load balancing policy extensions
Platform Specific Features: n/a
Fixes #5598
Signed-off-by: Eugene Chan <[email protected]>
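The "thread-aware LB as a pure factory holder" idea described above can be sketched as follows. The interfaces loosely mirror Envoy's `ThreadAwareLoadBalancer` and `LoadBalancerFactory`, but the code is a simplified illustration, not the actual implementation, and the concrete type names are invented for the example.

```cpp
#include <memory>

// Simplified mirrors of Envoy's LB interfaces (illustrative only).
struct LoadBalancer {
  virtual ~LoadBalancer() = default;
};
using LoadBalancerPtr = std::unique_ptr<LoadBalancer>;

struct LoadBalancerFactory {
  virtual ~LoadBalancerFactory() = default;
  virtual LoadBalancerPtr create() = 0; // called per worker thread
};
using LoadBalancerFactorySharedPtr = std::shared_ptr<LoadBalancerFactory>;

struct ThreadAwareLoadBalancer {
  virtual ~ThreadAwareLoadBalancer() = default;
  virtual LoadBalancerFactorySharedPtr factory() = 0;
  virtual void initialize() = 0;
};

// A thread-local policy exposed through the thread-aware interface.
struct ExampleThreadLocalLb : LoadBalancer {};

struct ExampleLbFactory : LoadBalancerFactory {
  LoadBalancerPtr create() override {
    return std::make_unique<ExampleThreadLocalLb>();
  }
};

// The thread-aware layer holds no balancing logic; it exists only to hand
// out the factory that builds one LB instance per worker.
struct FactoryOnlyThreadAwareLb : ThreadAwareLoadBalancer {
  LoadBalancerFactorySharedPtr factory() override { return factory_; }
  void initialize() override {} // no thread-aware state to build

private:
  LoadBalancerFactorySharedPtr factory_{std::make_shared<ExampleLbFactory>()};
};
```

Each worker would call `factory()->create()` to obtain its own independent LB instance.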
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Commit Message: Enable LoadBalancing Extensions
Additional Description:
Risk Level: Medium
Testing: TBD
Docs Changes: TBD
Release Notes: TBD
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]
This has been pulled out of the Shuffle Sharding LB PR (#15375)