-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Weighted Random Selection Queue (WRSQ) load balancing #14597
Comments
@tonya11en is this something that you would like to implement or should I look at implementing it? |
@mattklein123 I'd like to implement it, but will probably be sending a patch out over the weekend. I don't feel strongly about it, so if you can knock it out sooner, go for it and I'll be happy to review. |
@tonya11en go for it. I will assign over to you also and I will assume you are going to work on it for now. |
Can we guard existing behavior with runtime for at least a period of time? I agree with direction here but would like to make sure we move carefully when changing LB behaviors. In separate discussions with @antoniovicente, I think we want to make substantial changes to affinity balancing (i.e. that break hash consistency) to become new LB algorithms so that we can safely rollout and migrate. |
Yes, the plan would be to feature flag for sure.
Sorry can you clarify? |
Let's say we modify ring hash LB and change the hash algorithm, so that a new binary rollout would not be consistent with the hashing of a previous Envoy with ring hash LB. I think the safe thing here to do would be to retain the existing algorithm and allow the fleet to flip to the new hash function once rollout completes. This could be done with a runtime flag, but it might be more convenient to treat it as a new LB algorithm as well. |
Yeah this makes sense. We should talk more about this offline to figure out the best way of handling this. We have done some hash algorithm changes at Lyft and it's pretty painful to roll out. Would be nice to do something better if possible. |
I assume efficiency improvements to hash balancers to involve the introduction of new types. Changing the algorithm used via config would be a big challenge since it would likely break a lot of things during the time the config in production hasn't fully converged to the new state. |
@antoniovicente how would you suggest changing the algorithm (or version of the algorithm)? Is there some way to do this in production that wouldn't involve some period of inconsistency? |
is there wiki page or paper about WRSQ? |
ah wow, nice! Thanks for the pointer. I will read through the PR |
I don't think there's a way to accomplish this without having an user visible rehash event. |
/sub |
WIth #14681 merged, what's the next step to resolving this bug? |
I think to address the issue that motivated this thing, I'll swap out the locality scheduler with the WRSQ scheduler. I'll guard it with a feature flag that can revert back to EDF in case it ends up causing unforeseen problems. Assuming we have good results, I'll look into adding a config option in the common LB config. |
FYI, I'm putting finishing touches on the final PR and will send it out in the next couple of days. update (9/2/2021): |
See #14360 (comment) and tonya11en#1 (comment). This will require #14569 to be merged and #14360 to be resolved.
We should use WRSQ for:
We will still use EDF for WLR host picking or any other future picking in which weights can and do rapidly change.
cc @tonya11en @antoniovicente @htuch @snowp
The text was updated successfully, but these errors were encountered: