
QueryFrontend: HTTP/gRPC client load balancing #3373

Open
bwplotka opened this issue Oct 28, 2020 · 12 comments
Labels: difficulty: medium · dont-go-stale (label for important issues which tells the stale bot not to close them) · feature request/improvement · help wanted

Comments

@bwplotka (Member) commented Oct 28, 2020

It would be amazing to make better use of the splitting mechanism and allow distributing requests across multiple querier replicas.

Most of us use Kubernetes with plain K8s Services, which are TCP based, so they don't do round-robin balancing; they just pick one backend. Some users don't even have a good LB handy. I think it should be application logic that makes this happen.

I already wrote similar code, e.g. in Kedge, which works in production even now (https://github.com/improbable-eng/kedge/blob/772f9b2d2092a0ada972096945bee8cd49513da6/pkg/kedge/http/lbtransport/transport.go#L104). HOWEVER, it might be a better idea to use gRPC instead. That is trickier, but it gives us load balancing from the Querier towards the other APIs as well. So we have two choices (or we implement both); a sketch of each follows its pros/cons below:

a) http

Pros:

  • We have code for it ready
  • The query endpoints are HTTP already.
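
For illustration, a minimal sketch of what the HTTP option could look like, loosely following the Kedge lbtransport linked above. The rrTransport type and target addresses are hypothetical, not existing Thanos code:

```go
package lbtransport

import (
	"net/http"
	"sync/atomic"
)

// rrTransport round-robins requests over a static set of querier
// endpoints (host:port). Illustrative sketch only.
type rrTransport struct {
	targets []string          // Resolved querier replica addresses.
	next    uint64            // Atomically incremented pick counter.
	base    http.RoundTripper // Usually http.DefaultTransport.
}

func (t *rrTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	i := atomic.AddUint64(&t.next, 1)
	target := t.targets[i%uint64(len(t.targets))]

	// RoundTrippers must not mutate the caller's request, so clone it
	// and rewrite only the destination host.
	r2 := r.Clone(r.Context())
	r2.URL.Host = target
	r2.Host = target
	return t.base.RoundTrip(r2)
}
```

The query-frontend would then issue downstream requests via `&http.Client{Transport: &rrTransport{targets: querierAddrs, base: http.DefaultTransport}}`; a real implementation also needs dynamic target resolution and failure blacklisting, as the Kedge transport does.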

b) grpc

Pros:

  • LB for the query-frontend against just Prometheus is not needed.
  • We need these features for Querier -> Store/Rule/Target APIs anyway.
  • We can implement richer metadata passing that will allow load balancing based on saturation(!)

Cons:

  • There are sketches of gRPC client-side LB code in the gRPC ecosystem, but they change every release, so work has to be done.
  • We would need to expose the Query APIs over gRPC and switch to the gRPC port.
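
For the gRPC option, grpc-go already ships a round_robin balancer that can be enabled via the service config, and its dns resolver can discover every querier replica behind a headless K8s Service. A minimal sketch, assuming a headless Service (the address below is illustrative):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// dns:/// makes grpc-go resolve all A records of the (headless)
	// Service; round_robin then spreads RPCs across those addresses
	// and re-balances as endpoints come and go.
	conn, err := grpc.Dial(
		"dns:///thanos-querier-headless.monitoring.svc:10901", // Illustrative address.
		grpc.WithInsecure(),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Hand conn to the generated Query API client stubs.
}
```

The caveat from the cons still applies: the balancer/resolver APIs have churned between grpc-go releases, so this sketch may need adjusting per version.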

In both cases the acceptance criteria are the same:

AC:

  • Query-frontend replicas can route to multiple queriers and support different load-balancing strategies (round robin for a start).

I would vote for B (: More ambitious and more benefits. Thoughts? @pracucci @yeya24 @brancz @kakkoyun

@kakkoyun (Member)

I think we should immediately go for option a) just because it's low-hanging fruit. Then we can devise a plan to implement option b).
I'm with you that the ideal solution would be option b).

So let's just make it work and then we can iterate and optimize it.

@brancz (Member) commented Oct 31, 2020

I agree with @kakkoyun.

@stale (bot) commented Dec 31, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Dec 31, 2020
@kakkoyun kakkoyun removed the stale label Jan 4, 2021
@hitanshu-mehta (Contributor)

I think I can work on this issue with some guidance. Could you please assign it to me if nobody else is working on it?

@hitanshu-mehta (Contributor)

I have one question regarding the first part of the AC, i.e.:

Query-frontend replicas can route to multiple queriers

What approach does everyone suggest? Should it be similar to Cortex? (Which, as far as I understood 😅, adds a flag -querier.frontend-address to the querier to connect it to the frontend, so the querier pulls requests from the frontend queue.)
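
For context, the Cortex model inverts the connection direction so no load balancer is needed at all: each querier dials the frontend named by -querier.frontend-address and pulls queries off its queue. A very rough sketch of that worker loop; the ProcessStream interface and the request/response types are hypothetical stand-ins, since Cortex's real generated gRPC code differs:

```go
package worker

// Hypothetical stand-ins for Cortex's generated gRPC types.
type HTTPRequest struct{ /* method, URL, body, ... */ }
type HTTPResponse struct{ /* status code, body, ... */ }

// ProcessStream models the long-lived, bidirectional stream a querier
// opens towards the frontend it dialed.
type ProcessStream interface {
	Recv() (*HTTPRequest, error) // Pull the next query off the frontend queue.
	Send(*HTTPResponse) error    // Push the result back to the frontend.
}

// run pulls queries until the stream breaks; a real worker would redial.
func run(stream ProcessStream, handle func(*HTTPRequest) *HTTPResponse) error {
	for {
		req, err := stream.Recv()
		if err != nil {
			return err
		}
		if err := stream.Send(handle(req)); err != nil {
			return err
		}
	}
}
```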

@roidelapluie

If you go for HTTP, would it be something reusable for prometheus/prometheus#8402?

@bwplotka (Member, Author) commented Feb 24, 2021

@roidelapluie Yes 🤗

@stale (bot) commented Jun 3, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 3, 2021
@kakkoyun (Member) commented Jun 3, 2021

Still valid.

@stale stale bot removed the stale label Jun 3, 2021
@stale (bot) commented Aug 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Aug 2, 2021
@stale (bot) commented Aug 17, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 17, 2021
@hdost commented Aug 25, 2021

Still valid.

@bwplotka bwplotka reopened this Jun 16, 2022
@stale stale bot removed the stale label Jun 16, 2022
@bwplotka bwplotka added the dont-go-stale Label for important issues which tells the stalebot not to close them label Jun 16, 2022