Query: parallelize query expression execution and sharding #5748
Comments
Hm, I am missing the benefit of splitting those two into separate queries. It feels as if we can shard the same way no matter whether it's split or not. On top of that, I think it will be more efficient (and faster) to do 2c. Generally, I see your point, but I feel doing optimization and caching in the query-frontend is not ideal, as it treats PromQL as a closed box. Instead, I would lean towards an architecture where there is NO query frontend, and all the frontend logic is doable in the Querier and its PromQL engine. Then the Querier can offload work to other queriers or some queue. It is capable of caching and distributing the work among its peers. Think about this as receiver peers and querier peers. Fewer components and a more efficient solution. cc @fpetkovski
Overall I think they are all very good points, thanks for the feedback!
We can also do
It would be ideal to have that kind of distributed query engine.
I mean, don't we already do quite a lot of this front-end logic in the query frontend? If we could eventually move this to the architecture you're suggesting, that would be beneficial, but I guess until we're able to do this at the Querier / PromQL level, the Query Frontend is the second best option.
My take on this is that if we are going to add additional computation to the query frontend in the form of a PromQL engine, it might be better to work towards a general solution in QFE, similar to what Mimir does. But in general, I tend to lean in the direction of doubling down on the new engine and adding functionality there. I think it is now much easier to extend the engine with distributed computation than to add additional logic to the Query Frontend. It should also be more easily testable and maintainable. I agree that caching is still missing, and maybe even out of scope for an engine component, but I also don't have a good feeling for how useful it is to cache a single subtree.
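The engine-side distribution discussed in this comment could look roughly like the following: a minimal Go sketch, assuming a hypothetical logical plan with a remote-execution node that ships a subexpression to a peer querier and a binary node that evaluates both sides in parallel. None of these types (`node`, `remoteExec`, `binary`) exist in Thanos or the experimental engine; they only illustrate the shape of the idea, with results reduced to single scalars for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// node is one operator in a (hypothetical) distributed logical plan.
type node interface{ eval() float64 }

// remoteExec ships a subexpression to a peer querier and waits for the
// result; here the "peer" is just a local function standing in for an RPC.
type remoteExec struct {
	query string
	peer  func(query string) float64
}

func (r remoteExec) eval() float64 { return r.peer(r.query) }

// binary applies an operator over its children, evaluating them
// concurrently so each remote subexpression runs on its peer in parallel.
type binary struct {
	op       func(l, r float64) float64
	lhs, rhs node
}

func (b binary) eval() float64 {
	var l, r float64
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); l = b.lhs.eval() }()
	go func() { defer wg.Done(); r = b.rhs.eval() }()
	wg.Wait()
	return b.op(l, r)
}

func main() {
	// Fake peer queriers returning pre-aggregated values for their queries.
	peer := func(query string) float64 {
		data := map[string]float64{
			`http_requests_total{code="400"}`: 12,
			`http_requests_total`:             48,
		}
		return data[query]
	}
	plan := binary{
		op:  func(l, r float64) float64 { return l / r },
		lhs: remoteExec{`http_requests_total{code="400"}`, peer},
		rhs: remoteExec{`http_requests_total`, peer},
	}
	fmt.Println(plan.eval()) // fraction of requests that were 400s
}
```

The point of the sketch is that distribution lives inside the plan itself, so no separate frontend component needs to understand PromQL.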
I will close this one as I feel it's not worth it for now.
Is your proposal related to a problem?
#5342 This PR introduces the feature of sharding by the `by` clause. However, this requires grouping and limits the type of queries we can shard. A non-shardable query like `http_requests_total{code="400"} / http_requests_total` is still executed by a single querier. The PromQL query engine analyzes the query, executes two `Select` calls for `{__name__="http_requests_total", code="400"}` and `{__name__="http_requests_total"}` separately, and finally performs the binary operator calculation. The issue is that the two `Select` calls are still executed in the same querier.
Describe the solution you'd like
On the query frontend side, we can also analyze the query and split the original query into 2 smaller queries, `http_requests_total{code="400"}` and `http_requests_total`, and let the downstream queriers execute the 2 queries separately. The query frontend performs the final calculation and returns the result. This is very similar to making the query frontend act as the query engine, performing the query analysis and result merging, while the `Select` work is distributed to different querier instances. Since we use the query frontend to split and generate jobs for each downstream querier, we can also make each expression more parallelized with sharding. For example, the query `http_requests_total{code="400"} / http_requests_total` can be sharded into 8 small queries and executed in parallel if each expression's shard number is 4.
Describe alternatives you've considered
I think this is also doable in the new experimental engine https://github.com/thanos-community/promql-engine. But having the feature there would lose the ability to cache at the query frontend.
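The frontend-side splitting and sharding proposed above could be sketched as below. This is a toy Go model, not Thanos code: the `__query_shard__` label is a hypothetical mechanism that a real implementation would need store-level support for, and `sample`/`mergeDivide` stand in for the frontend's result merging of the `/` operator.

```go
package main

import "fmt"

// sample is a simplified series result; labels is a flattened label
// identity used to match LHS and RHS series for the binary operator.
type sample struct {
	labels string
	value  float64
}

// shardQueries fans one selector out into n shard-selecting queries by
// injecting a hypothetical __query_shard__ matcher.
func shardQueries(name, matchers string, n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		m := matchers
		if m != "" {
			m += ","
		}
		out = append(out, fmt.Sprintf(`%s{%s__query_shard__="%d/%d"}`, name, m, i, n))
	}
	return out
}

// mergeDivide joins LHS and RHS partial results on label identity and
// applies '/', mimicking the final calculation the frontend would do.
func mergeDivide(lhs, rhs []sample) []sample {
	rhsByLabels := make(map[string]float64, len(rhs))
	for _, s := range rhs {
		rhsByLabels[s.labels] = s.value
	}
	var out []sample
	for _, s := range lhs {
		if d, ok := rhsByLabels[s.labels]; ok && d != 0 {
			out = append(out, sample{s.labels, s.value / d})
		}
	}
	return out
}

func main() {
	// Each subexpression becomes 4 shard queries: 8 downstream queries in
	// total, matching the "8 small queries ... shard number is 4" example.
	lhsQueries := shardQueries("http_requests_total", `code="400"`, 4)
	rhsQueries := shardQueries("http_requests_total", "", 4)
	fmt.Println(len(lhsQueries)+len(rhsQueries), "downstream queries")

	// Pretend the downstream queriers already returned merged partials.
	lhs := []sample{{`instance="a"`, 4}, {`instance="b"`, 3}}
	rhs := []sample{{`instance="a"`, 8}, {`instance="b"`, 6}}
	fmt.Println(mergeDivide(lhs, rhs))
}
```

Under this model the frontend only needs to understand enough PromQL to split the expression and apply the top-level operator, which is exactly the caching-friendly placement the proposal argues for.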