-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thanos query frontend sub command #2973
Conversation
1df74b4
to
884a415
Compare
49842f5
to
d944b89
Compare
I haven't had the chance to review this PR in itself but I think since this work will be on-going I think I would prefer to merge a minimum state that works good enough ™️, and then iterate on it. So to be explicit, I am for this, but in a follow up PR. |
33d71b9
to
115499a
Compare
ff6d5a7
to
de35949
Compare
de35949
to
2c21459
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks epic for a start 👍
Let's rebase on master since we merged the upgrade PR, wdyt? (:
Otherwise is good for first iteration 💪
The only thing is that I would try to create config as we have for other caches in store so we are ready for other cache providers (:
cmd/thanos/query-frontend.go
Outdated
BoolVar(&c.cacheResults) | ||
|
||
cmd.Flag("query-range.split-interval", "Split queries by an interval and execute in parallel, 0 disables it."). | ||
Default("24h").DurationVar(&c.splitInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it for now but probably we need expose it better (:
cmd/thanos/query-frontend.go
Outdated
|
||
func (c *responseCacheConfig) registerFlag(cmd *kingpin.CmdClause) { | ||
c.fifoCache.registerFlag(cmd) | ||
cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be overall general config not fifo? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. It is a general cache config so I named it query-range.response-cache-max-freshness
and it is available for all caches.
cmd/thanos/query-frontend.go
Outdated
|
||
// fifoCacheConfig defines configurations for Cortex fifo cache. | ||
type fifoCacheConfig struct { | ||
maxSizeBytes units.Base2Bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think we should have those as cache client now and use our cache flags like in:
Line 54 in 82cca56
indexCacheConfig := extflag.RegisterPathOrContent(cmd, "index-cache.config", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/thanos/query-frontend.go
Outdated
m[comp.String()] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ <-chan struct{}, _ bool) error { | ||
|
||
return runQueryFrontend( | ||
g, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this syle - I think it can fit a line (:
cmd/thanos/query-frontend.go
Outdated
conf.queryRangeConfig.respCacheConfig.cacheMaxFreshness, | ||
) | ||
|
||
// TODO(yeya24): support other cache when available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Some build errors got after rebasing master.
Seems the Prometheus dependency needs to be updated on the Cortex side as well, |
That might be true. Here comes cyclic deps (: |
Let's propose a PR on their side cc @pracucci |
I will prepare a pr. |
pkg/api/queryfrontend/v1.go
Outdated
|
||
r.Get("/labels", instr("labels", handleFunc)) | ||
r.Post("/labels", instr("labels", handleFunc)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to add stores
and rules
APIs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check it? (:
2c21459
to
c590906
Compare
7789133
to
3e755e2
Compare
I have updated my pr to support the response cache config file. PTAL when you have time. For the Cortex dependency error, I opened a pr cortexproject/cortex#3000 already. |
eda511a
to
67c5609
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, some suggestions mostly around diffetent paths. The current implementation is super vague what is cached, what is splitted and retries etc. What if some path is not in api.go? (:
Can we make it explicit? For example if no retry, splitting is expected for ALL but query_range, can we just remove all but query_range
from API? maybe leaving query
instant makes sense as slow query log is nice for those (: WDYT?
Beside that it's amazing! 💪 Great job! We are missing docs, but we can add those later (:
@@ -35,6 +35,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel | |||
- [#2865](https://github.com/thanos-io/thanos/pull/2865) ui: Migrate Thanos Ruler UI to React | |||
- [#2964](https://github.com/thanos-io/thanos/pull/2964) Query: Add time range parameters to label APIs. Add `start` and `end` fields to Store API `LabelNamesRequest` and `LabelValuesRequest`. | |||
- [#2996](https://github.com/thanos-io/thanos/pull/2996) Sidecar: Add `reloader_config_apply_errors_total` metric. Add new flags `--reloader.watch-interval`, and `--reloader.retry-interval`. | |||
- [#2973](https://github.com/thanos-io/thanos/pull/2973) Add Thanos Query Frontend component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs, would be nice, maybe in next PR
cmd.Flag("query-range.max-query-parallelism", "Maximum number of queries will be scheduled in parallel by the frontend."). | ||
Default("14").IntVar(&c.maxQueryParallelism) | ||
|
||
cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need better help flag description - in separate PR is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
essentially let's describe why this is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, we can also help update the flag description in Cortex as well.
pkg/api/queryfrontend/v1.go
Outdated
|
||
r.Get("/labels", instr("labels", handleFunc)) | ||
r.Post("/labels", instr("labels", handleFunc)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?
pkg/api/queryfrontend/v1.go
Outdated
|
||
r.Get("/labels", instr("labels", handleFunc)) | ||
r.Post("/labels", instr("labels", handleFunc)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check it? (:
// TestRoundTripRetryMiddleware tests the retry middleware. | ||
func TestRoundTripRetryMiddleware(t *testing.T) { | ||
testRequest := &queryrange.PrometheusRequest{ | ||
Path: "/api/v1/query_range", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if path is totally different? not query range? Can we test it?
pkg/queryfrontend/roundtrip_test.go
Outdated
{ | ||
name: "disable split", | ||
req: &queryrange.PrometheusRequest{ | ||
Path: "/api/v1/query_range", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, can we test different paths? just to see what to expect
Also still somehow tests flakes, wonder if it's SWIFT flakiness |
Not passing through. Please see cortexproject/cortex#2742. TBH I am not sure if this is the behavior we want to have. Only query range results are cached. The workflow is:
Umm, I agree we should separate labels for other endpoints as well. Cortex does the same thing |
Seems we got another flaky test https://app.circleci.com/pipelines/github/thanos-io/thanos/2764/workflows/74964958-9ce1-4041-b3b9-4ea659a1764e/jobs/10997 |
In this case let's do following:
|
Hello, I agree that we shouldn't limit the endpoints here and it is better to pass through all other routes to downstream querier to make sure everything works fine (like Grafana). But I am not sure why a separate handler is needed? Can I just reuse the downstream roundtripper implemented in Cortex frontend? https://github.com/cortexproject/cortex/blob/master/pkg/querier/frontend/frontend.go#L118. This works but TBH I don't know whether this is a good pattern or not. I just found it not as easy as I thought to deal with the default route with return func(next http.RoundTripper) http.RoundTripper {
queryRangeTripper := queryrange.NewRoundTripper(next, codec, queryRangeMiddleware...)
return frontend.RoundTripFunc(func(r *http.Request) (*http.Response, error) {
switch r.URL.Path {
case "/api/v1/query":
if r.Method == http.MethodGet || r.Method == http.MethodPost {
queriesCount.WithLabelValues(labelQuery).Inc()
}
case "/api/v1/query_range":
if r.Method == http.MethodGet || r.Method == http.MethodPost {
queriesCount.WithLabelValues(labelQueryRange).Inc()
return queryRangeTripper.RoundTrip(r)
}
default:
}
return next.RoundTrip(r)
})
}, nil |
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
… panic Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
67c5609
to
56caec6
Compare
PR is updated and I added more test cases. PTAL tomorrow |
Happy with whatever makes sense more (: Will look today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing.
I like it. What I am only missing here are some docs (can do in next PR) and tests for different pass through items. Again good for next PR (:
}) | ||
return hf | ||
} | ||
srv.Handle("/", injectf(fe.Handler().ServeHTTP)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
t.Run("same range query, cache hit.", func(t *testing.T) { | ||
// Run the same range query again, the result can be retrieved from cache directly. | ||
rangeQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check that other endpoints are pass through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already tested labelNames
and labelValues
here https://github.com/thanos-io/thanos/blob/master/test/e2e/query_frontend_test.go#L99-L118. I am not sure whether this is what you want or not.
Signed-off-by: Ben Ye [email protected]
Changes
The current code is almost the same as the Cortex frontend. I just removed the sharding middleware because seems that is specific for Cortex.
TODO:
dedup
,max_source_resolution
,partial_response
, etc. These params should be part of the cache key because these affect the query results. We need to implement Thanos codec to decode/encode Thanos query requests. Supporting Thanos codec to parse Thanos specific query parameters yeya24/thanos#159Verification