-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/gate: Prefix gate metrics for selects #3154
pkg/gate: Prefix gate metrics for selects #3154
Conversation
722181c
to
9b9d35e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contributions. Looking good.
I have a couple of comments.
cmd/thanos/query.go
Outdated
engine = promql.NewEngine( | ||
queryableCreator = query.NewQueryableCreator( | ||
logger, | ||
extprom.WrapRegistererWithPrefix("thanos_query_concurrent_selects_", reg), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, this registry only used for concurrent selects. However, it could be used for others as well. I think we should wrap the registry this early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
fff2500
to
430b3ed
Compare
Thanos query exposed `gate_queries_in_flight` and `gate_duration_seconds_bucket` metrics for concurrent selects. These metrics are now prefixed by `thanos_query_concurrent_selects_`. Signed-off-by: Simon Pasquier <[email protected]>
This change deprecates the gate.(*Keeper) struct. When Keeper is used to create several gates, the metric tracking the number of in-flight metric isn't meaningful because it is hard to say whether requests are being blocked or not. As such the `thanos_query_concurrent_selects_gate_queries_in_flight` is removed. The following metrics have been added to record the maximum number of concurrent requests per gate: * `thanos_query_gate_queries_max` * `thanos_bucket_store_series_gate_queries_max`, previously known as `thanos_bucket_store_queries_concurrent_max.` * `thanos_memcached_getmulti_gate_queries_max` Signed-off-by: Simon Pasquier <[email protected]>
430b3ed
to
0a6b36a
Compare
Signed-off-by: Simon Pasquier <[email protected]>
// gate.(*Keeper).NewGate only once, it is recommended to use gate.New() | ||
// instead. Otherwise it is recommended to use the | ||
// github.com/prometheus/prometheus/pkg/gate package directly and wrap the | ||
// returned gate with gate.InstrumentGateDuration(). | ||
type Keeper struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still used anywhere in the codebase? If it's not the case, why not remove it directly? I don't think we have any downstream dependents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately we have, I should have pointed out in the PR.
https://github.com/cortexproject/cortex/blob/f1b95c68d3af4f715b11bd33c53ebaf83260f3bb/pkg/storegateway/bucket_stores.go#L72
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and given that Thanos imports Cortex too, it creates a circular dependency hence the need to keep Keeper
:)
Signed-off-by: Simon Pasquier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇 Thanks a lot for addressing everything 💜
Signed-off-by: Simon Pasquier <[email protected]>
Signed-off-by: Simon Pasquier <[email protected]>
Changes
The metrics on the gates for concurrent selects were exposed without a prefix (e.g.
gate_queries_in_flight
andgate_duration_seconds
) unlike the gate metrics for the query API. The first commit 60950ff adds thethanos_query_concurrent_selects_
prefix.After more thinking, I felt that exposing the number of in-flight queries doesn't make much sense for concurrent selects because each query creates a new gate. Thus we can have (max number of concurrent selects per query x max number of concurrent queries) selects in flight and it doesn't tell whether a query is blocked on too many selects or not. I've added another commit
722181c that removes the
thanos_query_concurrent_selects_gate_queries_in_flight
metric.The following metrics have also been added to record the maximum number of concurrent requests per gate:
thanos_query_gate_queries_max
thanos_bucket_store_series_gate_queries_max
, previously known asthanos_bucket_store_queries_concurrent_max.
thanos_memcached_getmulti_gate_queries_max
Verification
Tested manually.