More granular query performance metrics for Thanos store #4895

Open
moadz opened this issue Nov 23, 2021 · 10 comments

moadz commented Nov 23, 2021

Is your proposal related to a problem?

This is a follow-up to the discussion started by @ianbillett in #4674 about measuring query performance in Thanos.

tl;dr: We would like to measure query path latency in a reliable and consistent way to supplement our read path SLIs.

Currently, in Red Hat's internal observability service, we're using Avalanche in conjunction with the http_request_duration_seconds metric on the query_range and query handlers.

This gives us some idea of overall latency for each query request we process, but does not allow us to investigate the impact of 'query shape' and data distribution across our store targets.

Describe the solution you'd like

There are a few reasons why measuring the latency on the query and query_range endpoints is a poor indicator of how much 'work' Thanos is doing. Thanos fans out the 'select' phase of the query to each of its store targets (other stores, sidecars, etc.), streaming the series from each target over the network before executing the query on the resulting stream. This means that we actually touch far more series over the network than are returned in the final query response. An ideal metric for query performance must include the total number of samples and series retrieved remotely before the query is executed.

Ideally this metric will include:

  • [Must] measure the aggregate number of samples returned during the select phase of a fanned-out query
  • [Must] measure the aggregate number of series returned during the select phase of a fanned-out query
  • [Could] measure the aggregate response size (bytes) of fanned-out requests (this is less critical, as response size can be approximated from the number of series/samples)

Mercifully, we already capture these stats in tracing spans on the query path. Unfortunately, traces are sampled and are difficult to query reliably for SLIs. My suggestion is that we add a new histogram, thanos_store_query_duration_seconds, that persists the total elapsed time for the query, with labels for series_processed and samples_processed attached to each observation to allow querying along a particular dimension (e.g. the 0.99 request duration quantile for queries with > 1,000,000 samples returned). Ideally we would represent this with an N-dimensional histogram, but this is not possible with Prometheus.
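
As a rough, minimal sketch of what this could look like with prometheus/client_golang (the metric name and label names are taken from the proposal above; the buckets and everything else are illustrative, not final):

package store

import "github.com/prometheus/client_golang/prometheus"

// queryDuration sketches the proposed histogram: the observed value is the
// total elapsed time of the fanned-out select, and the two extra labels place
// the query into coarse series/samples buckets (how the label values are
// derived is discussed below). Registration with a Registerer is omitted.
var queryDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "thanos_store_query_duration_seconds",
		Help:    "Duration of the fanned-out select phase, partitioned by the amount of data touched.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"series_processed", "samples_processed"},
)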

Given that the number of samples/series returned varies widely from query to query, how do we manage cardinality?

Yes; in fact, if we used the raw sample/series counts as label values, I suspect this metric would be totally useless. Instead, we should define a set of 5-10 distinct 't-shirt size' buckets for the samples/series dimensions.

e.g. an exponential scale:

Samples:
s <= 1000 samples, m <= 10_000 samples, l <= 100_000 samples, xl <= 1_000_000 samples

Series:
s <= 10 series, m <= 100 series, l <= 1000 series, xl <= 10_000 series, xxl <= 100_000 series

So if a query returns 50,000 samples spanning 10 series in 32s, it would be recorded as an observation of 32s with the labels sample-size=l, series-size=s.

This would allow us to define query SLIs with respect to a specific number of series/samples and response time.

e.g. 90% of queries for up to 1,000,000 samples and up to 10 series complete in less than 2s.
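
Continuing the sketch above, a hypothetical helper could map raw counts onto these t-shirt sizes before observing the histogram. The thresholds and size names below simply mirror the example scale and are not meant to be final:

// sizeLabel maps a raw count onto the first size whose threshold it does not
// exceed; anything larger than every threshold falls into the last size.
func sizeLabel(count int64, thresholds []int64, sizes []string) string {
	for i, t := range thresholds {
		if count <= t {
			return sizes[i]
		}
	}
	return sizes[len(sizes)-1]
}

// observeSelect records one observation for a completed select phase.
func observeSelect(seconds float64, series, samples int64) {
	seriesSize := sizeLabel(series, []int64{10, 100, 1_000, 10_000, 100_000}, []string{"s", "m", "l", "xl", "xxl"})
	sampleSize := sizeLabel(samples, []int64{1_000, 10_000, 100_000, 1_000_000}, []string{"s", "m", "l", "xl"})

	// e.g. 50,000 samples across 10 series in 32s -> samples_processed="l", series_processed="s".
	queryDuration.WithLabelValues(seriesSize, sampleSize).Observe(seconds)
}

An SLI like the one above could then be computed by filtering the histogram's buckets on these label values before applying histogram_quantile.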

How will we determine appropriate buckets for our additional dimensions?

This is my primary concern. Given the countless permutations of node size, data distribution, and Thanos topology that might influence query sizes and response times, it is unlikely that any fixed set of thresholds will be appropriate for all deployments of Thanos. As a result, the 't-shirt sizes' will have to be configurable (possibly via runtime args) with some sensible defaults, so that users can tweak them. The obvious caveat is that once this metric is being recorded, any change to the bucketing would render previously recorded data incomparable/incomprehensible. I would appreciate guidance here if possible.

Describe alternatives you've considered

  • Separate histograms bucketed by series/samples instead of response time. This would be sub-optimal, as we would not be able to correlate both dimensions in the same observation.

I'm keen to understand how others have measured query performance (systematically) in the context of Thanos/Prometheus, or the potential issues with introducing a metric of this type.

bwplotka commented Nov 24, 2021

I think it would be nicer if you proposed this as a PR to Thanos, following the proposal process: https://thanos.io/tip/contributing/proposal-process.md/.

This will allow us to start a discussion on each separate part of the proposal.

Otherwise, I like it. I wonder only:

  • Are S/M/L configurable? (probably they should)
  • Maybe it's better to put literal numbers instead of S/M/L to avoid confusion if S, M, L are dynamic? We could name them series_le, samples_le (similar to a normal bucket's "lower or equal").

I don't see any issues if we keep the number of permutations to around 6, and maybe make it optional (:

bwplotka commented Nov 24, 2021

cc @GiedriusS @yeya24 @wiardvanrij, our SLO expert @metalmatze, @kakkoyun @brancz for more opinions (:

@wiardvanrij

Honestly, your questions go deeper than the level at which I work with our stack, so I haven't formed a complete opinion on it, other than that I pretty much agree with everything you said.

As for the solution: I remembered a potential new feature that might resolve your implementation problem, sparse histograms: https://youtu.be/F72Tk8iaWeA

moadz commented Nov 25, 2021

  • Are S/M/L configurable? (probably they should)

Yes. I'm contemplating what this configuration would look like. Perhaps passing in arrays for each of the 'histogram partitions' (i.e. the labels we attach to observations that categorise the number of series and the number of samples), e.g.:

// Buckets for labelling query performance metrics with respect to duration of query
--query.telemetry.request-duration-seconds-quantiles = [ 0.1, 0.25, 0.75, 1.25, 1.75, 2.5, 3, 5, 10 ]

// Buckets for labelling query performance metrics with respect to number of samples
--query.telemetry.sample-quantiles = [ 100, 1_000, 10_000, 100_000, 1_000_000 ]

// Buckets for labelling query performance metrics with respect to number of series
--query.telemetry.series-quantiles = [ 10, 100, 1000, 10_000, 100_000 ]
  • Maybe it's better to put literal numbers instead of S/M/L to avoid confusion if S, M, L are dynamic? We could name them series_le, samples_le (similar to a normal bucket's "lower or equal").

Yeah, agreed. If these are going to be configurable, it's probably useful to encode the literal numbers instead of t-shirt sizes to avoid confusion. On the other hand, I guess using named sizes instead of numbers helps when aggregating across bucket changes, since numeric label values would change if the thresholds are reconfigured.
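
To illustrate the series_le/samples_le idea, a hypothetical helper could turn a count into the smallest configured threshold that covers it, so the label carries a literal number. The thresholds used in the example comment are just the defaults sketched in the flags above:

import (
	"sort"
	"strconv"
)

// leLabel returns the smallest configured threshold that is >= count,
// formatted as a label value (e.g. samples_le="100000"); counts above every
// threshold get "+Inf", mirroring a normal bucket's upper bound.
func leLabel(count int64, thresholds []int64) string {
	sort.Slice(thresholds, func(i, j int) bool { return thresholds[i] < thresholds[j] })
	for _, t := range thresholds {
		if count <= t {
			return strconv.FormatInt(t, 10)
		}
	}
	return "+Inf"
}

// e.g. leLabel(50_000, []int64{100, 1_000, 10_000, 100_000, 1_000_000}) == "100000",
// so the earlier example observation would carry samples_le="100000" instead of "l".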

moadz commented Dec 8, 2021

The initial proposal is up, @bwplotka: #4932, with some caveats around what the metric will actually capture (specifically just the select phase of a query in thanos/store/proxy, which excludes any subsequent query processing).
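
For a sense of where such an observation could hook in, here is a very rough sketch of wrapping the select fan-out so that the whole streaming phase is timed and counted. The wrapper and counter types are hypothetical, not the actual thanos/store/proxy code, and it reuses the observeSelect helper sketched earlier:

import "time"

// selectStats is an illustrative counter that the select fan-out would update
// as merged series are streamed back to the querier.
type selectStats struct {
	series  int64
	samples int64
}

// instrumentedSelect times the fanned-out select and records a single
// histogram observation covering everything streamed during it.
func instrumentedSelect(selectFn func(stats *selectStats) error) error {
	stats := &selectStats{}
	start := time.Now()
	err := selectFn(stats)
	observeSelect(time.Since(start).Seconds(), stats.series, stats.samples)
	return err
}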

bill3tt commented Jan 6, 2022

@moadz you were talking about the extent of our ability to understand per-tenant usage of the querier, which got me thinking: it could be really nice to extend these query path timing metrics to include the tenant (or some configurable label) as an extra dimension.

This would increase the cardinality significantly, but I can imagine it being very useful.

moadz commented Jan 7, 2022

@ianbillett that would be very useful, but I wonder if it would encourage improper usage of this metric.

  • Single Responsibility Principle: this metric tracks query performance relative to query shape. If we explicitly want to track usage more closely, we would benefit more from a richer query logger (Add query logging/tracing #1706) or a separate metric, as opposed to instrumenting this through an already high-cardinality histogram.
  • Is the number of series/samples touched a good proxy for usage? We don't instrument bytes/bandwidth, and there are other factors that contribute to how long a query takes, such as where the data is being retrieved from and how many concurrent queries are being handled at that particular point. A secondary point is that we are bucketing these dimensions, so it would be difficult to determine exact numbers from this metric.
  • The metric is already quite high in cardinality. If we were to parameterise it on, for example, a set of LabelNames that, if present, are attached to the observation alongside their values, it would be very easy for a user to introduce orders of magnitude of performance impact without really thinking about it.

bill3tt commented Jan 11, 2022

You make some good points there @moadz - I think the richer query logger is what I had in mind :)

stale bot commented Apr 16, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Apr 16, 2022
GiedriusS removed the stale label Apr 17, 2022

stale bot commented Sep 21, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Sep 21, 2022