store: label_values: fetch less postings #7814

xBazilio · 2024-10-10T13:53:21Z

I added a CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

This change optimizes label_values requests to the store.

Grafana lets you define template variables. You can define a pod variable with a query like this: label_values(kube_pod_info{}, pod).

In older versions of Grafana this query leads to a series request to the store. In newer versions, with particular datasource settings, it leads to a label_values request. The end result is the same, but the amount of data downloaded from object storage differs considerably.

Investigation shows that for label_values requests Thanos store fetches a lot of postings. This is because a <labelName> != "" matcher is added to the label_values request.

Debug logs of the query stats show that, after the change, considerably less data is downloaded from object storage.

This change checks whether the matchers contain a __name__ matcher and, if they do, skips adding the <labelName> != "" matcher. This reduces the amount of data that needs to be downloaded from storage. Because matchers are evaluated separately during the blockClient.ExpandPostings() call, the matcher <labelName> != "" selects all references to series that have the label. For popular labels such as pod in the example above, this selects a lot of references.

The change doesn't affect queries like label_values(pod) or label_values({some_label="some_value"}, pod), where no __name__ matcher is specified.
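The decision described above can be sketched in isolation. This is a minimal illustration, not the code from pkg/store/bucket.go: the `Matcher` type and the helpers `hasMetricNameEqMatcher` and `withNonEmptyLabelMatcher` are hypothetical stand-ins, and limiting the check to equality matchers on __name__ is an assumption based on the "check if matcher is EQ matcher" commit.

```go
package main

import "fmt"

// MatchType is a simplified stand-in for the Prometheus matcher kinds,
// defined here only to keep the sketch self-contained.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
	MatchRegexp
	MatchNotRegexp
)

// Matcher is a hypothetical, simplified label matcher.
type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

const metricNameLabel = "__name__"

// hasMetricNameEqMatcher reports whether any matcher is an equality
// matcher on __name__ (assumption: only equality matchers count).
func hasMetricNameEqMatcher(ms []Matcher) bool {
	for _, m := range ms {
		if m.Name == metricNameLabel && m.Type == MatchEqual {
			return true
		}
	}
	return false
}

// withNonEmptyLabelMatcher returns the matchers to use for a label_values
// request. It appends <label> != "" only when series matchers exist, the
// label is not an external one, and no __name__ equality matcher is present.
func withNonEmptyLabelMatcher(ms []Matcher, label string, isExternalLabel bool) []Matcher {
	if len(ms) == 0 || isExternalLabel || hasMetricNameEqMatcher(ms) {
		return ms
	}
	out := make([]Matcher, 0, len(ms)+1)
	out = append(out, ms...)
	return append(out, Matcher{Type: MatchNotEqual, Name: label, Value: ""})
}

func main() {
	withName := []Matcher{{Type: MatchEqual, Name: "__name__", Value: "kube_pod_info"}}
	withoutName := []Matcher{{Type: MatchEqual, Name: "some_label", Value: "some_value"}}

	fmt.Println("matchers with __name__:", len(withNonEmptyLabelMatcher(withName, "pod", false)))
	fmt.Println("matchers without __name__:", len(withNonEmptyLabelMatcher(withoutName, "pod", false)))
}
```

With a __name__ matcher the list is returned unchanged. Without one, the extra pod != "" matcher is still appended, which matches the unaffected queries listed above.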

Verification

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

yeya24

Thanks. Good catch.
Do we need to update the comment on L2148?

// Should never be empty since we added labelName!="" matcher to the list of matchers.

This is not necessarily true now that we skip the non-empty matcher when a metric name matcher is present.

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

yeya24

Thanks!

revoke

yeya24 · 2024-10-10T21:28:55Z

pkg/store/bucket.go

@@ -2033,7 +2042,8 @@ func (s *BucketStore) LabelValues(ctx context.Context, req *storepb.LabelValuesR

 		// If we have series matchers and the Label is not an external one, add <labelName> != "" matcher
 		// to only select series that have given label name.
-		if len(reqSeriesMatchersNoExtLabels) > 0 && !b.extLset.Has(req.Label) {
+		// We don't need such matcher if matchers already contain __name__ matcher.


Trying to understand this comment. Why is it specific to the metric name label?
It is just a tradeoff between fetching more postings and fetching more series.

xBazilio · Oct 10, 2024

In my experience, if __name__ is specified, the user knows that the metric contains the requested label. The method will select all series with that __name__, and 99% of them will have the requested label.
Queries whose results include series with the __name__ but without the requested label should normally be rare. Users can still make queries like label_values({}, pod). They will work, but will fetch a lot of data.
So the point is: if the user knows what they need and specifies __name__, we don't need to save them from fetching some extra series. But we do save them from fetching all references to all kube_state_metrics from object storage, for example.
Other popular labels may be service, application, job, instance. If __name__ is specified, it is guaranteed to fetch less data while still returning all the label values.

yeya24 · Oct 11, 2024

Not against this. I just hope we can find a better way to cover more labels, because we have that information from postings cardinality and series size.
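The postings-versus-series tradeoff discussed in this thread can be illustrated with a toy inverted index. This is only a sketch under simplifying assumptions: `postingsIndex`, `eq`, `nonEmpty`, and `intersect` are hypothetical names, the metric and pod values are made up, and real Thanos blocks store postings very differently. It counts how many series references each strategy has to read before selecting the same series.

```go
package main

import (
	"fmt"
	"sort"
)

// postingsIndex is a toy inverted index: label name -> label value -> series refs.
type postingsIndex map[string]map[string][]int

// eq returns the refs of series where name=value.
func (ix postingsIndex) eq(name, value string) []int {
	return ix[name][value]
}

// nonEmpty returns the sorted refs of every series that carries the label
// at all, which is what a <name> != "" matcher has to expand to.
func (ix postingsIndex) nonEmpty(name string) []int {
	var out []int
	for _, refs := range ix[name] {
		out = append(out, refs...)
	}
	sort.Ints(out)
	return out
}

// intersect returns the refs present in both sorted lists.
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	// Made-up data: only two series belong to kube_pod_info,
	// but eight series carry the popular pod label.
	ix := postingsIndex{
		"__name__": {
			"kube_pod_info":                     {1, 2},
			"kube_pod_status_phase":             {3, 4, 5, 6},
			"container_cpu_usage_seconds_total": {7, 8},
		},
		"pod": {
			"pod-a": {1, 3, 4, 7},
			"pod-b": {2, 5, 6, 8},
		},
	}

	byName := ix.eq("__name__", "kube_pod_info")
	hasPod := ix.nonEmpty("pod")

	fmt.Println("refs read with pod != \"\" added:", len(byName)+len(hasPod))
	fmt.Println("refs read with __name__ only:", len(byName))
	fmt.Println("series selected with extra matcher:", intersect(byName, hasPod))
	fmt.Println("series selected without it:", byName)
}
```

In this toy data every kube_pod_info series has a pod label, so both strategies select the same series. The difference is only in how many references are read to get there. If some series of the metric lacked the label, the version without the extra matcher would fetch those series too, which is the cost side of the tradeoff.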

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

CHANGELOG.md

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

yeya24

Thanks!

label_values: fetch less postings

e355316

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

pull-request-size bot added the size/S label Oct 10, 2024

xBazilio added 2 commits October 10, 2024 16:55

CHANGELOG.md

acb29a8

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

added acceptance test

bca1cc9

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

pull-request-size bot added size/M and removed size/S labels Oct 10, 2024

MichaHoffmann previously approved these changes Oct 10, 2024

yeya24 reviewed Oct 10, 2024

removed redundant comment

e17a096

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

xBazilio dismissed MichaHoffmann’s stale review via e17a096 October 10, 2024 16:27

yeya24 previously approved these changes Oct 10, 2024

yeya24 reviewed Oct 10, 2024

check if matcher is EQ matcher

f747e39

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

yeya24 reviewed Oct 11, 2024

Update CHANGELOG.md

6b64e73

Signed-off-by: Vasiliy Rumyantsev <[email protected]>

yeya24 approved these changes Oct 11, 2024

MichaHoffmann merged commit 274f95e into thanos-io:main Oct 17, 2024
20 of 21 checks passed

harry671003 mentioned this pull request Oct 30, 2024

Update thanos version to 62038110b1bc cortexproject/cortex#6294

Merged

3 tasks
