Implement dedup-ready label sorting in stores #5796

Closed
wants to merge 1 commit into from

Conversation

fpetkovski
Contributor

@fpetkovski fpetkovski commented Oct 17, 2022

This commit adds functionality to all stores to sort series labels in order appropriate for deduplication. This is done by propagating replica labels from the Querier in the Series request. When all stores send a hint that they support sorting labels, we can rely on the k-way merge in the proxy and avoid an expensive global sort.

For backwards compatibility, each Store will send a hint in the SeriesResponse to indicate that it supports this functionality. The hint is sent at the beginning of the stream, and is the first response that each store will send.

The Store Proxy will inspect the hints from each store and detect cases where a store does not support resorting of labels. In that case it will itself send a hint upstream that sorting is not supported.
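
A minimal sketch of the hint handshake described above, assuming simplified stand-in types. The names `response`, `supportsDedupSort` and `canUseMerge` are hypothetical and are not part of the Thanos `storepb` API; treating an empty stream as harmless is also an assumption.

```go
package main

import "fmt"

// response is a hypothetical stand-in for a SeriesResponse frame: it is
// either a hint frame or a series frame. It is not the real storepb type.
type response struct {
	isHint        bool
	sortsForDedup bool   // only meaningful when isHint is true
	series        string // only meaningful when isHint is false
}

// supportsDedupSort inspects the first frame of one store's stream.
// A store that predates the feature sends no hint, so its first frame is a
// series frame and the store is treated as not supporting dedup sorting.
func supportsDedupSort(stream []response) bool {
	if len(stream) == 0 {
		// Assumption: an empty stream contributes no series, so it cannot
		// break the ordering of the merged result.
		return true
	}
	first := stream[0]
	return first.isHint && first.sortsForDedup
}

// canUseMerge reports whether the proxy may rely on a k-way merge alone.
// A single store without support forces the fallback to a global sort, and
// the proxy would report "not supported" upstream in that case.
func canUseMerge(streams [][]response) bool {
	for _, s := range streams {
		if !supportsDedupSort(s) {
			return false
		}
	}
	return true
}

func main() {
	newStore := []response{{isHint: true, sortsForDedup: true}, {series: `{a="1"}`}}
	oldStore := []response{{series: `{a="1"}`}}

	fmt.Println(canUseMerge([][]response{newStore, newStore}))
	fmt.Println(canUseMerge([][]response{newStore, oldStore}))
}
```

The point of the sketch is that the decision is all-or-nothing: one store without the hint is enough to lose the cheap merge path.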

Fixes #5719
Closes #5742 #5692

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@fpetkovski fpetkovski force-pushed the store-label-sort branch 3 times, most recently from ae79069 to 6c8259f Compare October 17, 2022 13:45
@fpetkovski fpetkovski requested review from bwplotka, GiedriusS and yeya24 and removed request for GiedriusS October 17, 2022 13:45
@fpetkovski fpetkovski force-pushed the store-label-sort branch 2 times, most recently from 118007d to 6c8df84 Compare October 18, 2022 05:49
@GiedriusS
Member

Sorry, finally got around to this. I think this should be implemented differently. Ideally, we are able to tell in advance whether resorting is needed. I think that would make the code cleaner. Plus, to achieve full streaming, I suggest removing the temporary buffering stage and passing the proxystore iterator around to other iterators. Here's some untested code that shows my idea:

fpetkovski#3

What do you think about such an idea?

@fpetkovski
Contributor Author

fpetkovski commented Oct 18, 2022

The problem is that we cannot know in advance if a store is able to send data sorted for dedup. Since we stream series one by one from store/tsdb.go and store/prometheus.go (receiver, ruler and sidecar), data from these stores will be sorted according to the TSDB specific sort (according to labels.Compare).

Sorting for dedup depends on replica labels, and if a user sets replica labels that are regular series labels in TSDB, these stores will not be able to send series in order required for dedup. So whether a store has this capability or not depends on which labels are set as replica labels.

Also it seems like it's possible to override replica labels on individual requests, which makes it hard to use store info checks to get this information.
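
To make the ordering problem concrete, here is a small sketch under simplified assumptions. `label`, `series`, `compare` and `moveToEnd` are hypothetical helpers; `compare` only approximates the spirit of `labels.Compare` and is not the Prometheus implementation. It shows that a set sorted in TSDB order stops being sorted once a replica label that sits in the middle of the label set is moved to the end for dedup.

```go
package main

import (
	"fmt"
	"sort"
)

// label and series are simplified stand-ins for Prometheus label sets.
type label struct{ name, value string }
type series []label

// compare orders two label sets pairwise by name, then by value.
func compare(a, b series) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		switch {
		case a[i].name < b[i].name:
			return -1
		case a[i].name > b[i].name:
			return 1
		case a[i].value < b[i].value:
			return -1
		case a[i].value > b[i].value:
			return 1
		}
	}
	return len(a) - len(b)
}

// moveToEnd returns a copy of s with the replica labels placed after all
// other labels, which is the order dedup wants to compare in.
func moveToEnd(s series, replica map[string]struct{}) series {
	head := make(series, 0, len(s))
	var tail series
	for _, l := range s {
		if _, ok := replica[l.name]; ok {
			tail = append(tail, l)
		} else {
			head = append(head, l)
		}
	}
	return append(head, tail...)
}

// isSorted reports whether the set is ordered according to compare.
func isSorted(set []series) bool {
	return sort.SliceIsSorted(set, func(i, j int) bool {
		return compare(set[i], set[j]) < 0
	})
}

func main() {
	replica := map[string]struct{}{"replica": {}}
	// "replica" sorts between "a" and "z", as a regular TSDB label would.
	tsdbOrder := []series{
		{{"a", "1"}, {"replica", "1"}, {"z", "1"}},
		{{"a", "1"}, {"replica", "1"}, {"z", "2"}},
		{{"a", "1"}, {"replica", "2"}, {"z", "1"}},
		{{"a", "1"}, {"replica", "2"}, {"z", "2"}},
	}
	dedupView := make([]series, len(tsdbOrder))
	for i, s := range tsdbOrder {
		dedupView[i] = moveToEnd(s, replica)
	}
	fmt.Println(isSorted(tsdbOrder), isSorted(dedupView))
}
```

If every series in the store carried the same `replica` value, as is the case for an external label, moving it to the end would not change the relative order, which is why that case can be streamed.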

Member

@bwplotka bwplotka left a comment

Nice, thanks for this! Looks great, and as we discussed. However, there is indeed quite a cognitive load and fragile code in the hot path (heap) for checking those hints, and it is there only because of compatibility ):

It would indeed be nice to check alternatives, if we have any. Last time we checked with @fpetkovski, checking whether series are sorted was NOT cheap (probably almost as expensive as a full sort due to hashing), so it's not ideal either.

Ideally we would have gRPC versioning for this... Store API v1.1

@bwplotka
Member

Actually I like @GiedriusS's idea - we really need to tell WHAT gRPC version it is (i.e. whether a new aspect of a gRPC feature is supported)... Adding a bool to info is indeed enough, but maybe simply add the gRPC version to info? Something naive like that would help us determine the feature set of the Store API 🤔

@bwplotka
Member

> The problem is that we cannot know in advance if a store is able to send data sorted for dedup. Since we stream series one by one from store/tsdb.go and store/prometheus.go (receiver, ruler and sidecar), data from these stores will be sorted according to the TSDB specific sort (according to labels.Compare).
> Sorting for dedup depends on replica labels, and if a user sets replica labels that are regular series labels in TSDB, these stores will not be able to send series in order required for dedup. So whether a store has this capability or not depends on which labels are set as replica labels.

I don't think we have this problem. We want to add a guarantee to the Store API that if a certain field is set, data is sorted differently. So we need only one simple piece of information: does a given Store API support this new guarantee or not?

@fpetkovski
Contributor Author

@GiedriusS we spoke with @bwplotka in Slack about adding buffering to the sidecar and receiver when replica labels are not external labels, which is the second solution from this issue: #5719.

In this case we will load all series in memory in these components, sort them without taking replica labels into account, and only then send upstream. This will allow us to use the static checks from your suggestion and avoid the complicated response hints.

When replica labels are external labels (which is the most common use case), we won't need to buffer and we can stream everything as we do now. Wdyt?

@fpetkovski fpetkovski force-pushed the store-label-sort branch 3 times, most recently from 64650f4 to 98775eb Compare October 19, 2022 07:46
@fpetkovski
Contributor Author

fpetkovski commented Oct 19, 2022

I pushed a new commit that implements this idea. We can add a server proxy that can decide whether to buffer and re-sort or not. This way we keep all stores as they are and avoid various checks that can be error prone and brittle.
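
A rough sketch of what such a buffering wrapper could look like, under stated assumptions. The body of `sortRequired` is a guess at the rule described in this thread (buffer as soon as one replica label is not an external label), and `bufferingServer` is a hypothetical type that reduces series to string keys; neither is the code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortRequired reports whether a store has to buffer and re-sort.
// Assumed rule: re-sorting is needed as soon as one replica label is not an
// external label, because only external labels are constant across a store.
func sortRequired(sortWithoutLabels, extLabels map[string]struct{}) bool {
	for name := range sortWithoutLabels {
		if _, ok := extLabels[name]; !ok {
			return true
		}
	}
	return false
}

// bufferingServer is a hypothetical wrapper around a downstream send
// function. Series are reduced to string keys that already encode the
// order required for dedup.
type bufferingServer struct {
	send        func(key string)
	passThrough bool
	buffered    []string
}

// Send forwards immediately in pass-through mode and buffers otherwise.
func (b *bufferingServer) Send(key string) {
	if b.passThrough {
		b.send(key)
		return
	}
	b.buffered = append(b.buffered, key)
}

// Flush sorts whatever was buffered and sends it downstream in order.
func (b *bufferingServer) Flush() {
	sort.Strings(b.buffered)
	for _, key := range b.buffered {
		b.send(key)
	}
	b.buffered = nil
}

func main() {
	ext := map[string]struct{}{"replica": {}}
	replicaLabels := map[string]struct{}{"pod": {}}

	srv := &bufferingServer{
		send:        func(key string) { fmt.Println(key) },
		passThrough: !sortRequired(replicaLabels, ext),
	}
	srv.Send("b")
	srv.Send("a")
	srv.Flush() // "pod" is not an external label, so output is buffered and sorted
}
```

The trade-off is visible in `Send`: pass-through keeps streaming and constant memory, while the buffered path holds every series until `Flush`.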

@fpetkovski fpetkovski force-pushed the store-label-sort branch 7 times, most recently from 1e7ed58 to 5644be4 Compare October 19, 2022 08:28
})
}

func sortRequired(sortWithoutLabels map[string]struct{}, extLabelsMap map[string]struct{}) bool {
Member

For us this would mean forced buffering on StoreAPI nodes because we have replica labels set on Thanos Query that exist on some nodes and on some do not :/ thus, we'd be forced to set those external labels on all nodes, to some kind of fake values or something. I wonder how common is a setup like ours. 🤔

Contributor Author

@fpetkovski fpetkovski Oct 19, 2022

You are right, this requires us to use the same replica labels everywhere and somehow force external labels on Store Gateway. I actually thought StoreGW always buffers which is why I hardcoded false for passthrough, but after inspecting the code better I see that it's not the case.

We could add a flag in store-gw to indicate what are the replica labels used in the querier, but that will achieve the same effect as setting external labels.

Member

Good point. In such a case we could check if external labels are present in the series. If not, sortWithoutLabels does not change anything, thus we are safe to pass it forward 🤔 or something along those lines.

Contributor Author

Actually we can solve this by looking at external labels on individual blocks. Will try to make this change.

Contributor Author

I updated store/bucket to sort only if replica labels are not found in at least one block. I think this should cover the use case you described.

Contributor Author

Hm on second thought, the replica labels can be removed by compaction. In those cases, they won't be found in external labels and store-gateways will be forced to buffer.

I don't know what would be a good solution for this case. We also might be asking for too much from the system without giving it the necessary information to make good decisions.

Collaborator

Could we say that if no label from sortWithoutLabels is present in extLabelsMap that sort is not required?

Contributor Author

Unfortunately that would be also true if a user wants to dedup by a label that is not external, and is present in series inside TSDB.

Collaborator

Ah got it now in connection with your comment #5796 (comment), I honestly also wasn't aware of this use case.

Member

@GiedriusS GiedriusS left a comment

Do benchmarks still show a big improvement?

@@ -1026,6 +1026,7 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_Serie
seriesLimiter = s.seriesLimiterFactory(s.metrics.queriesDropped.WithLabelValues("series"))
)

sortedSeriesSrv := newSortedSeriesServer(srv, req.SortWithoutLabelSet(), false)
Member

so easy <3


responses []*storepb.SeriesResponse
sortWithoutLabelSet map[string]struct{}
passThrough bool
Member

I feel we could have better naming than passThrough - perhaps something like reSortOnFlushNeeded?

@@ -1101,7 +1111,7 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_Serie
}

mtx.Lock()
-res = append(res, part)
+res = append(res, newSortedSeriesSet(part, sortWithoutLabelSet))
Contributor Author

@fpetkovski fpetkovski Oct 19, 2022

We also need to send labels at the end here. Basically every time we want to do a merge sort, replica labels need to be at the end of the individual series.

That merge is happening here:

thanos/pkg/store/bucket.go

Lines 1168 to 1170 in 35a3e3d

// NOTE: We "carefully" assume series and chunks are sorted within each SeriesSet. This should be guaranteed by
// blockSeries method. In worst case deduplication logic won't deduplicate correctly, which will be accounted later.
set := storepb.MergeSeriesSets(res...)
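
For context, a minimal heap-based k-way merge, sketched over plain string keys. This is an illustration of the general technique and not the implementation of `storepb.MergeSeriesSets`; `cursor`, `minHeap` and `mergeSorted` are hypothetical names. It only produces correct output if every input is already sorted in the same order, which is why replica labels have to sit at the end of each series before merging.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor points at the next unread element of one sorted input.
type cursor struct {
	items []string
	pos   int
}

// minHeap orders cursors by the element they currently point at.
type minHeap []*cursor

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].items[h[i].pos] < h[j].items[h[j].pos] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeSorted merges inputs that are each already sorted into one sorted
// slice. Equal keys from different inputs end up next to each other.
func mergeSorted(inputs ...[]string) []string {
	h := &minHeap{}
	for _, in := range inputs {
		if len(in) > 0 {
			*h = append(*h, &cursor{items: in})
		}
	}
	heap.Init(h)

	var out []string
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current element
		out = append(out, c.items[c.pos])
		c.pos++
		if c.pos == len(c.items) {
			heap.Pop(h) // input exhausted, drop its cursor
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted(
		[]string{"a", "c"},
		[]string{"b", "d"},
		[]string{"a"},
	))
}
```

Because equal keys become adjacent in the merged output, a later dedup pass only has to compare neighbours instead of sorting globally.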

@fpetkovski
Contributor Author

Here are some benchmarks

Select with 2 replicas

name                                             old time/op    new time/op    delta
QuerySelect/1000000SeriesWith1Samples/select-8      1.10s ± 6%     0.84s ± 2%  -23.52%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8     110ms ± 1%      96ms ± 3%  -12.78%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8    19.5ms ± 1%    20.3ms ± 1%   +3.99%  (p=0.008 n=5+5)

name                                             old alloc/op   new alloc/op   delta
QuerySelect/1000000SeriesWith1Samples/select-8      818MB ± 0%     743MB ± 0%   -9.20%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8    82.5MB ± 0%    74.4MB ± 0%   -9.88%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8    18.7MB ± 0%    18.7MB ± 0%   +0.06%  (p=0.008 n=5+5)

name                                             old allocs/op  new allocs/op  delta
QuerySelect/1000000SeriesWith1Samples/select-8      12.5M ± 0%      9.6M ± 0%  -23.15%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8     1.25M ± 0%     0.96M ± 0%  -23.32%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8      167k ± 0%      167k ± 0%   +0.11%  (p=0.008 n=5+5)

Select with 5 replicas


name                                             old time/op    new time/op    delta
QuerySelect/1000000SeriesWith1Samples/select-8      1.04s ± 6%     0.86s ± 3%  -16.72%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8     105ms ± 2%      97ms ± 1%   -7.52%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8    19.1ms ± 1%    19.6ms ± 1%   +2.52%  (p=0.008 n=5+5)

name                                             old alloc/op   new alloc/op   delta
QuerySelect/1000000SeriesWith1Samples/select-8      814MB ± 0%     752MB ± 0%   -7.54%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8    82.0MB ± 0%    75.0MB ± 0%   -8.60%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8    18.7MB ± 0%    18.7MB ± 0%   +0.10%  (p=0.008 n=5+5)

name                                             old allocs/op  new allocs/op  delta
QuerySelect/1000000SeriesWith1Samples/select-8      12.2M ± 0%      9.2M ± 0%  -24.59%  (p=0.008 n=5+5)
QuerySelect/100000SeriesWith100Samples/select-8     1.22M ± 0%     0.92M ± 0%  -24.55%  (p=0.008 n=5+5)
QuerySelect/1SeriesWith10000000Samples/select-8      167k ± 0%      167k ± 0%   +0.18%  (p=0.016 n=5+4)

@fpetkovski fpetkovski force-pushed the store-label-sort branch 2 times, most recently from c6d62db to 419d323 Compare October 20, 2022 08:45
@fpetkovski fpetkovski force-pushed the store-label-sort branch 2 times, most recently from c09f0fa to f224770 Compare October 22, 2022 09:40
bool sends_sorted_series = 4;
// TODO(fpetkovski): Remove in v1.0
bool sends_sorted_series_without_labels = 5;
Contributor Author

Adding a new field here because sends_sorted_series does not mean the same as sending series sorted without certain labels.

@fpetkovski fpetkovski force-pushed the store-label-sort branch 2 times, most recently from 069f63d to 07bf562 Compare October 22, 2022 09:43
@fpetkovski
Contributor Author

One thing I realized is that the whole complexity comes from the fact that users are allowed to dedup by a label that is in TSDB and is not necessarily an external label. I wonder if this is even a valid use case, and how many people need to do something like this. Dropping support for this might break some users, but maybe we can add a flag in the querier, query.dedup-label-in-tsdb or something similar, which can be propagated to stores to tell them to re-sort the data.

clientsFunc := func() []store.Client {
clients := make([]store.Client, 0, len(storeAPI))
for _, s := range storeAPI {
clients = append(clients, receive.NewLocalClient(storepb.ServerAsClient(s, 0), storesWithSortedSeries, labelsFunc, timeFunc))
Collaborator

Would it be possible to expand inProcessClient and use it here instead of the receive client?

Contributor Author

What is the difference between those two? Should we maybe converge to a single implementation?

Collaborator

I think inProcessClient is specifically for testing purposes

This commit adds functionality to all stores to sort series labels
in order appropriate for deduplication. This is done by propagating
replica labels from the Querier in the Series request. When all stores
send a hint that they support sorting labels, we can rely on the k-way
merge in the proxy and avoid an expensive global sort.

For backwards compatibility, each Store will send a hint in the
SeriesResponse to indicate that it supports this functionality.
The hint is sent at the beginning of the stream, and is the first
response that each store will send.

The Store Proxy will inspect the hints from each store and detect
cases where a store does not support resorting of labels. In that case
it will itself send a hint upstream that sorting is not supported.

Signed-off-by: Filip Petkovski <[email protected]>
@bwplotka
Member

So first of all, we have a bug in the PromQL engine, so had to ensure this PR is correct: #5820

Second, I think I would be happy to fail, or to not sort, for a Store API implementation where the label to sort without is not an external label but actually lives inside e.g. a block. So, for this example:

image

But in either case we have to actively check sorting, so:

  1. If we want to fail, sidecar/receive/store etc. have to actively check whether the label passed in the request is part of the replica labels or exists in a block etc.
  2. If we want to not sort, the querier has to check whether it received a sorted stream.
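
Option 2 could be sketched roughly as follows, assuming series are reduced to pre-computed string keys. `collect` is a hypothetical helper, not querier code. Note that the sketch sidesteps the cost mentioned earlier in this thread: for real label sets, producing something comparable per series is where the expense would come from.

```go
package main

import (
	"fmt"
	"sort"
)

// collect drains a stream of series keys and reports whether it had to
// re-sort. The order check costs one comparison per element; the full
// sort only runs when an out-of-order element was seen.
func collect(stream []string) (out []string, resorted bool) {
	out = make([]string, 0, len(stream))
	inOrder := true
	for _, key := range stream {
		if inOrder && len(out) > 0 && key < out[len(out)-1] {
			inOrder = false
		}
		out = append(out, key)
	}
	if !inOrder {
		sort.Strings(out)
	}
	return out, !inOrder
}

func main() {
	_, resorted := collect([]string{"a", "b", "c"})
	fmt.Println(resorted)

	sorted, resorted := collect([]string{"b", "a", "c"})
	fmt.Println(sorted, resorted)
}
```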

Also, if we don't pass any label and there ARE replica labels for e.g. prometheus+sidecar or receive etc., we return unsorted series, right? So if we don't want to buffer/sort on leaves, we can't really guarantee sorted results on the Store API? So, does your PR work with no replica labels passed on the querier?

Anyway, sorry for the back and forth, but it's a big thing spanning multiple components and it's very easy to break other people's setups.

@bwplotka
Member

bwplotka commented Oct 25, 2022

Now I am leaning more towards hinting sorting and checking this on the querier - sounds like the easier way... 🤔

I am preparing for a new PromQL demo, but will look at this after Wed.

@stale

stale bot commented Jan 7, 2023

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 7, 2023
@GiedriusS
Member

I think we can close this since an alternative implementation was merged for solving this problem? 😄

@GiedriusS GiedriusS closed this Feb 20, 2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve detections for sorting data before dedup
4 participants