-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dedup-ready label sorting in stores #5796
Conversation
ae79069
to
6c8259f
Compare
118007d
to
6c8df84
Compare
Sorry, finally got around to this. I think this should be implemented differently. Ideally, we are able to tell in advance whether resorting is needed. I think that would make the code cleaner. Plus, to achieve full streaming, I suggest removing the temporary buffering stage and passing the proxystore iterator around to other iterators. Here's some untested code that shows my idea: What do you think about such an idea? |
The problem is that we cannot know in advance if a store is able to send data sorted for dedup. Since we stream series one by one from Sorting for dedup depends on replica labels, and if a user sets replica labels that are regular series labels in TSDB, these stores will not be able to send series in order required for dedup. So whether a store has this capability or not depends on which labels are set as replica labels. Also it seems like it's possible to override replica labels on individual requests, which makes it hard to use store info checks to get this information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for this! Looks great, and as we discussed. However, indeed there is quite a cognitive load and fragile core in the hot path (heap) for checking those hint that is only because of compatibility ):
Indeed it would be nice to check alternatives if we have any. Last time we checked with @fpetkovski checking if series are sorted is NOT cheap, (probably almost as expensive as full sort due to hashing), so it's not ideal either.
Ideally we have gRPC versioning for this.... Store API v1.1
Actually I like @GiedriusS idea - we really need to tell WHAT gRPC version it is (if new aspect of GRPC feature is supported)... Adding bool to info is enough indeed, but maybe simply add version of gRPC to info? Something naive like that would help us to determine the feature set of store API 🤔 |
I don't think we have this problem. We want to add guarantee of Store API that if certain field is set data is sorted differently. So we need only simple information - does certain Store API supports this new guarantee or not? |
@GiedriusS we spoke with @bwplotka in Slack about adding buffering to the sidecar and receiver when replica labels are not external labels, which is the second solution from this issue: #5719. In this case we will load all series in memory in these components, sort them without taking replica labels into account, and only then send upstream. This will allow us to use the static checks from your suggestion and avoid the complicated response hints. When replica labels are external labels (which is the most common use case), we won't need to buffer and we can stream everything as we do now. Wdyt? |
64650f4
to
98775eb
Compare
I pushed a new commit that implements this idea. We can add a server proxy that can decide whether to buffer and re-sort or not. This way we keep all stores as they are and avoid various checks that can be error prone and brittle. |
1e7ed58
to
5644be4
Compare
5644be4
to
132365f
Compare
}) | ||
} | ||
|
||
func sortRequired(sortWithoutLabels map[string]struct{}, extLabelsMap map[string]struct{}) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For us this would mean forced buffering on StoreAPI nodes because we have replica labels set on Thanos Query that exist on some nodes and on some do not :/ thus, we'd be forced to set those external labels on all nodes, to some kind of fake values or something. I wonder how common is a setup like ours. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this requires us to use the same replica labels everywhere and somehow force external labels on Store Gateway. I actually thought StoreGW always buffers which is why I hardcoded false
for passthrough, but after inspecting the code better I see that it's not the case.
We could add a flag in store-gw to indicate what are the replica labels used in the querier, but that will achieve the same effect as setting external labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. In such a case we could check if external labels are present in the series. If not, sortWithoutLabels does not change anything, thus we are safe to pass it forward 🤔 or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we can solve this by looking at external labels on individual blocks. Will try to make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated store/bucket to sort only if replica labels are not found in at least one block. I think this should over the use case you described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm on second thought, the replica labels can be removed by compaction. In those cases, they won't be found in external labels and store-gateways will be forced to buffer.
I don't know what would be a good solution for this case. We also might be asking for too much from the system without giving it the necessary information to make good decisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we say that if no label from sortWithoutLabels
is present in extLabelsMap
that sort is not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately that would be also true if a user wants to dedup by a label that is not external, and is present in series inside TSDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it now in connection with your comment #5796 (comment), I honestly also wasn't aware of this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do benchmarks still show a big improvement?
132365f
to
6da682d
Compare
pkg/store/bucket.go
Outdated
@@ -1026,6 +1026,7 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_Serie | |||
seriesLimiter = s.seriesLimiterFactory(s.metrics.queriesDropped.WithLabelValues("series")) | |||
) | |||
|
|||
sortedSeriesSrv := newSortedSeriesServer(srv, req.SortWithoutLabelSet(), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so easy <3
pkg/store/sorted_series_server.go
Outdated
|
||
responses []*storepb.SeriesResponse | ||
sortWithoutLabelSet map[string]struct{} | ||
passThrough bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we could have better naming than passThrough
- perhaps something like reSortOnFlushNeeded
?
}) | ||
} | ||
|
||
func sortRequired(sortWithoutLabels map[string]struct{}, extLabelsMap map[string]struct{}) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. In such a case we could check if external labels are present in the series. If not, sortWithoutLabels does not change anything, thus we are safe to pass it forward 🤔 or something along those lines.
6da682d
to
f0a63f2
Compare
@@ -1101,7 +1111,7 @@ func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_Serie | |||
} | |||
|
|||
mtx.Lock() | |||
res = append(res, part) | |||
res = append(res, newSortedSeriesSet(part, sortWithoutLabelSet)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to send labels at the end here. Basically every time we want to do a merge sort, replica labels need to be at the end of the individual series.
That merge is happening here:
Lines 1168 to 1170 in 35a3e3d
// NOTE: We "carefully" assume series and chunks are sorted within each SeriesSet. This should be guaranteed by | |
// blockSeries method. In worst case deduplication logic won't deduplicate correctly, which will be accounted later. | |
set := storepb.MergeSeriesSets(res...) |
Here are some benchmarks Select with 2 replicas
Select with 5 replicas
|
c6d62db
to
419d323
Compare
c09f0fa
to
f224770
Compare
bool sends_sorted_series = 4; | ||
// TODO(fpetkovski): Remove in v1.0 | ||
bool sends_sorted_series_without_labels = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new field here because sends_sorted_series
does not mean the same as sending series sorted without certain labels.
069f63d
to
07bf562
Compare
One thing I realized is that the whole complexity comes from the fact that users are allowed to dedup by a label that is in TSDB and is not necessarily an external label. I wonder if this is even a valid use case, and how many people need to do something like this. Dropping support for this might break some users, but maybe we can add a flag in the querier |
clientsFunc := func() []store.Client { | ||
clients := make([]store.Client, 0, len(storeAPI)) | ||
for _, s := range storeAPI { | ||
clients = append(clients, receive.NewLocalClient(storepb.ServerAsClient(s, 0), storesWithSortedSeries, labelsFunc, timeFunc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to expand inProcessClient
to use in here instead or receive client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between those two? Should we maybe converge to a single implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think inProcessClient
is specifically for testing purposes
}) | ||
} | ||
|
||
func sortRequired(sortWithoutLabels map[string]struct{}, extLabelsMap map[string]struct{}) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we say that if no label from sortWithoutLabels
is present in extLabelsMap
that sort is not required?
This commit adds functionality to all stores to sort series labels in order appropriate for deduplication. This is done by propagating replica labels from the Querier in the Series request. When all stores send a hint that they support sorting labels, we can rely on the k-way merge in the proxy and avoid an expensive global sort. For backwards compatibility, each Store will send a hint in the SeriesResponse to indicate that it supports this functionality. The hint is sent at the beginning of the stream, and is the first response that each store will send. The Store Proxy will inspect the hints from each store and detect cases where a store does not support resorting of labels. In that case it will itself send a hint upstream that sorting is not supported. Signed-off-by: Filip Petkovski <[email protected]>
07bf562
to
03409b7
Compare
So first of all, we have a bug in the PromQL engine, so had to ensure this PR is correct: #5820 Second is - I think I would be happy to fail or not sort for store API implementation that "label" to sort without is not an external label but actually inside e.g. block. so - for this example: But in either case we have to check actively sorting, so:
Also - if we don't pass any label and there ARE replica labels for e.g. prometheus+sidecar or receive etc - we return unsorted series, right? so if we don't want to buffer/sort on leafs - we can't really guarantee sorted results on Store API? -- so, is you PR works with no replica labels passed on querier? Anyway, sorry for back-forth, but it's a big thing spanning multiple components and it's very easy to break other people setups. |
Now I am leaning more into hinting sorting and checking this on querier - sounds like easier way... 🤔 I am preparing for new prompl demo, but will look at this after Wed. |
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days. |
I think we can close this since an alternative implementation was merged for solving this problem? 😄 |
This commit adds functionality to all stores to sort series labels in order appropriate for deduplication. This is done by propagating replica labels from the Querier in the Series request. When all stores send a hint that they support sorting labels, we can rely on the k-way merge in the proxy and avoid an expensive global sort.
For backwards compatibility, each Store will send a hint in the SeriesResponse to indicate that it supports this functionality. The hint is sent at the beginning of the stream, and is the first response that each store will send.
The Store Proxy will inspect the hints from each store and detect cases where a store does not support resorting of labels. In that case it will itself send a hint upstream that sorting is not supported.
Fixes #5719
Closes #5742 #5692
Changes
Verification