querier: Avoid global sort for dedup when possible + proposal. #5988

bwplotka · 2022-12-20T11:00:02Z

See proposal in Diff to learn more about this PR.

E2e benchmark looks good (check UI latency):

Without this PR:

With this PR:

bwplotka · 2022-12-20T16:54:53Z

Performing last benchmarks & tests, but otherwise good for review.

GiedriusS · 2022-12-21T07:48:44Z

Tests still seem to fail? 🤔

bwplotka · 2022-12-22T15:10:58Z

Should be fixed now. It's ready for your eyes (:

docs/proposals-accepted/20221129-avoid-global-sort.md

fpetkovski · 2022-12-25T08:38:36Z

pkg/store/proxy_heap.go

 }

+// NewDedupResponseHeap returns a wrapper around ProxyResponseHeap that merged duplicated series messages into one.
+// It also deduplicates identical chunks identified by the same checksum from each series message.
 func NewDedupResponseHeap(h *ProxyResponseHeap) *dedupResponseHeap {


Why does this code need to change? From what I understand we just need to fix the dedup iterator so that it splits overlapping chunks into separate streams. I don't understand what the changed version of dedup heap does now.

There was a bug in aggregating the series into one series with non-series responses in between.

Two big complexity points (Next and At), instead of focusing complexity on one place (Next) and keeping At trivial (less cognitive load).

I found previous code less readable than my version (too ambiguous state, too many if else, complex deferred logic), but I guess it's a biased opinion. 🙈

fpetkovski · 2022-12-25T08:40:43Z

cmd/thanos/receive.go

+						MinTime:                      minTime,
+						MaxTime:                      maxTime,
+						SupportsSharding:             true,
+						SupportsWithoutReplicaLabels: false, // TODO(bwplotka): Add support for efficiency.


Why not implement this for every store in the same PR? It should involve just calling a single function on the external labels?

Yea... "just" and then 1000 lines to fix and have tests (: Next PR!

pkg/dedup/iter.go

fpetkovski

@bwplotka mind resolving conflicts so we can proceed with this?

bwplotka · 2023-01-24T11:34:42Z

Yes, will do this week!

* Proposal * Removed deprecated fields from internal proxy series usage. Signed-off-by: bwplotka <[email protected]>

➜ store git:(pre-sort-auto) ✗ benchstat v1.txt v2.txt name old time/op new time/op delta SortWithoutLabels-12 4.02ms ± 2% 1.06ms ± 5% -73.54% (p=0.016 n=5+4) name old alloc/op new alloc/op delta SortWithoutLabels-12 1.04MB ± 0% 0.00MB ±13% -99.99% (p=0.029 n=4+4) name old allocs/op new allocs/op delta SortWithoutLabels-12 30.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=5+4) Signed-off-by: bwplotka <[email protected]>

Signed-off-by: bwplotka <[email protected]>

Co-authored-by: Filip Petkovski <[email protected]> Signed-off-by: Bartlomiej Plotka <[email protected]>

bwplotka · 2023-01-26T19:00:50Z

Done @fpetkovski , restarting (hopefully flaky) e2e test.

fpetkovski · 2023-01-27T12:00:14Z

I restarted the CI job again, and it looks like some test fails consistently.

bwplotka · 2023-01-27T12:54:27Z

Hard to say which test... /home/runner/work/thanos/thanos/test/e2e/query_frontend_test.go:794 +0x13af testing.tRunner(0xc00033bd40, 0x1e93b50) - will have to check.

Not a flake I guess.

11:58:10 sidecar-p1: level=debug name=sidecar-p1 ts=2023-01-27T11:58:10.903476011Z caller=prometheus.go:453 msg="handled ReadRequest_STREAMED_XOR_CHUNKS request." frames=12
level=error ts=2023-01-27T11:58:10.904207706Z msg="function failed. Retrying in next tick" err="unexpected result size, expected 2; result 1: {handler=\"/\"} => 1.1903256995869693 @[1674820118.343]"
11:58:15 querier-q1: level=warn name=querier-q1 ts=2023-01-27T11:58:15.899250614Z caller=proxy_heap.go:545 component=proxy request="min_time:1674819818343 max_time:1674820118343 matchers:<name:\"__name__\" value:\"http_requests_total\" > aggregates:COUNT aggregates:SUM without_replica_labels:\"replica\" " msg="detecting store that does not support without replica label setting. Falling back to eager retrieval with additional sort. Make sure your storeAPI supports it to speed up your queries" store="Addr: query-sharding-sidecar-p1:9091 LabelSets: {prometheus=\"p1\", replica=\"0\"} MinTime: -62167219200000 MaxTime: 9223372036854775807"
11:58:15 sidecar-p1: level=debug name=sidecar-p1 ts=2023-01-27T11:58:15.899998509Z caller=prometheus.go:371 msg="started handling ReadRequest_STREAMED_XOR_CHUNKS streamed read response."
11:58:15 sidecar-p1: level=debug name=sidecar-p1 ts=2023-01-27T11:58:15.900240008Z caller=prometheus.go:453 msg="handled ReadRequest_STREAMED_XOR_CHUNKS request." frames=12
level=error ts=2023-01-27T11:58:15.900860604Z msg="function failed. Retrying in next tick" err="unexpected result size, expected 2; result 1: {handler=\"/\"} => 1.1903256995869693 @[1674820118.343]"

GiedriusS · 2023-01-31T15:31:11Z

@bwplotka do you mind if I would take this over and try to finish this? Forced global sorting is a huge performance bottleneck 😄

GiedriusS · 2023-02-09T16:56:43Z

Trying to fix this since it's so close.

Ensure labels are ordered in each time series. Signed-off-by: Giedrius Statkevičius <[email protected]>

Signed-off-by: Giedrius Statkevičius <[email protected]>

GiedriusS · 2023-02-13T16:53:01Z

Fixed the problem with tests: the root cause is that our test data was invalid - labels were not ordered hence grouping key generation in the PromQL engine was not working correctly. Also, I've merged the newest main into this branch. Ready for merging? @fpetkovski 🤔 😄 🎉 everything should be green now.

Signed-off-by: Giedrius Statkevičius <[email protected]>

fpetkovski

🚀

bwplotka · 2023-02-14T15:06:03Z

Thank you so much for putting this over the line 💪🏽

…s-io#5988) * querier: Avoid global sort for dedup when possible. * Proposal * Removed deprecated fields from internal proxy series usage. Signed-off-by: bwplotka <[email protected]> * Optimized storeWithoutLabels. ➜ store git:(pre-sort-auto) ✗ benchstat v1.txt v2.txt name old time/op new time/op delta SortWithoutLabels-12 4.02ms ± 2% 1.06ms ± 5% -73.54% (p=0.016 n=5+4) name old alloc/op new alloc/op delta SortWithoutLabels-12 1.04MB ± 0% 0.00MB ±13% -99.99% (p=0.029 n=4+4) name old allocs/op new allocs/op delta SortWithoutLabels-12 30.0k ± 0% 0.0k ± 0% -99.99% (p=0.000 n=5+4) Signed-off-by: bwplotka <[email protected]> * Added back dedup with simple hack for tmp use. Signed-off-by: bwplotka <[email protected]> * Heap fix. Signed-off-by: bwplotka <[email protected]> * Dedup is now working on all dimensions. Signed-off-by: bwplotka <[email protected]> * Fixed tests. Signed-off-by: bwplotka <[email protected]> * Fixed tests. Signed-off-by: bwplotka <[email protected]> * Apply suggestions from code review Co-authored-by: Filip Petkovski <[email protected]> Signed-off-by: Bartlomiej Plotka <[email protected]> * test/e2e: fix test Ensure labels are ordered in each time series. Signed-off-by: Giedrius Statkevičius <[email protected]> --------- Signed-off-by: bwplotka <[email protected]> Signed-off-by: Bartlomiej Plotka <[email protected]> Signed-off-by: Giedrius Statkevičius <[email protected]> Co-authored-by: Filip Petkovski <[email protected]> Co-authored-by: Giedrius Statkevičius <[email protected]>

pull-request-size bot added the size/XXL label Dec 20, 2022

bwplotka force-pushed the pre-sort-auto branch 2 times, most recently from d47710a to df2f146 Compare December 20, 2022 16:50

bwplotka marked this pull request as ready for review December 20, 2022 16:54

bwplotka requested review from fpetkovski and GiedriusS December 20, 2022 16:54

bwplotka force-pushed the pre-sort-auto branch 2 times, most recently from 9c668d5 to ae40cda Compare December 20, 2022 17:13

bwplotka force-pushed the pre-sort-auto branch 2 times, most recently from 27f5d7e to 29cb985 Compare December 22, 2022 15:10

bwplotka force-pushed the pre-sort-auto branch 2 times, most recently from 71622ae to 3df92ac Compare December 23, 2022 13:24

fpetkovski reviewed Dec 25, 2022

View reviewed changes

fpetkovski previously approved these changes Jan 23, 2023

View reviewed changes

bwplotka and others added 8 commits January 26, 2023 18:21

querier: Avoid global sort for dedup when possible.

825cba5

* Proposal * Removed deprecated fields from internal proxy series usage. Signed-off-by: bwplotka <[email protected]>

Added back dedup with simple hack for tmp use.

9428625

Signed-off-by: bwplotka <[email protected]>

Heap fix.

7a97304

Signed-off-by: bwplotka <[email protected]>

Dedup is now working on all dimensions.

8814cac

Signed-off-by: bwplotka <[email protected]>

Fixed tests.

e02b136

Signed-off-by: bwplotka <[email protected]>

Fixed tests.

7ef3d94

Signed-off-by: bwplotka <[email protected]>

Apply suggestions from code review

e6bdf1c

Co-authored-by: Filip Petkovski <[email protected]> Signed-off-by: Bartlomiej Plotka <[email protected]>

bwplotka dismissed fpetkovski’s stale review via e6bdf1c January 26, 2023 18:29

bwplotka force-pushed the pre-sort-auto branch from 060d880 to e6bdf1c Compare January 26, 2023 18:29

GiedriusS added 2 commits February 13, 2023 18:46

test/e2e: fix test

9752d72

Ensure labels are ordered in each time series. Signed-off-by: Giedrius Statkevičius <[email protected]>

Merge remote-tracking branch 'origin/main' into pre-sort-auto

63df764

Signed-off-by: Giedrius Statkevičius <[email protected]>

GiedriusS previously approved these changes Feb 13, 2023

View reviewed changes

Merge remote-tracking branch 'origin/main' into pre-sort-auto

6f8b199

Signed-off-by: Giedrius Statkevičius <[email protected]>

GiedriusS dismissed their stale review via 6f8b199 February 14, 2023 12:51

fpetkovski approved these changes Feb 14, 2023

View reviewed changes

GiedriusS merged commit 6e28411 into main Feb 14, 2023

GiedriusS deleted the pre-sort-auto branch February 14, 2023 13:18

fpetkovski mentioned this pull request Apr 3, 2023

Deduplication returning deduped and non-deduped results in 0.31.0+ #6257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

querier: Avoid global sort for dedup when possible + proposal. #5988

querier: Avoid global sort for dedup when possible + proposal. #5988

bwplotka commented Dec 20, 2022 •

edited

Loading

bwplotka commented Dec 20, 2022

GiedriusS commented Dec 21, 2022

bwplotka commented Dec 22, 2022

fpetkovski Dec 25, 2022

bwplotka Jan 2, 2023

fpetkovski Dec 25, 2022

bwplotka Jan 2, 2023

fpetkovski left a comment

bwplotka commented Jan 24, 2023

bwplotka commented Jan 26, 2023

fpetkovski commented Jan 27, 2023

bwplotka commented Jan 27, 2023 •

edited

Loading

GiedriusS commented Jan 31, 2023

GiedriusS commented Feb 9, 2023

GiedriusS commented Feb 13, 2023

fpetkovski left a comment

bwplotka commented Feb 14, 2023

querier: Avoid global sort for dedup when possible + proposal. #5988

querier: Avoid global sort for dedup when possible + proposal. #5988

Conversation

bwplotka commented Dec 20, 2022 • edited Loading

bwplotka commented Dec 20, 2022

GiedriusS commented Dec 21, 2022

bwplotka commented Dec 22, 2022

fpetkovski Dec 25, 2022

Choose a reason for hiding this comment

bwplotka Jan 2, 2023

Choose a reason for hiding this comment

fpetkovski Dec 25, 2022

Choose a reason for hiding this comment

bwplotka Jan 2, 2023

Choose a reason for hiding this comment

fpetkovski left a comment

Choose a reason for hiding this comment

bwplotka commented Jan 24, 2023

bwplotka commented Jan 26, 2023

fpetkovski commented Jan 27, 2023

bwplotka commented Jan 27, 2023 • edited Loading

GiedriusS commented Jan 31, 2023

GiedriusS commented Feb 9, 2023

GiedriusS commented Feb 13, 2023

fpetkovski left a comment

Choose a reason for hiding this comment

bwplotka commented Feb 14, 2023

bwplotka commented Dec 20, 2022 •

edited

Loading

bwplotka commented Jan 27, 2023 •

edited

Loading