store: use k-way merging for proxying logic #5296
Conversation
Force-pushed from d697c8d to 360342f (Compare)
Use k-way merging for merging multiple responses coming from multiple StoreAPIs. This avoids having a bottleneck of a Go channel that can only hold 10 responses at once. So, now we will Recv() messages as fast as possible. Users should see much quicker queries because the network will be used as much as possible.

Benchmarks:

```
name                                                                                       old time/op    new time/op    delta
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-16        1.01s ± 7%     0.24s ± 2%   -76.62%  (p=0.008 n=5+5)
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-16       926ms ± 3%     454ms ± 2%   -50.99%  (p=0.008 n=5+5)
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-16      95.6ms ± 1%    23.4ms ± 3%   -75.52%  (p=0.016 n=4+5)
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-16    93.1ms ± 2%    45.1ms ± 1%   -51.63%  (p=0.008 n=5+5)
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-16     39.5µs ± 1%    25.2µs ± 1%   -36.22%  (p=0.008 n=5+5)
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-16    22.6µs ± 0%    15.2µs ± 0%   -32.71%  (p=0.008 n=5+5)

name                                                                                       old alloc/op   new alloc/op   delta
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-16        186MB ± 0%     108MB ± 0%   -41.87%  (p=0.008 n=5+5)
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-16       186MB ± 0%     107MB ± 0%   -42.18%  (p=0.008 n=5+5)
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-16      18.5MB ± 0%    10.0MB ± 0%   -45.79%  (p=0.008 n=5+5)
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-16    18.5MB ± 0%    10.6MB ± 0%   -42.69%  (p=0.008 n=5+5)
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-16     10.9kB ± 0%     6.1kB ± 0%   -44.06%  (p=0.008 n=5+5)
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-16    5.09kB ± 0%    3.06kB ± 0%   -39.83%  (p=0.008 n=5+5)

name                                                                                       old allocs/op  new allocs/op  delta
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-16        5.00M ± 0%     1.00M ± 0%   -80.00%  (p=0.008 n=5+5)
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-16       5.00M ± 0%     1.00M ± 0%   -80.00%  (p=0.008 n=5+5)
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-16        500k ± 0%      100k ± 0%   -79.96%  (p=0.008 n=5+5)
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-16      500k ± 0%      100k ± 0%   -79.98%  (p=0.016 n=5+4)
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-16        180 ± 0%       125 ± 0%   -30.56%  (p=0.008 n=5+5)
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-16       109 ± 0%        80 ± 0%   -26.61%  (p=0.008 n=5+5)
```

`TestProxyStore_SeriesSlowStores` also passes, which tests how everything works with slow stores.

Rewrote the chunk deduplication logic to use hashing.

In the future we could get some more marginal gains by using a selection/tournament tree.

Signed-off-by: Giedrius Statkevičius <[email protected]>
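For readers new to the technique, here is a minimal, self-contained sketch of heap-based k-way merging over already-buffered per-store responses. The `respSet`/`respHeap` types and the string-keyed ordering are simplifications for illustration only, not the actual proxy implementation:

```go
package main

import (
	"container/heap"
	"fmt"
)

// respSet is a simplified stand-in for one StoreAPI's responses that have
// already been received (buffered) from Recv(), sorted by series labels.
type respSet struct {
	series []string
	idx    int
}

func (r *respSet) At() string  { return r.series[r.idx] }
func (r *respSet) Empty() bool { return r.idx >= len(r.series) }

// respHeap orders the sets by their current head, so the smallest series
// across all stores is always at the top.
type respHeap []*respSet

func (h respHeap) Len() int            { return len(h) }
func (h respHeap) Less(i, j int) bool  { return h[i].At() < h[j].At() }
func (h respHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *respHeap) Push(x interface{}) { *h = append(*h, x.(*respSet)) }
func (h *respHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// kWayMerge repeatedly takes the globally smallest element, advances that
// set, and re-fixes the heap, producing one sorted stream out of k streams.
func kWayMerge(sets []*respSet) []string {
	h := &respHeap{}
	for _, s := range sets {
		if !s.Empty() {
			heap.Push(h, s)
		}
	}
	var out []string
	for h.Len() > 0 {
		top := (*h)[0]
		out = append(out, top.At())
		top.idx++
		if top.Empty() {
			heap.Pop(h)
		} else {
			heap.Fix(h, 0)
		}
	}
	return out
}

func main() {
	a := &respSet{series: []string{"a", "c", "e"}}
	b := &respSet{series: []string{"b", "c", "d"}}
	fmt.Println(kWayMerge([]*respSet{a, b})) // [a b c c d e]
}
```

The same idea carries over to ordering by full series label sets, and the heap could later be swapped for the selection/tournament tree mentioned above for further marginal gains.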
Force-pushed from 360342f to 0fb4a18 (Compare)
This is pretty sweet 💪🏽 Sounds like this might fix #4780 - will take a look this week or next (:
Tried this pull request in prod with a query that returns ~21k series - it reduced the query duration by about 30% 🤯
MD5 is fine here because we don't care as much about collisions. Signed-off-by: Giedrius Statkevičius <[email protected]>
I tested this in our infra and can confirm improved performance for queries across lots of series. Some queries went down from 40s to 30s. Dashboards also seem to be loading more smoothly. Looking forward to seeing this merged 🚀
Thanks @GiedriusS for leading this work, amazing stuff!
I think we could switch to this, but there are two points to remember:
- I think the efficiency boost (e.g. latency) is caused not by a different merge algorithm, but by buffering all messages ASAP (which we agreed makes sense).
We could get an effectively infinite buffer just by changing this channel length to something large (e.g. a million), which is obviously bad, because we would always allocate that much (see the sketch after this list).
My point is: in this PR, alongside the buffer-all-messages change (which is amazing), we ALSO changed to k-merge, which might actually be slower/worse than the previous MergeSort. We don't know, since we changed two things at the same time here.
- We still have a bottleneck in waiting for ALL messages before we start streaming merged responses. I think we can do the k-merge in a more asynchronous way. My worry is that if we want to implement that, we will have to refactor this code a 3rd time 🙃
Maybe those 2 things are fine to agree to. I would be keen on merging this as it's simple and has immediate value, so approving. I added some readability suggestions (nits).
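To make the distinction in the first point concrete, here is a minimal sketch contrasting the two receiving strategies. `recvBounded`, `recvEager`, and `recvFn` are illustrative names standing in for a gRPC stream's Recv(), not the actual proxy code:

```go
package main

import "fmt"

// recvBounded: the receiving goroutine blocks once the channel holds 10
// unread responses, so a slow consumer also slows down network reads.
func recvBounded(recvFn func() (string, bool)) <-chan string {
	ch := make(chan string, 10)
	go func() {
		defer close(ch)
		for {
			msg, ok := recvFn()
			if !ok {
				return
			}
			ch <- msg // blocks when the buffer is full
		}
	}()
	return ch
}

// recvEager: every response is appended as soon as it arrives, at the cost
// of holding all of them in memory before merging starts.
func recvEager(recvFn func() (string, bool)) []string {
	var buf []string
	for {
		msg, ok := recvFn()
		if !ok {
			return buf
		}
		buf = append(buf, msg)
	}
}

func main() {
	msgs := []string{"a", "b", "c"}
	i := 0
	next := func() (string, bool) {
		if i == len(msgs) {
			return "", false
		}
		i++
		return msgs[i-1], true
	}
	fmt.Println(recvEager(next)) // [a b c]
}
```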
Well, I think we also have to figure this out: #5296 (comment)
This feels like a future problem if we ignore it 🤔
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Nice! Amazing. Thanks for making lazy response buffering 👍🏽
Wonder what the benchmarks are now. Happy to merge!
Please don't merge this yet - even though the code works and the benchmarks look even better (will post tomorrow), I think there's some lock contention going on. So, let me come back with some final tweaks and a bigger comment.
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Mhm, it seems like it is impossible to implement this properly because gRPC's RecvMsg() doesn't take any context parameter; it uses the stream's context. Thus, to implement a per-request timeout, one needs to spawn a separate goroutine. This implies that one will need a channel in the middle to send responses back to the original goroutine, or these two things will have to be synchronized some other way. Because our goal is to RecvMsg() as fast as possible, we must avoid having a channel in the middle. We could do a simple recv -> append loop in the main goroutine, but if RecvMsg() blocks indefinitely then it would block everything after it, so two separate receiver/sender goroutines are necessary. grpc/grpc-go#1229 (comment) - it's a real pity that this is the accepted solution, to spawn a new goroutine for each RecvMsg() operation. I'll try to think about this more. I have filed grpc/grpc-go#5401 in the meantime.
Any progress?
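The accepted workaround being referenced looks roughly like this. It is a sketch under the assumption of a generic blocking receive function; `recvWithTimeout` and `frameTimeout` are illustrative names, not gRPC or Thanos APIs:

```go
package main

import (
	"fmt"
	"time"
)

// recvWithTimeout wraps a blocking receive (e.g. a gRPC stream's Recv) so the
// caller can enforce a per-message deadline. Because RecvMsg only honours the
// stream's context, a helper goroutine must do the blocking call and hand the
// result back over a channel.
func recvWithTimeout[T any](recv func() (T, error), frameTimeout time.Duration) (T, error) {
	type result struct {
		resp T
		err  error
	}
	done := make(chan result, 1) // buffered so a late send never blocks the helper forever
	go func() {
		r, err := recv()
		done <- result{resp: r, err: err}
	}()

	var zero T
	timer := time.NewTimer(frameTimeout)
	defer timer.Stop()
	select {
	case r := <-done:
		return r.resp, r.err
	case <-timer.C:
		// The helper goroutine stays blocked until the stream's own context
		// is cancelled - exactly the downside discussed in this comment.
		return zero, fmt.Errorf("no message received within %s", frameTimeout)
	}
}

func main() {
	slowRecv := func() (string, error) {
		time.Sleep(2 * time.Second)
		return "response", nil
	}
	_, err := recvWithTimeout(slowRecv, 100*time.Millisecond)
	fmt.Println(err) // no message received within 100ms
}
```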
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Updated PR:
Eager strategy benchmarks:
Lazy strategy benchmarks:
Signed-off-by: Giedrius Statkevičius <[email protected]>
I am happy to see this merged in the current state. The issue I noticed might be specific to our environment, so I can troubleshoot it myself if it happens again.
Ha, the e2e test failure seems to be exactly what I saw :)
Yeah 👍 trying to fix it and then we can merge this
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
I think I have fixed all of the bugs. PTAL @bwplotka @fpetkovski @yeya24 🙏
Signed-off-by: Giedrius Statkevičius <[email protected]>
```go
if field == nil {
	continue
}
h := xxhash.Sum64(field.Data)
```
As a follow-up improvement, would it make sense to do this hashing in the stores so that we offload some work from the querier proxy? We might also think about other operations that can be pre-computed at the store level.
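For context on what the hashing above amounts to, here is a minimal sketch of hash-based chunk deduplication using github.com/cespare/xxhash/v2, as in the diff snippet. The `chunk` type and `dedupChunks` helper are illustrative assumptions, not Thanos code:

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"
)

// chunk is a simplified stand-in for a series chunk carrying raw bytes.
type chunk struct {
	Data []byte
}

// dedupChunks drops chunks whose payload hashes to an already-seen value.
// Hash collisions are theoretically possible but acceptable here, as
// discussed earlier in the thread.
func dedupChunks(chunks []chunk) []chunk {
	seen := make(map[uint64]struct{}, len(chunks))
	out := chunks[:0] // filter in place, reusing the backing array
	for _, c := range chunks {
		h := xxhash.Sum64(c.Data)
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	cs := []chunk{{Data: []byte("a")}, {Data: []byte("a")}, {Data: []byte("b")}}
	fmt.Println(len(dedupChunks(cs))) // 2
}
```

Moving this hashing to the stores, as suggested, would let each StoreAPI attach the hash to the chunk it sends, so the proxy only compares precomputed values.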
I love it! @fpetkovski also had amazing improvements already with eager option 💪🏽
Thanks for addressing all. Let's go! 👍🏽
It's normal and not an error if a query does not match due to no downstream stores. This is common when querying with external labels and tiered query servers. This bug was introduced in thanos-io#5296 Fixes: thanos-io#5862 Signed-off-by: SuperQ <[email protected]>