query: add store.read-timeout parameter to avoid partial response failure when one of stores timed out #895
Conversation
Hi, Thanos team!
Generally I love the idea, but I think we need this timeout in a different place. (:
Plus some minor suggestions on the mocking. Let me know what you think.
Currently we still have one strange place: mergeSet. But I think that's an idea for the next pull request.
@povilasv continued on your PR, @d-ulyanov, here: #920. I hope you are ok with this (:
…lure when one of stores timed out
Co-Authored-By: d-ulyanov <[email protected]>
@d-ulyanov this is solved by #928
As another version of this has been merged, I think it's safe to close this PR. Please shout at me if it needs to be reopened or I made a mistake somewhere. Thank you a lot for your contribution! :)
@GiedriusS it's a different PR, so we still need this. I.e. this timeout is essentially a query timeout, and it cancels the context for the whole query. The current timeout doesn't work correctly because it sets the timeout on the PromQL engine.
Hi, @povilasv
@d-ulyanov sure thing, could you resolve the merge conflicts? I've lost the current state on this one; is this ready for review? Once it is, I can reread it and do another round of review :)
# Conflicts:
#	cmd/thanos/query.go
#	docs/components/query.md
#	pkg/store/proxy.go
#	pkg/store/proxy_test.go
Hey @povilasv
@povilasv It's cool that this PR will appear in v0.4.0.
I have time to finish this PR, but feel free to make any changes if you think something is wrong :) Edits from maintainers are allowed.
Also, please review my comments about testing.
cmd/thanos/query.go
@@ -93,6 +93,9 @@ func registerQuery(m map[string]setupFunc, app *kingpin.Application, name string

	storeResponseTimeout := modelDuration(cmd.Flag("store.response-timeout", "If a Store doesn't send any data in this specified duration then a Store will be ignored and partial data will be returned if it's enabled. 0 disables timeout.").Default("0ms"))

	storeReadTimeout := modelDuration(cmd.Flag("store.read-timeout", "Maximum time to read response from store. If request to one of stores is timed out and store.read-timeout < query.timeout partial response will be returned. If store.read-timeout >= query.timeout one of stores is timed out the client will get no data and timeout error.").
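Both flags above go through a `modelDuration` helper that isn't shown in this hunk. Below is a minimal sketch of how such a helper can work, assuming `*model.Duration` satisfies kingpin's `Value` interface; the helper name comes from the diff, the surrounding `main` is purely illustrative.

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/common/model"
	"gopkg.in/alecthomas/kingpin.v2"
)

// modelDuration registers a kingpin flag whose value is parsed into a
// Prometheus model.Duration (accepting values such as "0ms" or "2m").
func modelDuration(flag *kingpin.FlagClause) *model.Duration {
	value := new(model.Duration)
	flag.SetValue(value) // *model.Duration implements kingpin's Value interface.
	return value
}

func main() {
	app := kingpin.New("sketch", "duration flag example")
	storeResponseTimeout := modelDuration(
		app.Flag("store.response-timeout", "Per-store response timeout.").Default("0ms"))

	kingpin.MustParse(app.Parse([]string{"--store.response-timeout=30s"}))
	fmt.Println(time.Duration(*storeResponseTimeout)) // prints "30s"
}
```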
Agree partially.
In the code we have a `query` layer which also does some logic (deduplication), so we can't really say that `storeReadTimeout` guarantees the request will be killed in time; we are just limiting the store request time. That's why I think it's not honest to use `promql-engine-timeout` here, or we also need to add a ctx to the `query` layer.
@d-ulyanov FYI I've changed the flag names in your branch
@povilasv cool, many thanks!
Cool, thanks for the good work, but I think we can still improve it. Sorry for the review delay!
Some suggestions below. Please be careful on the compatibility side. We are not 1.0 API stable, BUT we don't want to totally confuse our users either.
Also, can I ask what's the point of this logic, what's the use case? E.g. WHEN would you set the (old) `query` timeout different to the `proxy/select/store.timeout` underneath PromQL?
I cannot find any immediate idea. Do you want to time out on select, but allow more time for PromQL?
@@ -44,7 +45,11 @@ type ProxyStore struct {
	component      component.StoreAPI
	selectorLabels labels.Labels

	// responseTimeout is a timeout for any GRPC operation during series query
missing trailing period in comment, same below
@@ -119,6 +126,8 @@ func newRespCh(ctx context.Context, buffer int) (*ctxRespSender, <-chan *storepb
	return &ctxRespSender{ctx: ctx, ch: respCh}, respCh, func() { close(respCh) }
}

// send writes response to sender channel
Please use full sentence comments. They are for humans (:
	return
}

// setError sets error (thread-safe)
full sentence comments
@@ -147,6 +156,9 @@ func (s *ProxyStore) Series(r *storepb.SeriesRequest, srv storepb.Store_SeriesSe
		respSender, respRecv, closeFn = newRespCh(gctx, 10)
	)

	storeCtx, cancel := s.contextWithTimeout(gctx)
Ok, what's the difference between this timeout and the overall gRPC request timeout or the gRPC server timeout? I believe Querier can control the same thing by just specifying the timeout here: https://github.com/improbable-eng/thanos/blob/1cd9ddd14999d6b074f34a4328e03f7ac3b7c26a/pkg/query/querier.go#L183
I would remove this from `proxy.go` completely and set this timeout on the querier client side. What do you think?
The effect is the same, if not better, as you missed passing this context to `gctx = errgroup.WithContext(srv.Context())`.
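A minimal sketch of the querier-side alternative suggested here, assuming a `storeReadTimeout` value plumbed in from the flag; the function and its signature are illustrative, not the PR's actual code. The deadline is attached to the context before calling the StoreAPI, so `proxy.go` needs no timeout logic of its own.

```go
package query

import (
	"context"
	"time"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// seriesWithTimeout issues a Series request whose context is bounded by the
// configured per-store read timeout. A zero timeout disables the deadline.
func seriesWithTimeout(
	ctx context.Context,
	client storepb.StoreClient,
	req *storepb.SeriesRequest,
	storeReadTimeout time.Duration,
) (storepb.Store_SeriesClient, context.CancelFunc, error) {
	if storeReadTimeout <= 0 {
		sc, err := client.Series(ctx, req)
		return sc, func() {}, err
	}

	ctx, cancel := context.WithTimeout(ctx, storeReadTimeout)
	sc, err := client.Series(ctx, req)
	if err != nil {
		cancel()
		return nil, nil, err
	}
	// The caller must call cancel once the stream has been fully consumed.
	return sc, cancel, nil
}
```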
> Also, can I ask what's the point of this logic, what's the use case? E.g. WHEN would you set (old) query timeout different to proxy/select/store.timeout underneath PromQL? Cannot find any immediate idea. Do you want to timeout on select, but allow more time for promQL?

Imagine that we have two stores: the first is very slow, the second is fast, and partial response is enabled.
The behaviour that we want to see: if the slow store times out, we still receive data from the second one.
So, to make this possible, store.read-timeout should be less than the PromQL timeout.
You're right, we can move it to querier.go and the effect will be the same at the moment (and it was my first implementation :) ).
I think it doesn't matter where we have this timeout; more important is that at the moment it doesn't work perfectly because of how we read from stores in mergedSet: reading from stores happens serially there, and the first, slow store still blocks reading from the second. We need some improvements there...
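A self-contained sketch of the behaviour described above; the store names, latencies, and the timeout value are made up for illustration. Each store gets its own deadline, so the slow store times out on its own and becomes a warning, while the fast store's data still comes back as a partial response.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fetch simulates a store that needs `latency` to answer, honouring ctx.
func fetch(ctx context.Context, name string, latency time.Duration) (string, error) {
	select {
	case <-time.After(latency):
		return name + " data", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	storeReadTimeout := 100 * time.Millisecond
	stores := map[string]time.Duration{
		"fast": 10 * time.Millisecond,
		"slow": 1 * time.Second,
	}

	var (
		mu       sync.Mutex
		results  []string
		warnings []string
		wg       sync.WaitGroup
	)
	for name, latency := range stores {
		wg.Add(1)
		go func(name string, latency time.Duration) {
			defer wg.Done()
			// Per-store deadline, shorter than the overall query timeout.
			ctx, cancel := context.WithTimeout(context.Background(), storeReadTimeout)
			defer cancel()

			res, err := fetch(ctx, name, latency)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				warnings = append(warnings, name+": "+err.Error())
				return
			}
			results = append(results, res)
		}(name, latency)
	}
	wg.Wait()
	fmt.Println(results, warnings) // [fast data] [slow: context deadline exceeded]
}
```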
	responseTimeout time.Duration

	// queryTimeout is a timeout for entire request
	queryTimeout time.Duration
let's kill it IMO
	promqlTimeout := modelDuration(cmd.Flag("promql.timeout", "Maximum time to execute PromQL in query node.").
		Default("2m"))

	queryTimeout := modelDuration(cmd.Flag("query.timeout", "Maximum time to process request by query node. If a request to one of the stores has timed out and query.timeout < promql.timeout then a partial response will be returned. If query.timeout >= promql.timeout then only timeout error will be returned.").
This is quite confusing, I am having a hard time understanding what things we time out on here.
@@ -61,9 +61,6 @@ func registerQuery(m map[string]setupFunc, app *kingpin.Application, name string
	webExternalPrefix := cmd.Flag("web.external-prefix", "Static prefix for all HTML links and redirect URLs in the UI query web interface. Actual endpoints are still served on / or the web.route-prefix. This allows thanos UI to be served behind a reverse proxy that strips a URL sub-path.").Default("").String()
	webPrefixHeaderName := cmd.Flag("web.prefix-header", "Name of HTTP request header used for dynamic prefixing of UI links and redirects. This option is ignored if web.external-prefix argument is set. Security risk: enable this option only if a reverse proxy in front of thanos is resetting the header. The --web.prefix-header=X-Forwarded-Prefix option can be useful, for example, if Thanos UI is served via Traefik reverse proxy with PathPrefixStrip option enabled, which sends the stripped prefix value in X-Forwarded-Prefix header. This allows thanos UI to be served on a sub-path.").Default("").String()

	queryTimeout := modelDuration(cmd.Flag("query.timeout", "Maximum time to process query by query node.").
I would leave this as it was, reasons:

- `query` sounds like being related to the Query API, that's why this name, and it always included the `promql` one. That's why I would leave it like it was; the fact that query also queries something else underneath is hidden. Also, it matches the Prometheus `query` flags. And yet another thing: compatibility. Even though we are not `1.0`, this API change is a major hit to compatibility, as suddenly query timeout means something opposite to what it was ):
- `promql` sounds like only PromQL, however we know it involves the `proxy StoreAPI Series()` invocation. Moreover, not only one, but sometimes more than one! PromQL can run multiple `Select`s per single Query.

I really like this idea, but IMO it should be:

- `query.timeout` (as it was)
- `store.timeout` for the proxy.go client timeout.

Alternatively, if we want to be more explicit, maybe `query.select-timeout`? That would then also include the deduplication process.
agreed
After an offline discussion with @povilasv: I missed the fact that @povilasv meant to fix #919, but this PR is specified as addressing #835 ... So let's organize this a bit better:
Sorry for the confusion. @povilasv @d-ulyanov does it make sense to you?
I'll try to fix the context propagation in PromQL eval and let's see where it goes. Let's keep this open for now. Also, this PR has some cool stuff like …
agree
Thanks for your review. I agree that our current solution is not perfect. Store A - slow, store B - fast:
I propose the following:
Yes, a very related explanation of the confusing slow/fast store behaviour can be found here: #900 (comment)
Can we talk about this more? It's probably the most critical part of Thanos and might need some design first. So essentially, taking the ready responses first instead of the arbitrary Next of the SeriesSets that are merged inside iterators, right? I think we might need some work on the interface side, as SeriesSet …
I disagree, we still need to have …
@d-ulyanov if you think … But …
Currently we are reading from a buffered channel, which is asynchronous, so I'm not sure what you mean here. Re merging: please file an issue and write down what you want to do. IMO this is not the best place to discuss this. Thanks :)
@d-ulyanov @bwplotka So today I spent like 2 hours trying to figure out where we lose the context and I couldn't find it. But I think #1035 potentially simplifies the code in the context-passing case. As …
But in the client it does: …
So the code becomes a bit cleaner and we can see how the context propagates :)
Hi @povilasv
Initial issue: … Expected behaviour: we're receiving data from store B. At the moment it seems that we have a bottleneck here:
https://github.com/improbable-eng/thanos/blob/master/pkg/store/proxy.go#L214 On init it calls Next() on each store in … Further, in the cycle … My proposal:
We will allocate additional memory for this buffering, but I think it's not a big problem, is it? PS: We already have some buffer in seriesServer.
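A rough sketch of the buffering idea proposed here; the function name and buffer size are illustrative, not the PR's code. Each store's gRPC stream is drained in its own goroutine into a buffered channel, so a slow store no longer blocks reading from a fast one while the results are merged.

```go
package store

import (
	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// bufferStream drains a Series stream in the background and exposes it as a
// buffered channel. The channel is closed on EOF or on any receive error.
func bufferStream(cl storepb.Store_SeriesClient, buf int) <-chan *storepb.SeriesResponse {
	out := make(chan *storepb.SeriesResponse, buf)
	go func() {
		defer close(out)
		for {
			r, err := cl.Recv()
			if err != nil {
				// io.EOF or a real error; either way this store is done.
				return
			}
			out <- r
		}
	}()
	return out
}
```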
@d-ulyanov please create a separate issue for this. And let's stop bloating this conversation :D
@d-ulyanov Can we take this offline to avoid confusion? Sounds like we need to discuss this first. Slack or any other chat? Are you on the improbable-eng Slack maybe? (:
We decided offline that this flag is not helping much; we need to fix the SeriesSet iterator first.
This PR fixes partial response failure when one of the stores times out (if partial response is enabled).

Issue: #835

Changes

Added a `store.read-timeout` argument to Thanos Query. By default, `store.read-timeout = 2m` (the same as `query.timeout`). If `store.read-timeout > query.timeout`, it will be overridden by the `query.timeout` value.

This `store.read-timeout` is propagated to the gRPC StoreClient via context using `seriesServer`, see the `querier.go` diff. The timeout is finally handled inside the gRPC client.
Also, mock structures were added to the `store` package so they can be reused in unit tests in different packages.

Verification

Changes are covered by unit tests.