query: provide `up` for store discovery #880

sepich · 2019-03-02T18:12:02Z

Thanos, Prometheus and Golang version used
v0.3.2-rc.0

What happened
I'm using the --store.sd-files parameter of thanos-query for thanos-store discovery.
When one of the stores dies, AFAIK the only way to notice right now is via the grpc_client_handled_total metric.
Otherwise I have to write a separate alert like absent(up{monitor="store-external-label"}) == 1 for each store.
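
To make the maintenance cost of the per-store approach concrete, here is a minimal Python sketch that builds one absent() expression per store. The function name and the label values in the list are hypothetical and used only for illustration; the expression shape is the one quoted above.

```python
from typing import Iterable, List


def per_store_absent_exprs(monitor_labels: Iterable[str]) -> List[str]:
    """Build one absent() alert expression per store external label."""
    return [f'absent(up{{monitor="{label}"}}) == 1' for label in monitor_labels]


# Hypothetical external label values. Every store added to or removed from
# discovery means editing this list and regenerating the alerts.
stores = ["store-eu1", "store-us1"]
for expr in per_store_absent_exprs(stores):
    print(expr)
```

The number of alert expressions grows with the number of stores, and the list has to be kept in sync with the discovery file by hand.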

What you expected to happen
It would be great to have a dynamic up metric, the same as in Prometheus, for each store in discovery.
Then a single ordinary alert such as up != 1 would be enough.

How to reproduce it (as minimally and precisely as possible):
Add --store.sd-files=sd.yaml with a fake list of stores.
Try to tell from the output of thanos-query:10902/metrics that they are not available.

Full logs to relevant components
Only events like this appear in the logs:

level=warn ts=2019-03-02T17:48:11.883731563Z caller=storeset.go:308 component=storeset msg="update of store node failed" err="initial store client info fetch: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=fake.store.local:30901

Anything else we need to know

bwplotka · 2019-03-04T10:21:33Z

This is a really interesting idea!

In effect, what we do today is define static upness, as you mentioned, in a form like this in my demo: https://github.com/improbable-eng/thanos/blob/26bdd81dec2d2c2b844e5eeff18102572f0af04f/tutorials/kubernetes-demo/manifests/thanos-ruler.yaml#L6

The only worry is making sure that we don't clash with the Prometheus up metric. I think we could add this metric but name it a bit differently, like thanos_up. Thoughts?

sepich · 2019-03-04T20:09:53Z

in a form like this in my demo

Correct, and then you need to keep your list of stores in sync with your alerts (cluster="eu1", cluster="us1", etc.). The proposal is to have a volatile list of stores and one static alert that needs no changes when stores are added to or removed from the list.

name it a bit differently, like thanos_up

Sounds reasonable
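
As a rough illustration of the idea discussed here, the Python sketch below turns a list of discovered store addresses into one gauge sample per store and renders it in the Prometheus text exposition format. It is not Thanos code: the metric name thanos_up is only the name floated in this thread, the address label and both function names are assumptions, and the health check is passed in as a plain callable rather than a real gRPC call.

```python
from typing import Callable, Dict, Iterable


def store_up_samples(
    addresses: Iterable[str], is_healthy: Callable[[str], bool]
) -> Dict[str, int]:
    """Return 1 for each store that passed the health check, 0 otherwise."""
    return {addr: 1 if is_healthy(addr) else 0 for addr in addresses}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(metric: str, samples: Dict[str, int]) -> str:
    """Render one gauge sample per store address in text exposition format."""
    lines = [
        f"# HELP {metric} Whether the store answered the last health check.",
        f"# TYPE {metric} gauge",
    ]
    for addr in sorted(samples):
        lines.append(f'{metric}{{address="{escape_label_value(addr)}"}} {samples[addr]}')
    return "\n".join(lines) + "\n"


# Hypothetical discovery result: one reachable store, one fake one.
discovered = ["good.store.local:30901", "fake.store.local:30901"]
samples = store_up_samples(discovered, lambda addr: addr.startswith("good."))
print(render_exposition("thanos_up", samples), end="")
```

With samples shaped like this, one static rule such as thanos_up != 1 would cover every store in the list, so the alert would not need to change when the discovery file does.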

daixiang0 · 2019-12-23T06:00:10Z

I think #1260 has fixed this. @bwplotka

stale · 2020-01-25T03:13:03Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bwplotka added the feature request/improvement, good first issue, difficulty: medium, and help wanted labels Mar 4, 2019

mreichardt95 mentioned this issue Mar 8, 2019

Query: add thanos_store_up metric to StoreSet #900

Closed

stale bot added the stale label Jan 25, 2020

stale bot closed this as completed Feb 1, 2020
