query: provide up for store discovery #880

Closed
sepich opened this issue Mar 2, 2019 · 4 comments

Comments

@sepich
Contributor

sepich commented Mar 2, 2019

Thanos, Prometheus and Golang version used
v0.3.2-rc.0

What happened
I'm using the --store.sd-files flag of thanos-query for thanos-store discovery.
When one of the stores dies, AFAIK the only way to notice it right now is via the grpc_client_handled_total metric.
Otherwise I have to write a separate alert like absent(up{monitor="store-external-label"}) == 1 for each store.

What you expected to happen
It would be great to have a dynamic up metric, the same as in Prometheus, for each of the stores in discovery.
Then a single, ordinary alert like up != 1 would be enough.

How to reproduce it (as minimally and precisely as possible):
Add --store.sd-files=sd.yaml with a fake list of stores.
Try to tell from the output of thanos-query:10902/metrics that they are unavailable.

Full logs to relevant components
Only events like this one appear in the logs:

level=warn ts=2019-03-02T17:48:11.883731563Z caller=storeset.go:308 component=storeset msg="update of store node failed" err="initial store client info fetch: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=fake.store.local:30901
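
Until a dedicated metric exists, a warning like the one above is the main signal. As a rough sketch (not part of Thanos), the failing store addresses could be pulled out of such logfmt lines with a few lines of Python. The helper names `parse_logfmt` and `failed_store_addresses` are made up for illustration, and the match on the message text assumes the exact wording shown in the log line above.

```python
import shlex


def parse_logfmt(line):
    """Split a logfmt line into a dict; quoted values keep their spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def failed_store_addresses(lines):
    """Hypothetical helper: collect addresses from store update warnings."""
    failed = set()
    for line in lines:
        fields = parse_logfmt(line)
        # Assumption: the message text matches the warning quoted in this issue.
        if fields.get("msg") == "update of store node failed" and "address" in fields:
            failed.add(fields["address"])
    return failed


SAMPLE = (
    'level=warn ts=2019-03-02T17:48:11.883731563Z caller=storeset.go:308 '
    'component=storeset msg="update of store node failed" '
    'err="initial store client info fetch: rpc error: code = DeadlineExceeded '
    'desc = context deadline exceeded" address=fake.store.local:30901'
)

print(failed_store_addresses([SAMPLE]))
```

Scraping logs like this is brittle compared with a real metric, which is the point of the request.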

Anything else we need to know

@bwplotka
Member

bwplotka commented Mar 4, 2019

This is a really, really interesting idea!

So effectively, the way we do it today is to define static upness as you mentioned, in a form like this in my demo: https://github.com/improbable-eng/thanos/blob/26bdd81dec2d2c2b844e5eeff18102572f0af04f/tutorials/kubernetes-demo/manifests/thanos-ruler.yaml#L6

The only worry is making sure we don't clash with the Prometheus up metric. I think we could add this metric but name it a bit differently, like thanos_up. Thoughts?
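
To make the proposal concrete, here is a minimal sketch of what a per-store gauge could look like in the Prometheus text exposition format. It is an illustration only, not how Thanos implements anything: `thanos_up` is just the name floated in this comment, the `address` label and the plain TCP probe are assumptions, and the log line in the issue suggests the querier actually checks stores with an info fetch over gRPC, which a TCP connect does not replicate.

```python
import socket


def tcp_probe(address, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (assumed health check)."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers DNS failures, refused connections and timeouts;
        # ValueError covers a missing or non-numeric port.
        return False


def render_store_up(addresses, probe=tcp_probe, metric="thanos_up"):
    """Render one gauge sample per store in the Prometheus text exposition format."""
    lines = [
        f"# HELP {metric} Whether the store was reachable (1) or not (0).",
        f"# TYPE {metric} gauge",
    ]
    for address in sorted(addresses):
        value = 1 if probe(address) else 0
        lines.append(f'{metric}{{address="{address}"}} {value}')
    return "\n".join(lines) + "\n"


# Example with a stubbed probe so no network access is needed.
print(render_store_up(["fake.store.local:30901"], probe=lambda address: False))
```

Because every discovered store gets its own sample, a store that disappears shows up as a 0 value rather than as a missing series.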

@sepich
Contributor Author

sepich commented Mar 4, 2019

in a form like this in my demo

Correct, and then you need to keep your stores list in sync with the alerts (cluster="eu1", cluster="us1", etc.). The proposal is to have a volatile list of stores and a static alert that needs no changes when the stores in the list change.
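
The maintenance cost described here can be illustrated with a small sketch. The function name `per_store_alerts` is hypothetical, and the expressions simply follow the `absent(up{...}) == 1` and `up != 1` forms quoted in this thread; the final metric name was still under discussion.

```python
def per_store_alerts(clusters):
    """One absent() expression per store; must be regenerated when the list changes."""
    return [f'absent(up{{cluster="{cluster}"}}) == 1' for cluster in clusters]


# With a per-store up metric, a single expression covers any list of stores.
STATIC_ALERT = "up != 1"

for expr in per_store_alerts(["eu1", "us1"]):
    print(expr)
print(STATIC_ALERT)
```

Adding a third cluster changes the output of `per_store_alerts`, while `STATIC_ALERT` stays the same.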

name it bit differently like thanos_up

Sounds reasonable

@daixiang0
Member

daixiang0 commented Dec 23, 2019

I think #1260 has fixed this. @bwplotka

@stale

stale bot commented Jan 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 25, 2020
@stale stale bot closed this as completed Feb 1, 2020