Thanos query timeout #440

Closed
shirpx opened this issue Jul 24, 2018 · 5 comments

@shirpx

shirpx commented Jul 24, 2018

Hi,
I connected a load balancer to my thanos-query servers and use it as a datasource in Grafana.
When I query 7 days back on the Redis dashboard,
the query servers' CPU rises, but not enough to kill them.
After that, though, the datasource becomes inaccessible and I can't query at all.
It becomes accessible again only after I restart the thanos-query container on the query servers (it never recovers without a restart).

thanos-query graphs (screenshots): network, CPU, memory, and two thanos-query dashboard views.

Timeout when accessing thanos-query directly over HTTP (screenshot: thanos-query timeout).

It's important to mention that my Prometheus servers can handle 7 days on the Redis dashboard when I query them directly, and it looks like thanos-query uses more Prometheus resources than querying directly.
redisDashboard.txt

queryvsdirectly
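
To reproduce the comparison outside Grafana, the same 7-day range query can be timed against both endpoints. This is a minimal sketch, not something taken from the dashboard itself: it assumes both thanos-query and Prometheus serve the Prometheus-compatible `/api/v1/query_range` HTTP API, and the hostnames, ports, and the `redis_up` query in the usage note are hypothetical placeholders.

```python
import time
import urllib.parse
import urllib.request


def build_range_url(base_url, query, end_ts, days=7, step="5m"):
    """Build a /api/v1/query_range URL covering `days` days up to `end_ts` (Unix seconds)."""
    start_ts = end_ts - days * 24 * 3600
    params = urllib.parse.urlencode({
        "query": query,
        "start": start_ts,
        "end": end_ts,
        "step": step,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def time_query(url, timeout=120):
    """Run one request and return (elapsed seconds, HTTP status or error string)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the timing covers the full response
            status = resp.status
    except Exception as exc:  # timeouts and connection errors are the interesting case here
        status = f"error: {exc}"
    return time.monotonic() - started, status


def compare(endpoints, query, end_ts=None):
    """Time the same range query against each endpoint; `endpoints` maps a name to a base URL."""
    end_ts = int(time.time()) if end_ts is None else end_ts
    results = {}
    for name, base_url in endpoints.items():
        results[name] = time_query(build_range_url(base_url, query, end_ts))
    return results
```

Calling `compare({"thanos-query": "http://thanos-query.example:10902", "prometheus": "http://prometheus.example:9090"}, "redis_up")` with real addresses substituted returns the elapsed time and status per endpoint, which makes it easier to tell a slow query from a datasource that has stopped answering.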

I have 2 Prometheus servers scraping in the region I am querying, and in the dashboard I'm querying only that region.

The resources of my query servers and my Prometheus servers in that region are equal (for comparison purposes in this issue).

My query servers' resources: min 2 servers with 12 vCPUs, 76 GB (autoscaled)
My Prometheus servers' resources: 2 servers with 12 vCPUs, 76 GB

In the query logs I see this error:
level=error ts=2018-07-24T08:24:50.715483428Z caller=proxy.go:117 err="fetch series for [{monitor codelab-monitor} {replica prometheus-master-us-central1-a-1}]: rpc error: code = Canceled desc = context canceled"

Nothing special shows up in the sidecar logs.

@bradleybluebean

Not sure if this helps, but I posted a comment on a similar-sounding issue:
#455 (comment)

@krasi-georgiev
Contributor

Could you try with the latest Thanos version? Bartek added remote read streaming, and it should fix this problem.
#1268

@krasi-georgiev
Contributor

Just noticed that the streaming PR got merged just after 2.12 was cut, so you need to wait for the Prometheus 2.13 release or just use the master image.
https://github.com/prometheus/prometheus/commits/master?after=26e8d25e0b0d3459e5901805de992acf1d5eeeaa+34

@bwplotka
Member

bwplotka commented Sep 13, 2019 via email

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this as completed Jan 18, 2020