Receive: high in-flight requests, context deadline exceeded errors, and high ingestion latency on main branch #7248
Comments
Not sure I get the report. Are you saying that even after reverting, the latency didn't go back? 🤔 How does
Actually, I suspect it is specific to our setup because we use a multi-AZ hashring; I will debug more with tracing:
Is it because there are not enough workers due to #7045 and requests keep queueing?
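A minimal sketch of the queueing behaviour suggested above, assuming a bounded pool of forward workers; the names and sizes are illustrative and this is not the Thanos code. Once all workers are busy and the queue is full, new submissions block, which would surface as rising in-flight requests and context deadline exceeded errors:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const workers = 2
	tasks := make(chan int, workers) // small buffer: submissions block once it is full

	// Start a fixed pool of workers draining the queue.
	for w := 0; w < workers; w++ {
		go func() {
			for t := range tasks {
				time.Sleep(100 * time.Millisecond) // simulate a slow downstream forward
				fmt.Println("finished task", t)
			}
		}()
	}

	start := time.Now()
	for i := 0; i < 8; i++ {
		tasks <- i // blocks while workers and buffer are saturated, so callers pile up "in flight"
	}
	fmt.Println("all submissions accepted after", time.Since(start))
	close(tasks)
	time.Sleep(300 * time.Millisecond) // crude drain, for the demo only
}
```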
I think I found the bug: this RemoteWriteAsync operation isn't parallel but sequential due to
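A rough sketch of the difference being described, with hypothetical function names rather than the actual handler code: if each forward is awaited before the next one is issued, request latency grows with the number of endpoints, whereas a true fan-out is bounded by the slowest endpoint.

```go
package main

import (
	"sync"
	"time"
)

// forward is a stand-in for one per-endpoint remote-write call.
func forward(endpoint string) { time.Sleep(50 * time.Millisecond) }

// Sequential: each call finishes before the next starts, so total latency is
// roughly len(endpoints) * per-call latency.
func fanOutSequential(endpoints []string) {
	for _, ep := range endpoints {
		forward(ep)
	}
}

// Parallel: all calls run concurrently, so total latency is roughly the
// slowest single call.
func fanOutParallel(endpoints []string) {
	var wg sync.WaitGroup
	for _, ep := range endpoints {
		wg.Add(1)
		go func(ep string) {
			defer wg.Done()
			forward(ep)
		}(ep)
	}
	wg.Wait()
}

func main() {
	eps := []string{"a", "b", "c"}
	fanOutSequential(eps) // ~150ms total
	fanOutParallel(eps)   // ~50ms total
}
```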
Hi @yeya24 and @GiedriusS, I've submitted a fix; I'd appreciate your review: #7267
@jnyi there's one more conflict to resolve in the PR, FYI. You were also pinged there.
Thanos, Prometheus and Golang version used:
Thanos: 0.35.0-dev
Golang: go1.21.7
Object Storage Provider: s3
What happened:
After switching from v0.34.1 -> v0.35.0-dev we experienced high in-flight requests. We found #7045 and did a few things, such as setting receive.forward.async-workers to a large number, but the issue remains.
What you expected to happen:
With async writes, the write latency should improve.
How to reproduce it (as minimally and precisely as possible):
Our receive commands (we split receive into router and ingestor modes); below are the args for the router:
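A minimal illustrative sketch of a router-mode invocation, with placeholder addresses, paths, and values rather than this setup's actual flags, including the async-workers flag mentioned above:

```sh
# Illustrative only: placeholder values, not the actual args from this setup.
thanos receive \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --remote-write.address=0.0.0.0:19291 \
  --receive.hashrings-file=/etc/thanos/hashrings.json \
  --receive.replication-factor=3 \
  --receive.forward.async-workers=50 \
  --label='receive_cluster="example"'
```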
Full logs to relevant components:
full goroutine:
full_goroutine.txt
Anything else we need to know: