Thanos Receive (RouteOnly mode) Panic #6942
Comments
This seems to be the line that caused the panic: https://github.com/klauspost/compress/blob/v1.16.7/s2/writer.go#L505C1-L506C1 But I don't know how this could be related. Maybe it is a Go runtime bug. I see some issues mentioning it for Go 1.21.0 (golang/go#62182), but not on amd64, and your version is already Go 1.21.3.
Interesting. We use our own base OS here, Alpine Linux. Would other Thanos users have reported this? I also found other logs related to this:
Reporting a different panic stack trace; could it be related to the HTTP/2 client?
Lots of:
But one time I did get more:
OK, it looks like klauspost/compress#867 is the root cause and is fixed in #6950. I saw that Thanos main picked up the newer go.mod, but 0.32.5 and v0.33.0-rc.0 did not. I will cherry-pick the updated go.mod to fix this internally. cc @mhoffm-aiven @yeya24 to make sure this gets patched into the latest v0.33, thanks.
Thanks for reporting back the resolution, @jnyi.
Actually, that might be a false resolution: the panic still seems to happen after I upgraded [email protected], though very infrequently. I am now trying [email protected]; I will let it run overnight and report back. Sorry for the inconclusive post earlier.
OK, it panicked again, with this stack trace:
I already prepared and built artifacts for 0.33 yesterday; since this is still ongoing, I'll earmark it for 0.33.1!
Could you try building with |
We have been unable to pinpoint the origin of similar crashes at MinIO. It seems to happen only on select machines, and the only reliable workaround we've been using is to compile with Go 1.19.x, which avoids the issue. I've created an issue (link above this post) to see if we can get to the bottom of this!
@dctrwatson - The Go team is asking for Linux kernel versions. I don't know if you have that, but if you do please add it to |
Confirmed: after compiling Thanos with Go 1.19 (< 1.20), no panic occurred in about 10 hours. Previously it would happen more than 20 times in 10 hours for a deployment with 4 instances.
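For anyone comparing builds, here is a minimal sketch for checking which Go toolchain a binary was compiled with, using `runtime.Version()`. The helper names `goMinor` and `inReportedRange` are hypothetical, and the "reported range" of Go 1.20 and 1.21 is only an assumption drawn from the reports in this thread (panics on go1.21.3, none on 1.19); it is not a confirmed list of affected versions.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// goMinor extracts the minor number from a version string such as "go1.21.3".
// It returns -1 when the string does not have the "go1.N" shape
// (for example, development builds).
func goMinor(version string) int {
	if !strings.HasPrefix(version, "go1.") {
		return -1
	}
	rest := strings.TrimPrefix(version, "go1.")
	// Keep only the leading digits: "21.3" -> "21", "22rc1" -> "22".
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	n, err := strconv.Atoi(rest[:end])
	if err != nil {
		return -1
	}
	return n
}

// inReportedRange is an ASSUMPTION based on this thread only:
// it flags Go 1.20 and 1.21 as the versions where the panic was reported.
func inReportedRange(minor int) bool {
	return minor == 20 || minor == 21
}

func main() {
	v := runtime.Version()
	fmt.Printf("built with %s, in reported range: %v\n", v, inReportedRange(goMinor(v)))
}
```

Running `go version` against the built Thanos binary gives the same toolchain information without any code changes.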
Per the request in golang/go#64781 I added |
Ran each |
At my company we were also having a very similar panic for the Thanos Receiver (RouteOnly mode) with version Setting the environment variable |
We've been seeing this on kernel version 6.1.91-99.172.amzn2023.x86_64.
Hi @RodrigoMenezes-Vantage, I already patched our deployment to add the env var GODEBUG=gcshrinkstackoff=1 and am still monitoring. Is this related to golang/go#64934?
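To confirm the workaround actually reached the process, a small sketch like the following could be used to inspect the `GODEBUG` environment variable, which is a comma-separated list of `key=value` pairs. The helper name `godebugSetting` is hypothetical, and letting the last occurrence of a key win is a choice made in this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugSetting looks up key in a GODEBUG-style string of comma-separated
// key=value pairs. It returns the value and whether the key was present.
// If the key appears more than once, this sketch keeps the last occurrence.
func godebugSetting(godebug, key string) (string, bool) {
	val, found := "", false
	for _, kv := range strings.Split(godebug, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

func main() {
	v, ok := godebugSetting(os.Getenv("GODEBUG"), "gcshrinkstackoff")
	if ok && v == "1" {
		fmt.Println("GODEBUG=gcshrinkstackoff=1 is set for this process")
	} else {
		fmt.Println("GODEBUG=gcshrinkstackoff=1 is not set for this process")
	}
}
```

The check only reports what the process sees in its environment; it does not verify that the runtime behaves differently.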
Hi guys, |
It looks like this is fixed in Go 1.22 at least, which suggests the next Thanos release should be able to run without this environment variable set, I think.
I have seen strange stuff even on Go 1.22, so I have (just today) merged a change that ditches the large stack: klauspost/compress#1014. It may not be needed, but I couldn't live with the potential problem going forward, even if it was a runtime issue. And since I don't have a clean reproducer, I thought this might be the best approach. I will probably make a release before too long, so hopefully we can close this down for good. The downside is mainly clunkier code; the performance seems to remain the same AFAICT.
Thanos, Prometheus and Golang version used:
Thanos Version: 0.32.4/0.32.5
Golang Version: go1.21.3
Object Storage Provider: AWS S3
What happened: Thanos Receive in route-only mode panics frequently. The setup of Receive:
What you expected to happen: No panic
How to reproduce it (as minimally and precisely as possible):
Full logs to relevant components:
Panic from k8s docker logs
Anything else we need to know:
Environment:
uname -a: Linux thanos-writer-deployment-77677f8cb8-92h7x 5.4.0-1113-aws-fips #123+fips1-Ubuntu SMP Thu Oct 19 16:21:22 UTC 2023 x86_64 Linux