store/bucket: merging posting groups on 0.3.2-rc.0 segfault #874
Comments
So essentially it was a one-off thing? Not reproducible anymore? ): |
🤔 I only tried it for a little bit, and because it came up pretty fast I thought I should file this issue so that other people who ran into it could come and discuss it. However, that doesn't seem to be the case, so I will revisit this once we upgrade to 0.3.2. I will leave it open for now just in case. It also seems fishy to me that in the stack trace of this there is |
haha, interesting. Yea, RC images are built from my laptop, mostly for demo (: From the newest master at that point, which should be after all fixes. This panic looks like something we fixed here: cb38508, but this tag should have that fix IMO (we can double check). Anyway, there must be a certain block and a certain query that triggered it. Do you remember what query you used? Cannot see anything trivial here. Unless the block is malformed on your disk. |
fun fact. We have:
But somehow it still crashes container 0.0 |
I think panics are missed because they are triggered in different goroutines.. |
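To illustrate the point about goroutines: in Go, a deferred `recover` only catches a panic raised on the same goroutine, so a recovery handler on the request path cannot stop a panic in a worker goroutine from killing the process. Below is a minimal sketch, not Thanos code. `runRecovered` is a hypothetical helper, and the out-of-range index is only a stand-in for whatever fails during posting merging.

```go
package main

import "fmt"

// runRecovered is a hypothetical helper (not part of Thanos). It runs fn in a
// new goroutine and turns a panic raised there into an error for the caller.
func runRecovered(fn func()) error {
	errc := make(chan error, 1)
	go func() {
		// recover only has an effect in a deferred call on the goroutine that
		// panics, so the defer must live here and not in the caller.
		defer func() {
			if r := recover(); r != nil {
				errc <- fmt.Errorf("recovered panic: %v", r)
				return
			}
			errc <- nil
		}()
		fn()
	}()
	return <-errc
}

func main() {
	// Without the defer inside the goroutine above, this runtime panic would
	// terminate the whole process, regardless of any recover in main.
	err := runRecovered(func() {
		var postings []uint64 // nil slice, stand-in for a malformed input
		_ = postings[3]       // index out of range: runtime panic
	})
	fmt.Println("error:", err)
}
```

If the store does spawn goroutines per block or per posting group, each of them would need its own deferred recover like this for the container to survive. Whether that matches the actual code path here is an assumption.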
We see this same issue happening almost daily. Using 0.3.1 |
What query triggers it? |
I'll do some testing |
Plus a move to v0.3.2 would be nice (: |
query: debug log attached I actually just upgraded to 0.3.1 when I found out about 0.3.2 :P |
I upgraded to 0.3.2 and the panic is gone.
|
And the panic is back again:
|
We can repro it internally as well. EDIT: Now it's gone. I think it's tied to a particular query & block. We will keep trying. |
The problem is rather here: https://github.com/improbable-eng/thanos/blob/master/pkg/store/bucket.go#L1229:29 and particularly here: https://github.com/improbable-eng/thanos/blob/master/pkg/store/bucket.go#L1357:29 The issue is that the test cases are there (we might be missing some?) so the bug is not really clear. It might indicate something similar to #335, so a race condition (nothing obvious) or a hidden OOM (failed to alloc). We will keep digging. |
Status:
Which again, either suggests:
|
My |
Sometimes I get a segmentation fault. Seems related to the new posting group merging logic.
Thanos, Prometheus and Golang version used
improbable/thanos:v0.3.2-rc.0
What happened
Thanos Store crashed not long after starting up.
What you expected to happen
Thanos Store to work.
How to reproduce it (as minimally and precisely as possible):
Unfortunately I don't have any reproducer. Perhaps something is visible from the stack trace.
Full logs to relevant components