store: crashing after upgrade to 0.3.0 #829
Comments
Did it happen pre-0.3.0? Are you sure you have enough RAM in that box to execute this query? If you execute |
I'm try set |
Also seeing the same panic with Thanos Store after upgrading to 0.3.0:
I can see that Store uses a fair amount of memory during bucket initialisation and then drops off to a more conservative usage. As you can see below it does not run out of memory at the moment it crashes (7.61GB / 20GB @ 11:08:30): |
Hm.. the problematic code path looks exactly like this one: #816. It means we ask object storage for more bytes than the reader gives us back. We probably need some check anyway (as mentioned in the discussion in the linked ticket), but the overall state looks like a malformed block. Why would the index point to non-existing bytes? Unless we have a bug in the posting code, which was touched recently. Does this happen on a particular block or on all of them? How often? |
In my case, all queries to remote storage crash on 0.3.0. I downgraded only the store node to 0.2.1 and everything is working fine now. |
I have also downgraded my storage processes to 0.2.1 and all is fine now (I have left query and compactor running on 0.3.0). |
Is the block this is happening on a partially uploaded block? From the code, this should only happen if we try to get data that we expect to be in the block but that has not been written. |
No, it's not. I don't have partially uploaded blocks (at least according to the sidecar and compact logs). But in one S3 bucket (a local minio cluster) I have blocks from multiple Prometheus instances with different tags (replica, dc, service), and store queries to all of them fail on Thanos 0.3.0 but work fine on version 0.2.1. |
What @R4scal says almost exactly mirrors my issue as well, although I had 2 separate environments (different buckets etc.). 0.3.0 failed store queries, so I downgraded that to 0.2.1 and all is OK. Everything else is at 0.3.0. I can switch versions at will if anything needs testing. |
Yes, it would be nice to know whether a particular block is wrong or all of them. If 0.2.1 works, then it seems it has to be something with the upgrade of tsdb and the posting refactor we had ):
|
Doesn't seem tied to a particular block, but it's hard to say for sure. Is there anything I can run to pinpoint it? Running it in debug mode shows a bunch, but nothing indicates a problem with any of them. |
I think this is related to this change: #753 |
Important question. What queries are you doing exactly? |
Example queries that crash thanos 0.3.0 from grafana in my case:
|
Thanks for all the info! Within just a couple of days of the release we found (thanks to you guys) and hopefully fixed this: #837 (: We need to fix some issues with the negative matcher, and then we will do a patch release that includes this fix. |
* setting the start and end to prior posting changes
* really need some tests data but this may also be the fix
* moving the start and end inside the loop, so they are not updated as we iterate over items
* Added regressions tests for #829. Moved bucket e2e tests to table test. Signed-off-by: Bartek Plotka <[email protected]>
* Fixed overestimation for fetching chunks and series. Signed-off-by: Bartek Plotka <[email protected]>
* Removed wrong comment. Signed-off-by: Bartek Plotka <[email protected]>
* changing func to match interface
Fixed by this: #837 (: |
Hi
I keep 24h of data in Prometheus and use Thanos for long-term storage. After upgrading Thanos to 0.3.0, querying an interval longer than 24h crashes Thanos Store: