store: Thanos store consumes almost 100G of memory at startup #325
Comments
I would first of all change the index cache size. 64MB might be way too low; give it 1GB. For the chunk pool you can give 20GB, which should be enough. Both of these numbers should limit memory consumption for the store... maybe the extremely small 64MB value leads to unexpected behaviour?
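Concretely, that suggestion maps to store flags along these lines (a sketch only; flag names and accepted units can differ between Thanos versions, so check `thanos store --help`, and the data directory here is just a placeholder):

```sh
# illustrative values from the comment above, not a tested configuration;
# object storage flags are omitted because they depend on the backend
thanos store \
  --index-cache-size=1GB \
  --chunk-pool-size=20GB \
  --data-dir=/var/thanos/store
```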
BTW thanks for so much input! It sounds quite bad; hopefully we find the root cause soon. (:
Thanks! Can we do one more thing with pprof? Let's grab heap again and run
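A typical way to grab a heap profile from the store and render it would be something like the following; the HTTP port is an assumption based on the default, and the PDF output needs graphviz installed:

```sh
# pull the current heap profile from the store's HTTP endpoint
curl -s http://localhost:10902/debug/pprof/heap > heap.pprof
# render it as a PDF call graph
go tool pprof -pdf heap.pprof > heap.pdf
```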
The numbers are insane 0.o
This is weird: looking now at whether the above error can cause some mem leak.
Here is the pdf of the heap: Yes, I noticed that issue with the key, too. I found the block in S3 and it is there, with a meta.json that looks fine. However, since I am now running the compactor and it fails every 1h with
I may have some issues with my data, so I thought of dumping the production Prometheus data to staging and setting up Thanos there, but I postponed it for the time being.
Wonder if that is related to the 64MB cache limit. Will look closer into the code soon, thanks for all the details. This should be enough for now.
First of all, the meta.json error:
Seems like there is simply no
All of this indicates that we had some partial uploads, which is not good. What is the compactor saying exactly? It might indicate the same issues. Only Thanos sidecars or the compactor upload blocks, so we need all the logs we have, especially from the thanos-sidecars. Maybe some thanos-sidecar had trouble while uploading a block? This does not explain the memory leak yet, however. The quickest way to tell might be by adding more integration tests with some mocked bucket store. I can look at that tomorrow; nothing obvious from the code. The mem leak is in
As far as meta.json is concerned, there is a
Also, here is how the compactor fails after about 40min of work:
For now I have dropped the compactor cronjob. I would agree, though, that somehow the sidecar or the compactor messed up the data, because these messages are far from normal, but I don't know how. As an experiment I pointed the store and sidecars at an empty bucket and everything works fine (there is no data and the new blocks are uploaded fine). However, I don't know whether the spikes will come back once there are a lot of indexes. I started backing up the production EBS Prometheus volumes to try to set up a Prometheus server with data on staging. Then I will try to run the sidecar containers and afterwards the store. I will monitor for errors while uploading and other issues, and get back to you with the results. Once again, I very much appreciate the help!
I think we have two separate issues here. One is some memory leak when a partial upload happens (for example if the index upload stops halfway or something like that) - this needs to be investigated; I am writing some local tests now that will help confirm this. The second is scary, because if there was no upload error on the sidecar/compactor and meta.json was eventually uploaded, we might be hitting this issue: #298, i.e. S3 not really being strongly consistent for read-after-write. (In theory it is, with some caveats, but rereading about this, we actually might hit those caveats.) This means we need some solution for this as well; I think we need to invest some time in it too.
Today I did a bunch of tests with 3 months of data. I read in the description that the sidecar backs up data, but does it do so for legacy blocks? Like in my case, where I had a retention interval of
That raises the question of how I managed to put all those blocks into S3 last time. I remember that I made several changes at once: changed the retention to 24h, and changed min-block-duration and max-block-duration to very small values. I will continue with my checks and try to reproduce the issue.
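For context, the changes described above correspond to Prometheus flags along these lines; the exact values from the thread are not preserved, and setting min and max block duration equal is what the Thanos docs suggest so that Prometheus never compacts blocks locally:

```sh
# illustrative values only
--storage.tsdb.retention=24h
--storage.tsdb.min-block-duration=2h
--storage.tsdb.max-block-duration=2h
```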
Yup, data migration is a valid use case and we need to figure out how to do it safely: #206
I finally managed to do it! I manually edited the meta.json of each block in Prometheus, setting the compaction level to 1. Then I deleted the blocks from the
Now all the data is in the block storage. I switched the Prometheus retention period back to
I will perform a few other tests before merging to production, but it looks very promising. Thank you so much for the help!
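For anyone doing the same migration, the meta.json edit described above can be scripted roughly like this; the data path is an assumption for a typical Prometheus volume, and jq is just one convenient way to do it, not necessarily what was used here. The point of lowering the level is that the sidecar only uploads blocks that look uncompacted:

```sh
# force every local block to look uncompacted (compaction level 1); requires jq
for meta in /prometheus/data/*/meta.json; do
  jq '.compaction.level = 1' "$meta" > "$meta.tmp" && mv "$meta.tmp" "$meta"
done
```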
Wow, yeah, you handled the data migration perfectly, awesome! So now the store node can access old data just fine? Without any mem leak? I added more tests to make sure all objstore implementations behave the same: #327. Any chance you could check out the branch and run the test with the S3 env vars exported?
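Running those objstore tests would presumably look something like the following; the branch name, package path, and environment variable names below are placeholders rather than values taken from #327:

```sh
git fetch origin && git checkout <branch-from-327>
# placeholder variable names; use whatever the test in the branch actually reads
export S3_BUCKET=<bucket> S3_ENDPOINT=<endpoint> S3_ACCESS_KEY=<key> S3_SECRET_KEY=<secret>
go test -v ./pkg/objstore/...
```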
Yes, no mem leak issues now - the Thanos Store occupies about 700MB of memory, and when I load it with fat queries it tops out at about 2GB, which is very efficient. When this happened in Prometheus we had an OOMKill. I have a snapshot of the EBS volumes and the S3 bucket with the
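Given those numbers, a Kubernetes memory limit for the store pod could presumably sit in this range (a sketch extrapolated from the figures above, not a setting from this thread):

```yaml
resources:
  requests:
    memory: 1Gi   # roughly the idle footprint reported above
  limits:
    memory: 4Gi   # headroom over the ~2GB peak seen under heavy queries
```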
This seems to be resolved.
We have a strange issue with the memory consumption of Thanos Store in our Thanos setup.
Currently, we have a working setup of Prometheus and Alertmanager as follows: the Prometheus image is quay.io/prometheus/prometheus:v2.2.1, and each Prometheus pod has a Thanos sidecar. The docker image of the sidecar is improbable/thanos:master (image id 2950dff67c9a). The Alertmanager image is quay.io/prometheus/alertmanager:v0.15.0-rc.1.
The Kubernetes cluster is running on AWS EC2 instances, with S3 object storage as the Thanos storage.
I am trying to add the thanos-store component; however, there is a very strange issue with its memory consumption.
We started a Prometheus server from scratch 7d ago. Each Prometheus pod consumes on average 1G of memory:
When I start the thanos-store and include it in the Thanos cluster, its memory consumption skyrockets.
It starts from 0, consumes almost all resources on the node, and ends with an OOMKill:
After dedicating one 122G node solely to the thanos-store, I managed to fit the pod in. After the initial burst, the memory eventually settles at about 60G and stays there:
I changed some options, such as storage.tsdb.retention (from 30d to 24h) and chunk-pool-size (from 2g to 512MB), but without any effect.
Here is some detailed information about our configuration:
Store:
Prometheus:
Sidecar:
I cannot see anything in the thanos-sidecar logs that gives us meaningful information:
The sidecar logs are showing warnings, but I found out that this is because Prometheus takes about 1m to load. While Prometheus is loading, the sidecar cannot get the configuration from api/v1/status/config and issues these logs:
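For context, that is Prometheus's own status API; once Prometheus has finished loading, the same endpoint can be queried directly (host and port here are placeholders for our pod):

```sh
curl -s http://prometheus:9090/api/v1/status/config
```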
Finally, the TSDB samples are not that many, so this overuse of memory is definitely not OK:
Any help would be appreciated!
I am very eager to make it work since it is a great tool!