compaction: Be clear about storage retention and downsampling. #813
Comments
Hello, to make sure I understand: if I set this configuration, all my samples that are more than 30 days old will be deleted. I'll only have the raw data and never the downsampled data. So it's not possible to mix different downsampling retentions. Am I right? For my case, the recommended configuration is: Right?
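A minimal sketch of the kind of compactor retention flags under discussion (the values are hypothetical, not the poster's actual configuration, which was not captured above; `bucket.yml` is a placeholder for an object store config):

```sh
# Hypothetical thanos compact invocation. For each resolution, 0d means
# "keep forever"; any other duration deletes blocks of that resolution
# once they are older than the given value.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=0d \
  --retention.resolution-1h=0d
```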
up?
If I understand downsampling correctly, the first case will:
So you will be able to get raw metrics for queries over the last 30 days; for anything older, you'll need to pull from downsampled data.
Hi @matejzero, thanks for your reply.
That should not happen. Only raw data should be deleted after 30d; the 5m and 1h downsamples should still be available. If you do the test again, check what
Just got bitten by this. Our configuration was: I expected the compactor/downsampler to delete raw data after 7d, but that I would still have downsampled data for longer. We have NO DATA beyond 7d.
The bucket inspect tells a different story. It tells me I have blocks with these attributes:
So how come my queries beyond raw retention have no data? I feel like I'm missing something really obvious here.
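A hedged sketch of the kind of bucket inspection referred to here (the actual output table did not survive the copy above; `bucket.yml` is a placeholder for the object store config, and newer Thanos releases moved this command under `thanos tools bucket`):

```sh
# Lists the blocks in the bucket together with their resolution and time
# ranges, which is how you can confirm whether 5m/1h downsampled blocks
# still exist after raw retention has deleted the raw blocks.
thanos bucket inspect --objstore.config-file=bucket.yml
```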
What are you querying for exactly, @raffraffraff? What Thanos version? We fixed one bug in the auto downsampling option for the querier in
I'm looking at a dashboard that should go back 60 days (we started shipping data to S3 months ago, but have been downsampling and retaining raw=7d, 5m=14d, 1h=60d). Thanos Compactor 0.5.0-rc0. My dashboard ends at 7d. We've been using v0.5.0-rc0 since it came out, so if it fixed the issue, surely our data wouldn't evaporate exactly 7d ago?
You just said that data is there, but you cannot query it, right? #813 (comment)
Basically, yes. Queries give back nothing after 7d, but the bucket inspect indicates that there should be data in the buckets. I'm gonna upgrade the query instances to 0.5.0 and report back.
What is your time range at the end of the query (rate(metric[timerange]))?
Not sure I'm following you, @matejzero. My simplest query is Updating to Thanos Query 0.5.0-rc0 didn't help.
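One hedged illustration of why the time range matters (this may or may not be this poster's problem): 5m-resolution blocks contain at most one sample per 5 minutes per aggregate, so a `rate()` window has to span several of those samples before it returns any points. The endpoint, dates, and step below are placeholders, not values taken from the thread; `max_source_resolution` is the Thanos extension to the Prometheus query API:

```sh
# Query the Thanos Querier HTTP API directly, restricting it to 5m-resolution
# data. A 30m rate window covers roughly six downsampled samples; a 1m window
# would cover at most one and therefore return nothing.
curl -G 'http://thanos-query:10902/api/v1/query_range' \
  --data-urlencode 'query=rate(metric[30m])' \
  --data-urlencode 'start=2019-03-01T00:00:00Z' \
  --data-urlencode 'end=2019-03-07T00:00:00Z' \
  --data-urlencode 'step=1h' \
  --data-urlencode 'max_source_resolution=5m'
```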
And just to confirm, we are using
Interesting: in the Thanos Query interface, the 'Auto downsampling' option returns no data. I go back 2 weeks, and the graph is empty (no data points). I switch to 'Max 5m downsampling' or 'Max 1h downsampling' and I see data. With raw, I do not expect to see any because it has expired. But I do expect 'Auto downsampling' to work.
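For context, a sketch of the querier-side knob being discussed; the flag names come from `thanos query` of that era, while the addresses are placeholders:

```sh
# Run the querier with auto-downsampling enabled, so the UI and API can fall
# back to 5m or 1h blocks automatically when the requested range is large or
# the raw data has already been deleted by retention.
thanos query \
  --http-address=0.0.0.0:10902 \
  --store=thanos-store:10901 \
  --query.auto-downsampling
```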
EDIT: Corrected a few things after further debug logging; reformatted for clarity.
The purpose of
Issues which I'm going to file separately:
TL;DR: Right now, Thanos downsampled data (all versions, including 0.5.0) is ~unusable beyond the 'raw' data retention, unless you are using the same retention on all three resolutions.
Is there a plan to fix this brokenness?
Any updates on this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I was just bitten by this. Could there be an update of some form to the retention flags to warn about deletion in the object store?
OK, this issue went sideways (: Let's keep it focused. Retention and downsampling were explained in detail here: https://thanos.io/components/compact.md/#downsampling-resolution-and-retention However, I can see some side requests/questions. Can we start another, separate issue for those? Especially @romoy: we can definitely discuss adding this feature; we already have code for this, we just need to plumb it in (: Can you please add a feature request issue with details: what's the use case, what is missing, why do you need this? (:
I opened #2290, could I get a review of the issue? Is it missing any info, or does it prompt any questions?
@bwplotka Thanks for the explanation! I have a few clarification questions. Context: I am doing performance and scale testing and capacity planning, with particular interest in understanding the impact of variable time series (like Kubernetes containers or processes that are short-lived). Apologies if these are already covered somewhere I haven't found yet.
Thanks in advance!
https://github.com/thanos-io/thanos/blob/master/docs/components/compact.md
I'll expand on the testing I'm doing and what I'm trying to understand from the results. The Thanos Receivers (3 replicas,
The key takeaway from the receivers is that there is a clear (and large) cost to time series vs. datapoints, as expected. The other key point was that 4 days of storage is around 36GiB and 51GiB for the two test cases. For the compaction analysis, I have not been running for 10 days yet, so I do not have data for the 1h downsampling yet.
Downsampling is not for saving space: thanos-io/thanos#813
Currently in Thanos compactor, users can set retention for each resolution:
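For reference, a sketch of those flags as named in the Thanos compactor documentation (the exact list from the original post did not survive the copy; the defaults shown are believed to be `0d`):

```sh
--retention.resolution-raw=0d   # retention for raw samples
--retention.resolution-5m=0d    # retention for 5m-downsampled samples
--retention.resolution-1h=0d    # retention for 1h-downsampled samples
```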
If any is left `0`, then it is unlimited. It is natural for users to think that downsampling is something that magically zooms out the metrics and reduces size, so it's tempting to make the raw retention super small (because it seems to be expensive) and the rest super long (as it seems to be "just" cheap). However, it's not that easy (as we can see from the number of tickets related to this), so we need to document it better and make it easier to use.
Facts:
- `<>_over_time` aggregations when using downsampled data.

Additionally:
There are only rare use cases for using Thanos with object storage if you want just a short retention (e.g. 30d). Usually Prometheus 2.x is capable of really long retentions on its own. I recently talked to a person who has 5 years of data retention on a 2TB SSD, and Prometheus deals with it just fine. Using Thanos with object storage is twice as complex as just Thanos sidecars (without uploading) and Thanos Queriers for global view and HA, so make your choice wisely. If you care about 30d of metrics, I really recommend just using the simpler form. Thanos object storage support is designed for longer retentions, like years or even unlimited retention. The rare use cases are when you want to use object storage as a backup option or you don't want to buy larger/medium SSD disks. (:
Let's discuss how we can make the whole thing easier to use.
Some early acceptance criteria would be:
I will mark all issues/questions related to this as a `duplicate` and forward users here, so we discuss everything in a single place.