compaction: Be clear about storage retention and downsampling. #813
Comments
Hello, to make sure I understand: if I set this configuration, all my samples that are more than 30 days old will be deleted. I'll only have the raw data and never the downsampled data. So it's not possible to mix different downsampling retentions. Am I right? For my case, the recommended configuration is: Right?
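A minimal sketch of the kind of compactor retention flags under discussion (the values are hypothetical, not the poster's actual configuration, which was not captured above; `bucket.yml` is a placeholder for an object store config):

```sh
# Hypothetical thanos compact invocation. For each resolution, 0d means
# "keep forever"; any other duration deletes blocks of that resolution
# once they are older than the given value.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=0d \
  --retention.resolution-1h=0d
```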
up?
If I understand downsampling correctly, the first case will:
So you will be able to get raw metrics for queries over the last 30 days; for anything older, you'll need to pull from downsampled data.
Hi @matejzero, thanks for your reply.
That should not happen. Only raw data should be deleted after 30d; the 5m and 1h downsamples should still be available. If you do the test again, check what
Just got bitten by this. Our configuration was: I expected the compactor/downsampler to delete raw data after 7d, but that I would still have downsampled data for longer. We have NO DATA beyond 7d.
The bucket inspect tells a different story. It tells me I have blocks with these attributes:
So how come my queries beyond raw retention have no data? I feel like I'm missing something really obvious here.
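A hedged sketch of the kind of bucket inspection referred to here (the actual output table did not survive the copy above; `bucket.yml` is a placeholder for the object store config, and newer Thanos releases moved this command under `thanos tools bucket`):

```sh
# Lists the blocks in the bucket together with their resolution and time
# ranges, which is how you can confirm whether 5m/1h downsampled blocks
# still exist after raw retention has deleted the raw blocks.
thanos bucket inspect --objstore.config-file=bucket.yml
```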
What are you querying for exactly, @raffraffraff? What Thanos version? We fixed one bug in the auto downsampling option for the querier in
I'm looking at a dashboard that should go back 60 days (we started shipping data to S3 months ago, but have been downsampling and retaining raw=7d, 5m=14d, 1h=60d). Thanos Compactor 0.5.0-rc0. My dashboard ends at 7d. We've been using v0.5.0-rc0 since it came out, so if it fixed the issue, surely our data wouldn't evaporate exactly 7d ago?
You just said that data is there, but you cannot query it, right? #813 (comment)
Basically, yes. Queries give back nothing after 7d, but the bucket inspect indicates that there should be data in the buckets. I'm gonna upgrade the query instances to 0.5.0 and report back.
What is your time range at the end of the query (rate(metric[timerange]))?
Not sure I'm following you, @matejzero. My simplest query is Updating to Thanos Query 0.5.0-rc0 didn't help.
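One hedged illustration of why the time range matters (this may or may not be this poster's problem): 5m-resolution blocks contain at most one sample per 5 minutes per aggregate, so a `rate()` window has to span several of those samples before it returns any points. The endpoint, dates, and step below are placeholders, not values taken from the thread; `max_source_resolution` is the Thanos extension to the Prometheus query API:

```sh
# Query the Thanos Querier HTTP API directly, restricting it to 5m-resolution
# data. A 30m rate window covers roughly six downsampled samples; a 1m window
# would cover at most one and therefore return nothing.
curl -G 'http://thanos-query:10902/api/v1/query_range' \
  --data-urlencode 'query=rate(metric[30m])' \
  --data-urlencode 'start=2019-03-01T00:00:00Z' \
  --data-urlencode 'end=2019-03-07T00:00:00Z' \
  --data-urlencode 'step=1h' \
  --data-urlencode 'max_source_resolution=5m'
```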
And just to confirm, we are using
Interesting: in the Thanos Query interface, the 'Auto downsampling' option returns no data. I go back 2 weeks, and the graph is empty (no data points). I switch to 'Max 5m downsampling' or 'Max 1h downsampling' and I see data. With raw, I do not expect to see any because it has expired. But I do expect 'Auto downsampling' to work.
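For context, a sketch of the querier-side knob being discussed; the flag names come from `thanos query` of that era, while the addresses are placeholders:

```sh
# Run the querier with auto-downsampling enabled, so the UI and API can fall
# back to 5m or 1h blocks automatically when the requested range is large or
# the raw data has already been deleted by retention.
thanos query \
  --http-address=0.0.0.0:10902 \
  --store=thanos-store:10901 \
  --query.auto-downsampling
```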
EDIT: Corrected a few things after further debug logging; reformatted for clarity.
The purpose of
Issues which I'm going to file separately:
TL;DR: Right now, Thanos downsampled data (all versions, including 0.5.0) is ~unusable beyond the 'raw' data retention, unless you are using the same retention on all three resolutions.
Is there a plan to fix this brokenness?
Any updates on this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I was just bitten by this. Could there be an update of some form to the retention flags to warn about deletion in the object store?
OK, this issue went sideways (: Let's keep it focused. Retention and downsampling were explained in detail here: https://thanos.io/components/compact.md/#downsampling-resolution-and-retention However, I can see some side requests/questions. Can we start another, separate issue for those? Especially @romoy: we can definitely discuss adding this feature; we already have code for this, we just need to plumb it in (: Can you please add a feature request issue with details: what's the use case, what is missing, why do you need this? (:
I opened #2290, could I get a review of the issue? Is it missing any info, or does it prompt any questions?
@bwplotka Thanks for the explanation! I have a few clarification questions. Context: I am doing performance and scale testing and capacity planning, with particular interest in understanding the impact of variable time series (like Kubernetes containers or processes that are short-lived). Apologies if these are already covered somewhere I haven't found yet.
Thanks in advance!
https://github.com/thanos-io/thanos/blob/master/docs/components/compact.md
I'll expand on the testing I'm doing and what I'm trying to understand from the results. The Thanos Receivers (3 replicas,
The key takeaway from the receivers is that there is a clear (and large) cost to time series vs. datapoints, as expected. The other key point was that 4 days of storage is around 36GiB and 51GiB for the two test cases. For the compaction analysis, I have not been running for 10 days yet, so I do not have data for the 1h downsampling yet.
Downsampling is not for saving space: thanos-io/thanos#813
Currently in Thanos compactor, users can set retention for each resolution:
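For reference, a sketch of those flags as named in the Thanos compactor documentation (the exact list from the original post did not survive the copy; the defaults shown are believed to be `0d`):

```sh
--retention.resolution-raw=0d   # retention for raw samples
--retention.resolution-5m=0d    # retention for 5m-downsampled samples
--retention.resolution-1h=0d    # retention for 1h-downsampled samples
```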
If any is left `0`, then it is unlimited. It is natural for users to think that downsampling is something that magically zooms out the metrics and reduces size, so it's tempting to make the raw retention super small (because it seems to be expensive) and the rest super long (as it seems to be "just" cheap). However, it's not that easy (as we can see from the number of tickets related to this), so we need to document it better and make it easier to use.
Facts:
- `<>_over_time` aggregations when using downsampled data.

Additionally:
There are only rare use cases for using Thanos with object storage if you want just a short retention (e.g. 30d). Usually Prometheus 2.x is capable of really long retentions on its own. I recently talked to a person who has 5 years of data retention on a 2TB SSD, and Prometheus deals with it just fine. Using Thanos with object storage is twice as complex as just Thanos sidecars (without uploading) and Thanos Queriers for global view and HA, so make your choice wisely. If you care about 30d of metrics, I really recommend just using the simpler form. Thanos object storage support is designed for longer retentions, like years or even unlimited retention. The rare use cases are when you want to use object storage as a backup option or you don't want to buy larger/medium SSD disks. (:
Let's discuss how we can make the whole thing easier to use.
Some early acceptance criteria would be:
I will mark all issues/questions related to this as a `duplicate` and forward users here, so we discuss everything in a single place.