
compact: Delete metrics in Object Storage #3529

Closed
apr1809 opened this issue Dec 2, 2020 · 12 comments
@apr1809

apr1809 commented Dec 2, 2020

I have deployed Thanos App Version: 0.16.0 with Prometheus 2.20.1 in EKS. I am using S3 for object storage.
I would like to know whether any Thanos component can delete objects older than a particular date from S3, or whether it is fine to delete objects by setting a lifecycle rule on the S3 bucket. What is the recommended approach to deleting Thanos data?

@GiedriusS
Member

GiedriusS commented Dec 2, 2020

Thanos Compact can apply retention rules based on the resolution; please take a look at https://thanos.io/tip/components/compact.md/#enforcing-retention-of-data. Or you can simply delete blocks whose max time is older than X. A third option is #3421, which lets you delete only certain metrics from a block. I hope that helps. Please close this if you have no more problems, or comment if you have a more sophisticated use case.

@apr1809
Author

apr1809 commented Dec 3, 2020

@GiedriusS by "delete blocks older than X", do you mean storage.tsdb.max-block-duration?
I have not deployed the Compact component; I am trying to implement Thanos mainly for HA of the Prometheus instances.

@apr1809
Author

apr1809 commented Dec 9, 2020

@GiedriusS I have set the following arguments on the compactor, but I can still see data as old as 20 days in S3. Can you let me know if my configuration is wrong?

```yaml
args:
  - compact
  - --log.level=info
  - --http-address=0.0.0.0:10902
  - --data-dir=/data
  - --retention.resolution-raw=14d
  - --retention.resolution-5m=14d
  - --retention.resolution-1h=14d
  - --consistency-delay=30m
  - --objstore.config-file=/conf/objstore.yml
  - --log.format=json
  - --delete-delay=0
  - --wait
```

Also, could you please clarify what "blocks where the max time is older than X" means? Is it safe to delete blocks older than the max time set on the store component?

@GiedriusS
Member

Yes, that option controls the retention period on Prometheus. On Thanos Compact you have the --retention.resolution-* parameters, as in your example. Has it finished at least one iteration? Retention is only applied at the end of an iteration at the moment; you can check this in the metrics.

Let me clarify. The blocks encompass a time period - the period has a start and an end or, in other words, min/max times. If the max time of a block is older than $now-$retention_time then we are free to delete that block.
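The condition above can be written out as a one-line check (a minimal sketch; Thanos records each block's min/max times as Unix milliseconds, and all values below are hypothetical):

```python
def block_is_deletable(max_time_ms: int, now_ms: int, retention_ms: int) -> bool:
    """True once the newest sample in the block is older than now - retention."""
    return max_time_ms < now_ms - retention_ms

DAY_MS = 24 * 60 * 60 * 1000
now = 1_700_000_000_000  # an arbitrary "now", in Unix milliseconds

# A block that ended 20 days ago, under a 14-day retention: deletable.
print(block_is_deletable(now - 20 * DAY_MS, now, 14 * DAY_MS))  # → True
# A block that ended 10 days ago: kept.
print(block_is_deletable(now - 10 * DAY_MS, now, 14 * DAY_MS))  # → False
```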

What do you mean by "max time set on the store component"? The `--max-time` parameter? Either way, I hope this helps.

@apr1809
Author

apr1809 commented Dec 10, 2020

Thanks for the details, @GiedriusS.
So, to confirm: as per the example above, objects older than 14 days from now can be deleted from S3?

Also, I have set --delete-delay=0, which I believe should delete the objects from S3 immediately, but I can still find older objects in S3. Could you please clarify?

@Danipiario

Hi all,
I have the same situation as @apr1809.
I have set all the retention parameters to 7d, the compactor works fine, and the blocks in S3 are marked for deletion. I have a delete delay of 48h, but the marked blocks are still present in S3 and consume space. Can I safely delete the marked blocks?

@Danipiario

Any news on this? I have to delete objects in S3 manually in order to save space.

@kakkoyun kakkoyun changed the title Delete metrics in Object Storage compact: Delete metrics in Object Storage Feb 10, 2021
@tohjustin

Same here, I'm trying to figure out the recommended approach for deleting blocks on S3 that have exceeded the configured retention duration.

Here are some findings from my research at the moment:

  1. Deleting blocks on S3 Bucket via Thanos Compactor's --retention.resolution-* flag

    • Retention duration configured on Kubernetes side
    • If you are already deploying Thanos Compactor instances to downsample your data, it would be simpler to also use it for deleting blocks
    • Blocks on S3 will only get deleted if the Thanos Compactor pods are running
  2. Deleting blocks on S3 Bucket via S3 Object Lifecycle Policy

    • Retention duration configured on IaC side (CloudFormation/Terraform etc.) or manually via AWS Console/CLI
    • If you don't need to downsample your data, this approach lets you avoid deploying a Thanos Compactor instance for each S3 bucket, which also means:
      • Cheaper: no need to pay the CPU, RAM, network, and disk costs of running Thanos Compactor instances
      • Fewer Kubernetes deployments to worry about
      • Fewer IAM roles to manage, if you're creating a dedicated role for each Thanos Compactor instance that grants permission to write and delete objects in an S3 bucket
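For reference, option 2 is just a standard S3 lifecycle configuration; a minimal sketch (the rule ID and the 14-day value are placeholders, and the empty prefix applies the rule to the whole bucket):

```json
{
  "Rules": [
    {
      "ID": "expire-thanos-blocks",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```

This can be attached with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://policy.json`, or via the equivalent CloudFormation/Terraform resource.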

Some stuff that needs further investigation:

  1. Whether there is a difference in deleting TSDB blocks in S3 via a lifecycle policy (i.e. whether the end state of the S3 bucket is exactly the same as if the blocks were deleted via the Thanos Compactor)
  2. When AWS deletes S3 objects for us via a lifecycle policy, I would assume it's based on the object's creation timestamp, which is slightly different from the time interval of the metric data contained in the underlying TSDB block?
  3. When the Thanos Compactor downsamples a TSDB block, does it edit the corresponding S3 objects in place or recreate them?
    1. If it's the latter, then this approach (deletion by lifecycle policies, based on the object's creation time) wouldn't be correct?

@yeya24
Contributor

yeya24 commented Mar 4, 2021

@tohjustin I can try to answer some questions you mentioned.

Whether there is a difference in deleting TSDB blocks in S3 via a lifecycle policy (i.e. whether the end state of the S3 bucket is exactly the same as if the blocks were deleted via the Thanos Compactor)

As for deletion, I think the end state is the same. Once the configured retention is reached, the objects are deleted.

When AWS deletes S3 objects for us via a lifecycle policy, I would assume that it's based on the object's creation timestamp which is slightly different from the time interval of the metric data contained by the underlying TSDB block?

Yes, it is different. Thanos block retention is calculated based on the max sample timestamp in the block.
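A minimal sketch of that difference, assuming blocks are laid out as `<ulid>/meta.json` with Prometheus-style millisecond `minTime`/`maxTime` fields: Thanos retention looks inside the block, while a lifecycle policy never does.

```python
import json
import pathlib
import time

def blocks_past_retention(blocks_dir: str, retention_days: int):
    """Yield block ULIDs whose maxTime (newest sample) is past retention."""
    cutoff_ms = (time.time() - retention_days * 86_400) * 1000
    for meta_path in pathlib.Path(blocks_dir).glob("*/meta.json"):
        meta = json.loads(meta_path.read_text())
        # meta.json records the block's time range as Unix milliseconds;
        # this is what Thanos retention is anchored to, not the object's
        # upload time that an S3 lifecycle rule would use.
        if meta["maxTime"] < cutoff_ms:
            yield meta_path.parent.name
```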

When the Thanos Compactor downsamples a TSDB block, does it edit the corresponding S3 objects in place or recreate them?
If it's the latter, then this approach (deletion by lifecycle policies, based on the object's creation time) wouldn't be correct?

It recreates a new downsampled block. As for correctness, it depends on how you define it, but yes, the time at which the objects are deleted will change because we recreate the block.
Btw, the Thanos compactor supports configurable retention policies per downsampling level, which is more flexible, since we usually want to keep more heavily downsampled blocks for a longer time.
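The effect on a lifecycle policy can be illustrated with hypothetical timestamps: AWS counts expiry from each object's upload time, so a block that the compactor re-uploads (e.g. after downsampling) gets a fresh expiration clock, while Thanos keeps counting from the data's own maxTime:

```python
from datetime import datetime, timedelta

retention = timedelta(days=14)
block_max_time = datetime(2020, 12, 1)  # newest sample in the block
reupload = datetime(2020, 12, 8)        # compactor recreates the block a week later

# Thanos anchors retention to the data, so recreating the block changes nothing.
thanos_expiry = block_max_time + retention

# An S3 lifecycle rule anchors to the object's creation time, so the
# recreated block outlives the data's age by a week.
lifecycle_expiry = reupload + retention

assert lifecycle_expiry - thanos_expiry == timedelta(days=7)
```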

Another thing I want to add is that the Thanos compactor doesn't only do downsampling and data retention. It also performs compaction, which merges several small blocks into a larger one and makes queries over long time ranges more efficient. So I recommend using the compactor if you are using S3.

@tohjustin

@yeya24 Really appreciate the quick response! I totally forgot all about the compaction feature itself 🤦

I was leaning towards the S3 lifecycle policy approach (less overhead) until reading your answers; I guess using the Thanos compactor is the way to go 👍

@stale

stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Jun 2, 2021
@stale

stale bot commented Jun 16, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 16, 2021