
compact: Delete metrics in Object Storage #3529

Closed
apr1809 opened this issue Dec 2, 2020 · 12 comments
@apr1809

apr1809 commented Dec 2, 2020

I have deployed Thanos App Version: 0.16.0 with Prometheus 2.20.1 in EKS. I am using S3 for object storage.
I would like to know whether any Thanos component can delete objects older than a particular date from S3, or whether it is fine to delete objects by setting a lifecycle rule on the S3 bucket. What is the recommended approach to deleting Thanos data?

@GiedriusS
Member

GiedriusS commented Dec 2, 2020

Thanos Compact can apply retention rules based on the resolution; please take a look at https://thanos.io/tip/components/compact.md/#enforcing-retention-of-data. Or you can simply delete blocks whose max time is older than X. A third option is #3421, which lets you delete only certain metrics from a block. I hope that helps. Please close this if you have no more problems, or comment if you have a more sophisticated use case.

@apr1809
Author

apr1809 commented Dec 3, 2020

@GiedriusS by "delete blocks older than X", do you mean storage.tsdb.max-block-duration?
I have not deployed the Compact component; I am trying to implement Thanos mainly for HA of the Prometheus instances.

@apr1809
Author

apr1809 commented Dec 9, 2020

@GiedriusS I have set the following arguments on the compactor, but I can still see data as old as 20 days in S3. Can you let me know if my configuration is wrong?

```yaml
args:
  - compact
  - --log.level=info
  - --http-address=0.0.0.0:10902
  - --data-dir=/data
  - --retention.resolution-raw=14d
  - --retention.resolution-5m=14d
  - --retention.resolution-1h=14d
  - --consistency-delay=30m
  - --objstore.config-file=/conf/objstore.yml
  - --log.format=json
  - --delete-delay=0
  - --wait
```

Also, could you please clarify what "blocks where the max time is older than X" means? Is it safe to delete blocks older than the max time set on the store component?

@GiedriusS
Member

Yes, that option controls the retention period on Prometheus. On Thanos Compact you have the --retention.resolution-* parameters, as in your example. Has it finished at least one iteration? Retention is only applied at the end of an iteration at the moment; you can check this in the metrics.

Let me clarify. The blocks encompass a time period - the period has a start and an end or, in other words, min/max times. If the max time of a block is older than $now-$retention_time then we are free to delete that block.
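The condition above can be written out as a one-line check (a minimal sketch; Thanos records each block's min/max times as Unix milliseconds, and all values below are hypothetical):

```python
def block_is_deletable(max_time_ms: int, now_ms: int, retention_ms: int) -> bool:
    """True once the newest sample in the block is older than now - retention."""
    return max_time_ms < now_ms - retention_ms

DAY_MS = 24 * 60 * 60 * 1000
now = 1_700_000_000_000  # an arbitrary "now", in Unix milliseconds

# A block that ended 20 days ago, under a 14-day retention: deletable.
print(block_is_deletable(now - 20 * DAY_MS, now, 14 * DAY_MS))  # → True
# A block that ended 10 days ago: kept.
print(block_is_deletable(now - 10 * DAY_MS, now, 14 * DAY_MS))  # → False
```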

What do you mean by "max time set on the store component"? The `--max-time` parameter? Either way, I hope this helps.

@apr1809
Author

apr1809 commented Dec 10, 2020

Thanks for the details, @GiedriusS.
So, to confirm: as per the example above, objects older than 14 days from now can be deleted from S3?

Also, I have set --delete-delay=0, which I believe should delete the objects from S3 immediately, but I can still find older objects in S3. Could you please clarify?

@Danipiario

Hi all,
I have the same situation as @apr1809.
I have set all the retention parameters to 7d, the compactor works fine, and the blocks in S3 are marked for deletion. I have a delete delay of 48h, but the marked blocks are still present in S3 and consume space. Can I safely delete the marked blocks?

@Danipiario

Any news on this? I have to delete objects in S3 manually in order to save space.

@kakkoyun kakkoyun changed the title Delete metrics in Object Storage compact: Delete metrics in Object Storage Feb 10, 2021
@tohjustin

Same here, I'm trying to figure out the recommended approach for deleting blocks on S3 that have exceeded the configured retention duration.

Here are some findings from my research at the moment:

  1. Deleting blocks on S3 Bucket via Thanos Compactor's --retention.resolution-* flag

    • Retention duration configured on Kubernetes side
    • If you are already deploying Thanos Compactor instances to downsample your data, it would be simpler to also use it for deleting blocks
    • Blocks on S3 will only get deleted if the Thanos Compactor pods are running
  2. Deleting blocks on S3 Bucket via S3 Object Lifecycle Policy

    • Retention duration configured on IaC side (CloudFormation/Terraform etc.) or manually via AWS Console/CLI
    • If you don't need to downsample your data, this approach lets you avoid deploying a Thanos Compactor instance for each S3 bucket, which also means:
      • Cheaper: no need to pay the CPU, RAM, network, and disk costs of running Thanos Compactor instances
      • Fewer Kubernetes deployments to worry about
      • Fewer IAM roles to manage, if you're creating a dedicated role for each Thanos Compactor instance that grants permission to write and delete objects in an S3 bucket
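For reference, option 2 is just a standard S3 lifecycle configuration; a minimal sketch (the rule ID and the 14-day value are placeholders, and the empty prefix applies the rule to the whole bucket):

```json
{
  "Rules": [
    {
      "ID": "expire-thanos-blocks",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```

This can be attached with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://policy.json`, or via the equivalent CloudFormation/Terraform resource.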

Some stuff that needs further investigation:

  1. Whether there is a difference in deleting TSDB blocks in S3 via a lifecycle policy (i.e. whether the end state of the S3 bucket is exactly the same as if the blocks were deleted via the Thanos Compactor)
  2. When AWS deletes S3 objects for us via a lifecycle policy, I would assume it's based on the object's creation timestamp, which is slightly different from the time interval of the metric data contained in the underlying TSDB block?
  3. When the Thanos Compactor downsamples a TSDB block, does it edit the corresponding S3 objects in place or recreate them?
    1. If it's the latter, then this approach (deletion by lifecycle policies, based on the object's creation time) wouldn't be correct?

@yeya24
Contributor

yeya24 commented Mar 4, 2021

@tohjustin I can try to answer some questions you mentioned.

Whether there is a difference in deleting TSDB blocks in S3 via a lifecycle policy (i.e. whether the end state of the S3 bucket is exactly the same as if the blocks were deleted via the Thanos Compactor)

As for deletion, I think the end state is the same. Once the configured retention is reached, the objects are deleted.

When AWS deletes S3 objects for us via a lifecycle policy, I would assume that it's based on the object's creation timestamp which is slightly different from the time interval of the metric data contained by the underlying TSDB block?

Yes, it is different. Thanos block retention is calculated based on the max sample timestamp in the block.
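A minimal sketch of that difference, assuming blocks are laid out as `<ulid>/meta.json` with Prometheus-style millisecond `minTime`/`maxTime` fields: Thanos retention looks inside the block, while a lifecycle policy never does.

```python
import json
import pathlib
import time

def blocks_past_retention(blocks_dir: str, retention_days: int):
    """Yield block ULIDs whose maxTime (newest sample) is past retention."""
    cutoff_ms = (time.time() - retention_days * 86_400) * 1000
    for meta_path in pathlib.Path(blocks_dir).glob("*/meta.json"):
        meta = json.loads(meta_path.read_text())
        # meta.json records the block's time range as Unix milliseconds;
        # this is what Thanos retention is anchored to, not the object's
        # upload time that an S3 lifecycle rule would use.
        if meta["maxTime"] < cutoff_ms:
            yield meta_path.parent.name
```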

When the Thanos Compactor downsamples a TSDB block, does it edit the corresponding S3 objects in place or recreate them?
If it's the latter, then this approach (deletion by lifecycle policies, based on the object's creation time) wouldn't be correct?

It recreates a new downsampled block. As for correctness, it depends on how you define it, but yes, the time at which the objects are deleted will change because we recreate the block.
Btw, the Thanos compactor supports configurable retention policies per downsampling level, which is more flexible, since we usually want to keep more heavily downsampled blocks for a longer time.
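The effect on a lifecycle policy can be illustrated with hypothetical timestamps: AWS counts expiry from each object's upload time, so a block that the compactor re-uploads (e.g. after downsampling) gets a fresh expiration clock, while Thanos keeps counting from the data's own maxTime:

```python
from datetime import datetime, timedelta

retention = timedelta(days=14)
block_max_time = datetime(2020, 12, 1)  # newest sample in the block
reupload = datetime(2020, 12, 8)        # compactor recreates the block a week later

# Thanos anchors retention to the data, so recreating the block changes nothing.
thanos_expiry = block_max_time + retention

# An S3 lifecycle rule anchors to the object's creation time, so the
# recreated block outlives the data's age by a week.
lifecycle_expiry = reupload + retention

assert lifecycle_expiry - thanos_expiry == timedelta(days=7)
```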

Another thing I want to add is that the Thanos compactor doesn't only do downsampling and data retention. It also performs compaction, which merges several small blocks into a larger one and makes queries over long time ranges more efficient. So I recommend using the compactor if you are using S3.

@tohjustin

@yeya24 Really appreciate the quick response! I totally forgot all about the compaction feature itself 🤦

I was leaning towards the S3 lifecycle policy approach (less overhead) until reading your answers; I guess using the Thanos compactor is the way to go 👍

@stale

stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Jun 2, 2021
@stale

stale bot commented Jun 16, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 16, 2021