
Allow setting retention per metric (e.g rule aggregation) #903

Open

bwplotka opened this issue Mar 11, 2019 · 59 comments

Comments

@bwplotka
Member

In an ideal world, retention would not be set at the downsampling/raw level, but at the aggregation level.

We need a way to support this in an LTS system like Thanos.

AC:

  • User can specify certain aggregation to be retained longer than others
  • Design doc (proposal is in place)

It comes down to the fact that you ideally want per-metric retention on the compactor. This is somewhat related to delete_series, as it might involve a block rewrite in edge cases... We need to design this.

Thoughts @improbable-ludwik @brancz @devnev @domgreen

@bwplotka changed the title from "Per rule aggregation retention" to "Allow setting retention per rule aggregation" on Mar 11, 2019
@brancz
Member

brancz commented Mar 11, 2019

There have been discussions about per time-series retention in upstream Prometheus before. I think at least having a discussion with the team is worth it, just to see whether there are any insights from back then.

proposal is in place

What do you mean by this? As far as I can tell, there is no design written up anywhere, but I may very well have missed it.

Off the top of my head, this could be a configuration that combines a Prometheus-style label selector with a corresponding rule for which resolution to keep for how long.

As a whole this is definitely not trivial, but I agree it is much needed.
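A minimal sketch of what such a selector-plus-retention configuration could look like (a hypothetical format for discussion only, not an existing Thanos option; the selectors and durations are just examples):

retention_policies:
  - selector: '{__name__=~"team:.*:rate5m"}'   # recording-rule aggregations, kept long
    raw: 30d
    5m: 1y
    1h: 5y
  - selector: '{job="node-exporter"}'           # raw machine metrics, kept short
    raw: 14d
    5m: 90d
    1h: 1y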

@bwplotka
Member Author

bwplotka commented Mar 14, 2019

Sorry - no, a proposal has to be put in place; that's what I meant.

(: A relabel-like config makes sense, but essentially we are talking about a block rewrite in the compactor for this, right?

@brancz
Member

brancz commented Mar 15, 2019

Configuration is a technicality. I'm not entirely sure relabelling would work exactly, but something close to it probably would. I agree the compactor is the component that needs to take care of this by re-writing blocks.

@ipstatic
Contributor

With the federation system we have in place now, we have trained users that metrics they want preserved (federated) must use a specific recording-rule-style name so that they get federated. We would love to continue this practice with Thanos and retain only metrics that match a specific format for an extended period of time.

@bwplotka
Member Author

bwplotka commented Apr 1, 2019

Some offline discussions revealed that users still do this with Thanos, but using federation + Thanos on top.

The point was to shard ingestion and execute recording rules against each sharded ingestion gateway (which runs with minimal retention). We have some very high-cardinality metrics, and centralized ingestion alone was prohibitively expensive. That Prometheus federation layer allows us to compute/ingest aggregates (without retaining the raw metrics).

I think we should aim to let users avoid this setup. I cannot see an immediate blocker for that, other than a more complex system and queries fetching data with some lag (rule evaluation lag + federated scrape interval).
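For reference, the federation-based workaround described above typically looks something like the following on the central Prometheus (a sketch only; the job name, targets, and the job:.* recording-rule naming convention are just examples):

scrape_configs:
  - job_name: federate-aggregates
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # pull only aggregated recording-rule results
    static_configs:
      - targets:
          - shard-1-prometheus:9090
          - shard-2-prometheus:9090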

@smalldirector

I just saw these discussion threads on the potential requirement for per-metric retention in the compactor. Are we still looking for this feature on the compactor side?

We have the same per-metric retention requirement in our business scenario. We are trying to implement a policy-based retention function to replace the compactor's current retention function. The idea is to provide a policy config file in which users specify a PromQL-style selector to define the retention time for certain metrics. Below is a sample policy config file:

policies:
  - expr: "{}"
    retentions:
      res-raw: 180d
      res-5m: 240d
      res-1h: 400d
  - expr: "{__name__=\"^go_memstats_.*\"}"
    retentions:
      res-raw: 90d
      res-5m: 180d
      res-1h: 360d
  - expr: "{__name__=\"go_memstats_gc_cpu_fraction\"}"
    retentions:
      res-raw: 200d
      res-5m: 300d
      res-1h: 400d

I would like to know your thoughts on this idea.
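A rough sketch of how such a policy file could be evaluated when deciding a series' retention at a given resolution. This is illustrative only and not an existing Thanos API; it uses the Prometheus label and PromQL-parser packages (import paths as in recent Prometheus releases), and it assumes a simple "last matching policy wins" rule, which the proposal above does not define:

package main

import (
	"fmt"
	"time"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/promql/parser"
)

// RetentionPolicy mirrors one entry of the proposed policy file.
type RetentionPolicy struct {
	Expr       string                   // PromQL-style series selector.
	Retentions map[string]time.Duration // Per-resolution retention, keyed by "res-raw", "res-5m", "res-1h".
}

// retentionFor returns the retention for a series at the given resolution.
// The last matching policy wins (illustrative precedence rule).
func retentionFor(lset labels.Labels, res string, policies []RetentionPolicy) (time.Duration, error) {
	var out time.Duration
	for _, p := range policies {
		matchers, err := parser.ParseMetricSelector(p.Expr)
		if err != nil {
			return 0, fmt.Errorf("parse selector %q: %w", p.Expr, err)
		}
		matches := true
		for _, m := range matchers {
			if !m.Matches(lset.Get(m.Name)) {
				matches = false
				break
			}
		}
		if matches {
			if d, ok := p.Retentions[res]; ok {
				out = d
			}
		}
	}
	return out, nil
}

func main() {
	day := 24 * time.Hour
	policies := []RetentionPolicy{
		// Catch-all default ("{}" is not a valid PromQL selector, so a match-anything regex is used here).
		{Expr: `{__name__=~".+"}`, Retentions: map[string]time.Duration{"res-raw": 180 * day}},
		{Expr: `{__name__=~"go_memstats_.*"}`, Retentions: map[string]time.Duration{"res-raw": 90 * day}},
	}
	lset := labels.FromStrings("__name__", "go_memstats_alloc_bytes", "job", "app")
	d, err := retentionFor(lset, "res-raw", policies)
	fmt.Println(d, err) // 2160h0m0s <nil> -> 90 days under these example policies
}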

@wogri

wogri commented Nov 6, 2019

Just an FYI: not retaining raw data will lead to problems (you won't be able to zoom into your metrics anymore): https://thanos.io/components/compact.md/#downsampling-resolution-and-retention:

In other words, if you set --retention.resolution-raw less than --retention.resolution-5m and --retention.resolution-1h - you might run into a problem of not being able to “zoom in” to your historical data.
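For example, keeping raw data at least as long as the downsampled resolutions avoids that zoom-in problem (the retention values and bucket config filename here are only illustrative):

thanos compact \
  --objstore.config-file=bucket.yaml \
  --retention.resolution-raw=365d \
  --retention.resolution-5m=365d \
  --retention.resolution-1h=365d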

@Reamer

Reamer commented Nov 6, 2019

@wogri I think you are using Grafana for visualization. Check out PR grafana/grafana#19121. At the moment this PR is not in a Grafana release, so I am using master.
I added three Prometheus datasources, each with a different max_source_resolution parameter. Now I can switch to the best resolution and zoom into my metrics. I also disabled auto-downsampling in the Thanos Query component, because it doesn't work well.
If you are using rates, you should make your interval flexible, because at a one-hour resolution you should set the time range to at least two hours.
I think I will write a Thanos PR with some more explanation once Grafana is released with the above PR.
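To illustrate the interval point (the metric name is only an example): against 1h-resolution downsampled data, a rate needs a range wide enough to cover at least two samples, e.g.

rate(http_requests_total[2h])

whereas rate(http_requests_total[5m]) would return no data at that resolution.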

@wogri

wogri commented Nov 6, 2019

Thanks @Reamer!

@smalldirector

(Quotes the policy-based retention proposal from their earlier comment above.)

@bwplotka any thoughts on this idea of supporting per-metric retention in the compactor?

@bwplotka changed the title from "Allow setting retention per rule aggregation" to "Allow setting retention per metric (e.g rule aggregation)" on Dec 10, 2019
@bwplotka
Member Author

Extra context can be found here: prometheus/prometheus#1381

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@brancz
Member

brancz commented Jan 13, 2020

This is being worked on in Prometheus; once done there, I would say we can implement it here with the same semantics and configuration.

@stale stale bot removed the stale label Jan 13, 2020
@bwplotka bwplotka removed the pinned label Jan 28, 2020
@bwplotka
Member Author

The current plan is to first tackle #1598 and then try to implement this. We are also putting this issue up as a GSoC project.

> This is being worked on in Prometheus; once done there, I would say we can implement it here with the same semantics and configuration.

The current decision is that Prometheus will not implement this, and the work has to be done externally first. It would be nice, though, if our work could be reused for vanilla Prometheus as well (as usual).

@Jigar3

Jigar3 commented Jan 29, 2020

Hey @bwplotka, I would like to work on this, and it would be very helpful if you could suggest some resources to get started.

@stale

stale bot commented Feb 28, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Feb 28, 2020
@bwplotka
Member Author

bwplotka commented Feb 28, 2020 via email

@stale stale bot removed the stale label Feb 28, 2020
@stale

stale bot commented Jul 10, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 10, 2021
@doug-ba

doug-ba commented Jul 10, 2021

I’m still interested in this feature.

@stale stale bot removed the stale label Jul 10, 2021
@yeya24
Contributor

yeya24 commented Jul 13, 2021

I am still interested in this as well. With the new bucket rewrite tool and the compactv2 library, I think this is already doable.
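Roughly, deleting specific series from a block with that tooling looks like this today (the block ID, filenames, and matcher are only examples; check thanos tools bucket rewrite --help for the exact flags and the deletion-config schema):

thanos tools bucket rewrite \
  --objstore.config-file=bucket.yaml \
  --id=<block-ulid> \
  --rewrite.to-delete-config-file=delete.yaml

where delete.yaml lists matchers for the series to drop, e.g.:

- matchers: '{__name__=~"go_memstats_.*"}'

Per-metric retention would then amount to running such rewrites automatically for blocks older than each policy's threshold.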

@stale

stale bot commented Sep 19, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Sep 19, 2021
@markmsmith

Still needed.

@stale stale bot removed the stale label Sep 19, 2021
@stale

stale bot commented Jan 9, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 9, 2022
@doug-ba

doug-ba commented Jan 9, 2022

Still needed

@stale stale bot removed the stale label Jan 9, 2022
@stale

stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
@markmsmith

Still needed.

@stale

stale bot commented Sep 21, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Sep 21, 2022
@markmsmith

Still needed :(

@DanielCastronovo

Still needed

@aamargant

Any news about this feature? @bwplotka @csmarchbanks

@kamelohr

kamelohr commented Nov 9, 2023

Chiming in here, as I also have a strong need for this feature for a project.

@NotAFile
Contributor

NotAFile commented Apr 8, 2024

Does the narrower "multi-tenant compactor" case, or more generally matching only on external labels, fall under this too?

As I understand it, while similar, it would require less invasive changes, as each tenant has its own separate TSDB and blocks. With that, it would be possible to take only the external labels in ThanosMeta and look up a deletion policy based on them, right?

I was wondering whether this was actually already possible; at first glance the docs suggested there was no way to compact only specific external labels. There is, the documentation is just a bit confusing. However, you still need one compact instance per retention policy.
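A sketch of what that looks like in practice, with one compactor per retention policy selected via external labels (the tenant label name and retention values are only examples; see thanos compact --help for the exact selector flags):

# Compactor handling blocks with external label tenant="team-a" (longer retention).
thanos compact \
  --objstore.config-file=bucket.yaml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=180d \
  --retention.resolution-1h=2y \
  --selector.relabel-config='
    - action: keep
      source_labels: [tenant]
      regex: team-a'

# A second compactor with shorter retention for everything else.
thanos compact \
  --objstore.config-file=bucket.yaml \
  --retention.resolution-raw=14d \
  --retention.resolution-5m=30d \
  --retention.resolution-1h=90d \
  --selector.relabel-config='
    - action: drop
      source_labels: [tenant]
      regex: team-a'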

@jahknem

jahknem commented May 13, 2024

Still needed. I guess the work that was to be started as part of GSoC has not been finished? Is there maybe some branch or fork in which work on this has begun?
