Implement multi-tenant Ruler: multitsdb and multiagent #5133

Open
saswatamcode opened this issue Feb 7, 2022 · 9 comments
Labels
component: rule dont-go-stale Label for important issues which tells the stalebot not to close them feature request/improvement proposal

Comments

@saswatamcode
Member

saswatamcode commented Feb 7, 2022

Is your proposal related to a problem?

Currently, the Thanos Ruler has no built-in support for multi-tenancy, unlike Receive. This creates issues in setups where we want to isolate tenants and store each tenant's rule-evaluated metrics in its own tsdb instance. The only workaround is running a Ruler per tenant, which is simpler but wasteful of resources.

Also, in the case of Stateless Rulers, multi-tenancy is harder to achieve, as different tenants might need different remote-write configurations (writing to separate locations with separate HTTP headers like THANOS-TENANT).

For example, consider a Receive with multiple tenants, to which a single Ruler needs to remote-write multi-tenant rule-based metrics so that they land in each tenant's Receive tsdb. The Ruler cannot set a different HTTP header per tenant, so all of its metrics are treated by Receive as a single new default tenant and a new tsdb gets created.
[Image: Ruler_Multi_Tenant_Problem diagram]

(Note: This is a separate problem from ensuring that Ruler only selects data from one tenant while evaluating rules.)

Describe the solution you'd like

A potential solution would be using the Receive multitsdb approach in Ruler, with the same tenancy flags as Receive (--receive.default-tenant-id, --receive.tenant-label-name). The Ruler would then be tenant-aware and store evaluated metrics in a separate tsdb instance per tenant, using the tenant_id label to identify which rule-based series belongs to which tenant (assuming the rule file configuration specifies the tenant label for each rule).
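To make the idea concrete, here is a hypothetical sketch (not actual Thanos code) of routing rule-evaluated series into a per-tenant storage keyed by a tenant label, the way Receive's multitsdb does. All names here are illustrative:

```python
# Hypothetical sketch: route rule-evaluated series into per-tenant
# storage by a tenant label. Names are illustrative, not Thanos APIs.

DEFAULT_TENANT = "default-tenant"   # cf. --receive.default-tenant-id
TENANT_LABEL = "tenant_id"          # cf. --receive.tenant-label-name

class MultiTSDB:
    """One (toy) storage list per tenant, created lazily on first write."""

    def __init__(self):
        self.tenants = {}

    def append(self, series_labels, value):
        tenant = series_labels.get(TENANT_LABEL, DEFAULT_TENANT)
        # Strip the tenant label before storing, since tenancy is now
        # expressed by which tsdb instance the series lives in.
        stored = {k: v for k, v in series_labels.items() if k != TENANT_LABEL}
        self.tenants.setdefault(tenant, []).append((stored, value))
        return tenant
```

A series without the tenant label would fall back to the default tenant, mirroring Receive's behavior for unlabeled requests.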

This can be extended to the Stateless Ruler to allow separate remote write configs for each tenant. This would start an agent, i.e., a WAL-only storage, for each tenant, which remote-writes only to the locations configured for that tenant. In essence, a multiagent package would be needed to handle this.
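The multiagent shape could look something like the following illustrative-only sketch: one WAL-only agent per tenant, each with its own remote-write endpoints and HTTP headers (e.g. THANOS-TENANT). Class and field names are hypothetical:

```python
# Illustrative-only sketch of a "multiagent": one WAL-only agent per
# tenant, each with its own remote-write endpoints and headers.

class TenantAgent:
    def __init__(self, tenant, endpoints, headers):
        self.tenant = tenant
        self.endpoints = endpoints  # only this tenant's remote-write URLs
        self.headers = headers      # e.g. {"THANOS-TENANT": tenant}
        self.wal = []               # stand-in for a WAL-only storage

class MultiAgent:
    def __init__(self, configs):
        # configs: {tenant: {"endpoints": [...], "headers": {...}}}
        self.agents = {
            tenant: TenantAgent(tenant, cfg["endpoints"], cfg["headers"])
            for tenant, cfg in configs.items()
        }

    def write(self, tenant, sample):
        # A sample lands only in its tenant's WAL, so it can only ever
        # be shipped to that tenant's configured endpoints.
        self.agents[tenant].wal.append(sample)
```

The point of the structure is isolation: a sample written for one tenant never reaches another tenant's WAL or endpoints.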

The addition of multitsdb to Ruler could also be skipped, as the Scalable Rule proposal already lists the removal of the embedded tsdb in its work plan! :)
[Image: Rules_multiagent diagram]

Describe alternatives you've considered

Running a Ruler for each tenant.

Open to feedback and suggestions! If there are existing solutions/configuration options for achieving the same result which will be easier to implement than the above idea, that would be great too! 🙂

@matej-g
Collaborator

matej-g commented Feb 8, 2022

We discussed this briefly with @saswatamcode, with one more alternative suggested by me: have a separate remote write config for each tenant, set the tenant header, and use relabeling to forward only the metrics applicable to that tenant. However, this is not really a systematic solution and requires always setting up the remote write config for each tenant manually. The proposed solution seems reasonable to me 👍.
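For illustration, the relabeling alternative amounts to a per-tenant filter in the spirit of Prometheus's write_relabel_configs "keep" action: each tenant's remote-write config keeps only series whose tenant label matches. This is a toy model, not real relabeling code:

```python
import re

# Toy model of the relabeling alternative: each tenant's remote-write
# pipeline "keep"s only series whose tenant label matches a regex, in
# the spirit of Prometheus write_relabel_configs. Illustrative only.

def keep_for_tenant(series, tenant_regex, tenant_label="tenant_id"):
    """Return only the series this tenant's remote write should forward."""
    pattern = re.compile(tenant_regex)
    return [s for s in series if pattern.fullmatch(s.get(tenant_label, ""))]
```

The drawback noted above is visible here: every new tenant needs its own config with its own regex, maintained by hand.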

@bwplotka
Member

bwplotka commented Mar 9, 2022

Hey, just trying to understand the main problem we are discussing here.

The only possible way might be using a Ruler for each tenant which is simpler but wasteful of resources.

Do we have any data on this? For stateless rulers there is not much baseline overhead in this situation. I would even say the more problematic case is the opposite extreme, where one tenant has too many rules and alerts for one ruler.

A potential solution would be using the Receive multitsdb in Ruler and having the same flags for tenancy as Receive

Do you mean sending things to Receive that uses multitsdb or literally using multitsdb code?

This would start an agent, i.e, a WAL-only storage for each tenant which remote-writes to only locations that were configured for that tenant. In essence, a multiagent package, would be needed to be able to handle this.

I would really avoid doing that - multi-tsdb is already a tough idea: every new TSDB has significant startup and reload costs. I'm not sure we want to replicate that idea in the agent code.

Also, in the case of using Stateless Rulers, it's harder to achieve multi-tenancy, as different tenants might need different configurations while remote writing (write to separate locations with separate HTTP headers like THANOS-TENANT).

Right. We need essentially something like this:

[image: diagram omitted]

I feel we should have multi-tenant rulers that can handle rules for any number of tenants (tenant agnostic), and build tenancy with label-aware sharding on the receiver. The Receive router already checks EACH series in a write request and distributes it with the hashring - so why not check the tenant label there?
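The suggestion above can be sketched as follows: the router, which already hashes each series, additionally reads the tenant from a per-series label before placing it on the hashring. The hashing scheme here is illustrative, not Thanos's actual hashring implementation:

```python
import hashlib

# Sketch of tenant-aware sharding on the Receive router: pick the
# tenant from a per-series label, then hash the series to a node.
# Hashing scheme is illustrative, not Thanos's actual hashring.

def route(series_labels, nodes, tenant_label="tenant_id",
          default_tenant="default-tenant"):
    tenant = series_labels.get(tenant_label, default_tenant)
    key = tenant + "/" + ",".join(
        f"{k}={v}" for k, v in sorted(series_labels.items())
        if k != tenant_label
    )
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return tenant, nodes[digest % len(nodes)]
```

With this shape, the ruler itself stays tenant-agnostic and tenancy lives entirely in the routing layer.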

@stale

stale bot commented Jun 12, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 12, 2022
@matej-g matej-g removed the stale label Jun 13, 2022
@stale

stale bot commented Aug 13, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Aug 13, 2022
@matej-g matej-g added dont-go-stale Label for important issues which tells the stalebot not to close them and removed stale labels Aug 15, 2022
@yeya24
Contributor

yeya24 commented Oct 24, 2022

Would love to see this moving forward.
A general sharder is really something we need in Thanos. Cortex has something similar using the Ring. In Thanos, we have the hashring only on the receiver side, so if we want to distribute work like rule evaluation or compaction jobs, we don't have a good way to do it now.

@saswatamcode
Member Author

Yup! I'm writing a proposal + poc for this currently. Will land soon! 🙂

@benjaminhuo
Contributor

benjaminhuo commented Oct 24, 2022

Yup! I'm writing a proposal + poc for this currently. Will land soon! 🙂

Looking forward to this feature!

@anarcher

How is ruler sharding going? :-) As a Cortex user, I found this feature useful.

@benjaminhuo
Contributor

I feel we should have multi-tenant rulers that can do any number of tenants rules (tenant agnostic) and we build tenancy with label aware sharding on receiver. Receive router already checks EACH series in write request and distribute with hashring - so why not checking tenant label there?

Does #7256 already implement this feature? @bwplotka @GiedriusS
