
Run compactor in single binary (-target=all) and change compactor default config to split-and-merge compaction strategy #814

Closed
Tracked by #827
pracucci opened this issue Jan 19, 2022 · 16 comments

@pracucci (Collaborator) commented Jan 19, 2022

The compactor currently doesn't run in single binary mode (-target=all). We should add it, so that the single binary provides a fully working cluster.

This will also unblock #94.

Switch to split-and-merge compaction strategy by default

The split-and-merge compaction strategy does NOT require sharding/ring to be enabled, but sharding is required if you run more than 1 single-binary replica (likewise, the compactor sharding/ring must be enabled when you run in microservices mode with multiple compactor replicas).
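To make the requirement concrete, a minimal single-binary config could look roughly like the sketch below. This assumes the Cortex-style compactor block (sharding_enabled, sharding_ring.kvstore.store); the exact YAML field names in Mimir may differ.

```yaml
# Rough sketch only; field names assumed from the Cortex-style config.
target: all                 # -target=all: run every component, including the compactor

compactor:
  sharding_enabled: true    # -compactor.sharding-enabled: required with more than 1 replica
  sharding_ring:
    kvstore:
      store: consul         # any supported ring KV store (consul, etcd, memberlist, ...)
```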

I propose the following changes:

  1. Change -compactor.compaction-strategy default config from default to split-and-merge
  2. Add a warning log at compactor startup displayed if sharding is disabled (the log should warn about not running more than 1 compactor replica or single binary with sharding disabled)
  3. Add compactor to -target=all (used by single binary mode) (issue)
  4. Update single binary doc and examples to include compactor ring setup and sharding

I also propose the following config changes:

  • [no strong opinion] -compactor.split-groups default value from 4 to 1 (a value of 1 is fine for up to 10M active series, which is way beyond what we target with the default config); see the config sketch after this list
  • -compactor.sharding-enabled: fix the CLI flag description (it currently says "Shard tenants across multiple compactor instances", but we now shard the workload, not just tenants)
  • Backport our compactor config to jsonnet
  • Ensure integration tests run with split-and-merge compactor
  • Ensure compactor doc is updated
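
For illustration, the proposed defaults from the two lists above could be expressed as config like this (the YAML names are my guess at how the quoted CLI flags map, and may not be exact):

```yaml
# Sketch only; YAML names assumed from the CLI flags quoted above.
compactor:
  compaction_strategy: split-and-merge  # -compactor.compaction-strategy: proposed new default
limits:
  compactor_split_groups: 1             # -compactor.split-groups: proposed default, lowered from 4
```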
@replay (Contributor) commented Jan 19, 2022

Afaik this is now possible because we have the horizontally scalable compactor. Previously it would have been problematic if the all target loaded the compactor and a user then built a cluster of multiple single binaries, correct?

@pracucci (Collaborator, Author):

Afaik this is now possible because we have the horizontally scalable compactor. Previously it would have been problematic if the all target loaded the compactor and a user then built a cluster of multiple single binaries, correct?

Yes, that should be correct (I haven't done a deep design of this work yet).

pracucci changed the title from "Run compactor in single binary (-target=all)" to "Run compactor in single binary (-target=all) and change compactor default config to split-and-merge compaction strategy" on Jan 19, 2022
@replay (Contributor) commented Jan 19, 2022

  1. Change -compactor.compaction-strategy default config from default to split-and-merge

Agree, then we should probably rename the strategy named default to something more descriptive.

What would happen if a user starts multiple compactor instances with sharding disabled? Will they just perform the same work multiple times, or is there potential that this leads to corrupt data?

Usually I would be concerned about users who are already running clusters of single binaries and then update to this version without reading the CHANGELOG; they might not realize that the single binary now also loads the compactor. However, since Mimir is not open source yet, this is not a concern.

@09jvilla (Contributor):

Agreed on needing to rename default since it's now no longer the default. Can we actually just get rid of whatever default was altogether? Is there any reason a user would use it? Then they don't have to choose a compaction strategy and we can do away with that option.

I just took a quick look at our compactor docs to sanity check things:
https://grafana.com/docs/metrics-enterprise/latest/config/#compactor

A few questions:

  1. Any changes needed to the default for compactor_split_and_merge_shards? Or is that the split-shards setting you're referring to above?
  2. Any changes needed to defaults for compactor_tenant_shard_size and compaction_concurrency?
  3. From a GEM perspective, does this mean we can get rid of the sharding_strategy flag since we no longer want folks to use time-sharding? cc @replay

@johannaratliff (Contributor):

Add a warning log at compactor startup displayed if sharding is disabled (the log should warn about not running more than 1 compactor replica or single binary with sharding disabled)

What is the consequence if this warning log is ignored and it is configured with sharding disabled with multiple compactor replicas? (Trying to ascertain whether more than a warning would be helpful)

@pstibrany (Member):

Can we actually just get rid of whatever default was altogether? Is there any reason a user would use it?

I am in favor of removing old compaction, and only keeping split&merge compaction. When configured with no splitting, it works in the same way, but is also horizontally scalable.

I just took a quick look at our compactor docs to sanity check things: https://grafana.com/docs/metrics-enterprise/latest/config/#compactor

A few questions:

  1. Any changes needed to the default for compactor_split_and_merge_shards? Or is that the split-shards setting you're referring to above?

I don't see the "split count" stage explained in that documentation. It is a way to divide input blocks into smaller groups that are then split at the same time. I think -compactor.split-groups=1 is a good default value.
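
To illustrate how I understand the two settings interact (sketch only; I'm assuming both live under limits, like the compactor_split_and_merge_shards option in the GEM docs linked above):

```yaml
limits:
  compactor_split_groups: 2            # -compactor.split-groups: a tenant's source blocks are
                                       # divided into 2 groups that can be split in parallel
  compactor_split_and_merge_shards: 4  # each group's output is split into 4 shards; the merge
                                       # stage then compacts blocks belonging to the same shard
# With these values, the split stage can produce up to 2 parallel jobs and the merge stage up
# to 4 per tenant, and each job can be picked up by a different compactor replica.
```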

  1. Any changes needed to defaults for compactor_tenant_shard_size and compaction_concurrency?

compactor_tenant_shard_size defaults to 1. This is good enough for small deployments (up to 30M active series).
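
For larger tenants this can be raised per tenant via the runtime overrides, for example (sketch; the tenant ID is made up, and I'm assuming compactor_tenant_shard_size is exposed as a limits/overrides option as in the GEM docs linked above):

```yaml
# Runtime overrides sketch; "tenant-42" is a made-up tenant ID.
overrides:
  tenant-42:
    compactor_tenant_shard_size: 3  # spread this tenant's compaction jobs across 3 compactors
```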

compaction_concurrency defaults to 1 as well. Our preference is to run more compactors than to increase this value. Perhaps we should remove it completely?

@pstibrany (Member) commented Jan 20, 2022

I propose the following changes:

  1. Change -compactor.compaction-strategy default config from default to split-and-merge

I agree with @09jvilla that we should remove default completely. Split&merge with no splitting works the same.

  1. Add a warning log at compactor startup displayed if sharding is disabled (the log should warn about not running more than 1 compactor replica or single binary with sharding disabled)

What if we made sharding mandatory? It's a bit of extra setup, but we can provide default config files for the single binary using memberlist, so when people run more instances they get horizontal scalability for free. (For ingesters we already make sharding mandatory.)
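
As a sketch of what such a default single-binary config file could look like (field names like memberlist.join_members follow the Cortex-style config and are assumptions; the peer addresses are placeholders):

```yaml
# Sketch only; assumes the Cortex-style memberlist and compactor ring config.
memberlist:
  join_members:            # -memberlist.join: the other single-binary replicas
    - mimir-1:7946
    - mimir-2:7946

compactor:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      store: memberlist    # one global memberlist client can back all the rings
```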

  1. Add compactor to -target=all (used by single binary mode) (issue)
  2. Update single binary doc and examples to include compactor ring setup and sharding

I also propose the following config changes:

  • [no strong opinion] -compactor.split-groups default value from 4 to 1 (a value of 1 is fine for up to 10M active series, which is way beyond what we target with the default config)
  • -compactor.sharding-enabled: fix the CLI flag description (it currently says "Shard tenants across multiple compactor instances", but we now shard the workload, not just tenants)
  • Backport our compactor config to jsonnet
  • Ensure integration tests run with split-and-merge compactor
  • Ensure compactor doc is updated

👍 for all

@pracucci (Collaborator, Author):

What would happen if a user starts multiple compactor instances with sharding disabled? Will they just perform the same work multiple times, or is there potential that this leads to corrupt data?
What is the consequence if this warning log is ignored and it is configured with sharding disabled with multiple compactor replicas? (Trying to ascertain whether more than a warning would be helpful)

We don't expect corrupted blocks, just compactors doing extra work. If you have sharding disabled and N single-binary replicas, you may end up running the same compaction work N times.

Side note: in my opinion, running the compactor in the single binary (well, the whole single-binary thing) is not a great idea for any production cluster, because you're going to have 1 replica with higher load due to compaction, which could impact the write and read path. But I understand we want the compactor in the single binary to have a good "getting started" story.

compaction_concurrency defaults to 1 as well. Our preference is to run more compactors than to increase this value. Perhaps we should remove it completely?

I think we've still found it useful for the case where a compactor owns multiple jobs: with compaction_concurrency such jobs run concurrently, whereas otherwise they would be serialized.
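
For example (sketch; I'm assuming compaction_concurrency sits in the compactor block, as in the GEM docs linked earlier):

```yaml
compactor:
  compaction_concurrency: 2  # run up to 2 of the jobs owned by this compactor in parallel;
                             # with the default of 1 they are executed one after another
```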

Changes to plan

Based on the discussion above, I propose the following changes to the initial plan:

  • Get rid of -compactor.compaction-strategy, only support split-and-merge compactor
  • Document -compactor.split-groups

In-discussion

Items still in discussion, for which I would like others' opinions too:

  • Require sharding in compactor (so requiring ring config)
  • Anything else?

@pstibrany (Member) commented Jan 20, 2022

Get rid of -compactor.compaction-strategy, only support split-and-merge compactor

Going to work on this.

@replay (Contributor) commented Jan 20, 2022

  • Require sharding in compactor (so requiring ring config)

I think requiring the ring config would be preferable over accepting that the single-binary instances might do duplicate compaction work.
Especially because:

(For ingesters we already make sharding mandatory)

We could maybe make this a bit easier by switching the default ring store from consul to memberlist for all rings; that would mean one less flag to configure when scaling the single binary, assuming that users with the most basic use case won't use Consul. It looks like we're moving in that direction anyway (in our cloud as well).

@pracucci (Collaborator, Author):

We could maybe make this a bit easier by switching the default ring store from consul to memberlist for all rings

I'm up for it, but memberlist still requires some config.

@pstibrany (Member):

I'm up for it, but memberlist still requires some config.

Yes, it does, but not much, and it's also reduced by the fact that there is only one global memberlist client to configure.

@pracucci (Collaborator, Author) commented Jan 24, 2022

FYI, @pstibrany is working on the whole task.

@pstibrany (Member) commented Jan 24, 2022

I think requiring the ring config would be preferable over accepting that the single-binary instances might do duplicate compaction work.

Created issue to discuss this: #856

@pstibrany (Member) commented Jan 24, 2022

Current tasks:

@pstibrany (Member):

This is now finished. There may be additional changes to documentation based on post-merge feedback in #866.

Work on removing non-sharded compactor is moved to #856, and further config simplification to #857.
