settings,spanconfig: introduce a protobuf setting type #92466
Conversation
The new settings type sounds like it stores a protobuf, but it looks like it stores JSON, which is problematic, since proto field names can change without affecting the proto encoding but this will break JSON compat. So this really is a JSON field in disguise?
Actually storing the protobuf seems better from that POV, though it's pretty garbled when looking at the table directly (not sure how much this matters).
In any case, since this is primarily intended for a backport, why not use a more surgical approach: for instance, injecting the fallback via an env var (JSON), or using a string setting that we parse as JSON? That's almost what you have here, except for the protobuf name.
Personally I would skew heavily towards the simplest backport here.
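(A hedged aside on the renaming point above: the binary protobuf encoding identifies fields by tag number, so a field rename is invisible on the wire, whereas JSON is keyed by field name and breaks. The field name below is hypothetical, not from this PR; the only assumption is the standard protobuf-go protowire package.)

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	// Hand-encode field 1 (say, num_replicas) with value 5. Renaming the
	// field to replica_count leaves these bytes unchanged, since only the
	// tag number (1) appears on the wire; a JSON payload keyed by
	// "numReplicas" would stop parsing after the same rename.
	var b []byte
	b = protowire.AppendTag(b, 1, protowire.VarintType)
	b = protowire.AppendVarint(b, 5)
	fmt.Printf("% x\n", b) // prints: 08 05
}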
force-pushed 6653300 to cd872c1
The new settings type sounds like it stores a protobuf, but it looks like it stores JSON, which is problematic, since proto field names can change without affecting the proto encoding but this will break JSON compat.
Completely forgot about the name thing. Changed; now:
- what's stored in the table is the byte form of the protobuf itself,
- what's held in memory is the protoutil.Message,
- what's rendered/accepted as input is the JSON representation (see the sketch below).
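A minimal sketch of that three-way flow using the standard protobuf-go APIs, with durationpb.Duration standing in for the actual message type (the PR itself works in terms of protoutil.Message and the gogo/protobuf JSON codec, so treat this as illustrative):

package main

import (
	"fmt"
	"time"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/durationpb"
)

func main() {
	// In memory: the proto message itself.
	msg := durationpb.New(3600 * time.Second)

	// In system.settings: the raw proto bytes (compact, but opaque when
	// reading the table directly).
	stored, _ := proto.Marshal(msg)

	// At the SQL boundary: JSON, used both to render state and to accept
	// operator input.
	rendered, _ := protojson.Marshal(msg)
	fmt.Printf("stored:   %x\n", stored)
	fmt.Printf("rendered: %s\n", rendered)

	// Input follows the reverse path: JSON is parsed back into the
	// message before being re-encoded as bytes for storage.
	var parsed durationpb.Duration
	if err := protojson.Unmarshal(rendered, &parsed); err != nil {
		panic(err)
	}
}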
garbled when looking at the table directly
Things are already garbled for the cluster version setting; this is no worse. And it's not garbled when using SHOW CLUSTER SETTING <setting name>:
root@localhost:26257/defaultdb> show cluster setting spanconfig.store.fallback_config_override;
spanconfig.store.fallback_config_override
--------------------------------------------------------------------------------------------------------------------
{"gcPolicy": {"ttlSeconds": 3600}, "numReplicas": 5, "rangeMaxBytes": "536870912", "rangeMinBytes": "134217728"}
(1 row)
why not use a more surgical approach,
I'm hoping the protobuf setting type is useful beyond this PR; I can imagine it being so now that the legwork is done. Since this fallback config is a per-node one, using an env var would require a full cluster restart, which can be painful given how easy it is to alter RANGE DEFAULT. Hopefully this is still small enough to backport.
force-pushed cd872c1 to 619de82
For this setting type:
- the protoutil.Message is held in memory,
- the byte representation is stored in system.settings, and
- the JSON representation is used when accepting input and rendering state (through SHOW CLUSTER SETTING <setting-name>; the raw form is visible when looking directly at system.settings).

We also use this setting type to power a spanconfig.store.fallback_config_override, which overrides the fallback config used for ranges with no explicit span configs set. Previously we used a hardcoded value -- this makes it a bit more configurable. This is a partial and backportable workaround (read: hack) for cockroachdb#91238 and cockroachdb#91239.

In an internal escalation we observed "orphaned" ranges from dropped tables that were not being referenced by span configs (by virtue of them originating from now-dropped tables/configs). Typically ranges of this sort are short-lived; they're emptied out through GC and merged into LHS neighbors. But if the neighboring ranges are large enough, or load just high enough, the merge does not occur. For such orphaned ranges we were using a hardcoded "fallback config" with a replication factor of three. This made for confusing semantics: if RANGE DEFAULT was configured to have a replication factor of five, our replication reports indicated there were under-replicated ranges. This is partly because replication reports today are powered by zone configs (thus looking at RANGE DEFAULT) -- this will change shortly as part of cockroachdb#89987, where we'll instead consider span config data. In any case, we were warning users of under-replicated ranges, but within KV we were not taking any action to upreplicate them -- KV was respecting the hard-coded fallback config.

The issues above describe that we should apply each tenant's RANGE DEFAULT config to all such orphaned ranges, which is probably the right fix. This was alluded to in an earlier TODO but is still left for future work.

// TODO(irfansharif): We're using a static[1] fallback span config
// here, we could instead have this directly track the host tenant's
// RANGE DEFAULT config, or go a step further and use the tenant's own
// RANGE DEFAULT instead if the key is within the tenant's keyspace.
// We'd have to thread that through the KVAccessor interface by
// reserving special keys for these default configs.
//
// [1]: Modulo the private spanconfig.store.fallback_config_override, which
// applies globally.

So this PR instead takes a shortcut -- it makes the static config configurable through a cluster setting. We can now do the following, which alters what fallback config is applied to orphaned ranges and, in our example above, forces such ranges to also have a replication factor of five.

SET CLUSTER SETTING spanconfig.store.fallback_config_override = '
{
  "gcPolicy": {"ttlSeconds": 3600},
  "numReplicas": 5,
  "rangeMaxBytes": "536870912",
  "rangeMinBytes": "134217728"
}';

Release note: None
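A minimal sketch of the lookup described above -- prefer the operator-set override when present, else the static fallback. All names here are illustrative stand-ins (the real code lives in pkg/spanconfig and uses roachpb.SpanConfig):

package main

import "fmt"

// spanConfig stands in for roachpb.SpanConfig.
type spanConfig struct {
	NumReplicas   int32
	GCTTLSeconds  int32
	RangeMinBytes int64
	RangeMaxBytes int64
}

// staticFallback mirrors the previously hardcoded fallback config and
// its replication factor of three.
var staticFallback = spanConfig{
	NumReplicas:   3,
	GCTTLSeconds:  90000,     // 25h, the default GC TTL
	RangeMinBytes: 128 << 20, // 134217728
	RangeMaxBytes: 512 << 20, // 536870912
}

// fallbackConfig returns the config applied to ranges with no explicit
// span config: the cluster-setting override if one was set, else the
// static default.
func fallbackConfig(override *spanConfig) spanConfig {
	if override != nil {
		return *override
	}
	return staticFallback
}

func main() {
	override := &spanConfig{NumReplicas: 5, GCTTLSeconds: 3600,
		RangeMinBytes: 128 << 20, RangeMaxBytes: 512 << 20}
	fmt.Println(fallbackConfig(nil).NumReplicas)      // 3
	fmt.Println(fallbackConfig(override).NumReplicas) // 5
}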
force-pushed 619de82 to b0cfdc9
I agree that this is useful, but I'm not sure how to square this with our backport policy. We try to introduce minimal changes and adding a whole new settings type doesn't check that box.
What happens in a mixed cluster in which some nodes have picked up the patch and others didn't? We have never created a new settings type in a patch release. It looks safe from the code that I've seen but there is still risk that isn't matched by an important benefit.
Unfortunately I'm going to have to continue disagreeing that this is backport material. It's great for master though! Let's trim down the backport when it comes. I agree that having to do a restart (to set the env var) isn't great, but for that particular customer specifically, they can set the env var and pick it up when migrating into the patch release, so it's free. The set of customers combining "global nonstandard replication" with "cares about unreferenced ranges' replication factor" is likely small.
Some env-var-only version of this for a backport sounds good to me. bors r+
Build failed.
CI failure is unrelated, some DNS thing (internal slack thread). Will re-bors once resolved.
bors r+
Build succeeded.
It controls what replication factor is used for ranges with no explicit span configs set. This is a backportable form of the spanconfig.store.fallback_config_override we added in cockroachdb#92466. Release note: None
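A hedged sketch of what such an env-var-only override might look like; the variable name and parsing below are assumptions for illustration, not the actual backport's code:

package main

import (
	"fmt"
	"os"
	"strconv"
)

// fallbackNumReplicas returns the replication factor applied to ranges
// with no explicit span config: a value read from a hypothetical
// environment variable if set and valid, else the hardcoded default of
// three. Being env-var-based, changing it requires a node restart.
func fallbackNumReplicas() int32 {
	const defaultReplicas = 3
	v, ok := os.LookupEnv("COCKROACH_FALLBACK_NUM_REPLICAS") // assumed name
	if !ok {
		return defaultReplicas
	}
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil || n <= 0 {
		return defaultReplicas
	}
	return int32(n)
}

func main() {
	fmt.Println("fallback replication factor:", fallbackNumReplicas())
}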