*: rework zone configurations to support tenants #67679

Closed
15 of 21 tasks
irfansharif opened this issue Jul 15, 2021 · 2 comments
Assignees
Labels
A-multitenancy Related to multi-tenancy A-zone-configs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@irfansharif
Contributor

irfansharif commented Jul 15, 2021

This is the tracking issue for #66348. We've been prototyping what's described in the RFC over at multi-tenant-zcfgs (see all the recent PRs prefixed with [prototype] and the project board here). This issue tracks merging the prototype branch back into master, adding more rigorous testing for it as we do. We expect the work to break down roughly into the following major and minor PRs:

Major:

Minor:

Epic CRDB-2515.

Epic CRDB-10563

@blathers-crl

blathers-crl bot commented Jul 15, 2021

Hi @irfansharif, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif irfansharif added A-zone-configs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jul 15, 2021
@irfansharif irfansharif changed the title *: move over span configs prototype to master *: rework zone configurations to support tenants Jul 23, 2021
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 12, 2021
This patch introduces the spanconfigmanager.Manager. This struct is
responsible for setting up the auto span config reconciliation job. The
auto span config reconciliation job is a per tenant job responsible for
reconciling SQL zone configs to KV span configs. This is intended to
be a per tenant singleton, non-cancellable job. The manager ensures
these semantics.

The manager will also provide the job access to components it needs to
perform reconciliation. The skeleton to plumb these components has been
added in this patch but the components themselves will come
subsequently.

References cockroachdb#67679

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 13, 2021
This patch introduces the spanconfigmanager.Manager. This struct is
responsible for providing a hook to idempotently set up an auto span
config reconciliation job which is responsible for reconciling SQL zone
configs to KV span configs. This job is intended to be a per tenant
singleton job that is non-cancellable. The hook the manager exposes
ensures these semantics.

Every SQL pod on startup creates a background task that periodically
calls into this hook. We don't expect these calls to actually be
instantiating a job after the first SQL pod has won the race, yet we do
this periodic thing in the case that there has been an error in the job
that has transitioned it to the failed state. The interval at which
this happens is controlled by the
`sql.span_config_reconciliation_job.idempotent_start_interval` cluster
setting. It defaults to 10 minutes and is private for now.

The manager will also provide the job access to components it needs to
perform reconciliation. The skeleton to plumb these components has been
added in this patch but the components themselves will come
subsequently.

References cockroachdb#67679

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 17, 2021
Part of cockroachdb#67679. It's not wired up to anything yet; we'll later use this
system-tenant only table to store KV span configs.

Release note (sql change): We've added a `system.span_configurations`
table. This will later be used to store authoritative span configs that
KV has decided to apply.
irfansharif pushed a commit to irfansharif/cockroach that referenced this issue Aug 17, 2021
Part of cockroachdb#67679; these RPCs are exposed through the kvtenant.Connector
interface for tenants and also sit on pkg/server.(*Node) for the host
tenant. This PR also fleshes out the SpanConfig proto type (it's the same
as ZoneConfig, but without any inheritance business). Future PRs will
wire a view of this table into KV with an eye towards replacing our use
of `config.SystemConfig`.

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 18, 2021
This patch introduces the span config subsystem and its orchestrator,
the spanconfig.Manager. The span config manager owns the process of
reconciling SQL zone configs to KV span configs. This includes managing
the span config reconciliation job and providing it access to
dependencies it needs to perform reconciliation.

The span config reconciliation job is intended to be a per tenant,
forever running job that is non-cancellable. At any point there should
be one (and only one) of these jobs running. The manager helps ensure
these semantics.

Even though we expect this job to be forever running (and therefore
never transition to a failed state), the manager on every sql pod
will periodically ensure that the job is indeed running. It will
restart a new job in case it finds that the job is not running. The
interval at which this happens is dictated by the
`sql.span_config_reconciliation_job.idempotent_start_interval` cluster
setting. It defaults to 10 minutes and is private for now.

This patch only adds the skeleton to encapsulate and plumb dependencies
to the job. The specific components themselves will come in subsequent
patches.

References cockroachdb#67679

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 18, 2021
Part of cockroachdb#67679; these RPCs are exposed through the `kvtenant.Connector`
interface for tenants and also sit on `pkg/server.(*Node)` for the host
tenant. The basic type in these RPCs is the `SpanConfig`, which is the
same as our existing `ZoneConfig` proto type but without any inheritance
business.

The RPCs are backed by the `system.span_configurations` system table
added in an earlier commit. Future PRs will wire a view of this table
into KV with an eye towards replacing our use of `config.SystemConfig`.

---

While here, we also introduce a `crdb_internal.pretty_span` builtin to
help with the readability of this table. In future PRs we'll make use of
this built-in for datadriven tests asserting on the state of the table.

Release note: None
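To illustrate what "the same as `ZoneConfig` but without any inheritance business" could mean in practice, here is a minimal sketch. The `zoneConfig` and `spanConfig` types, their two fields, and the `flatten` helper are hypothetical simplifications for this example, not the actual proto definitions: unset fields in a zone config fall back to a parent, while the resulting span config is fully materialized.

```go
package main

import "fmt"

// zoneConfig is a simplified, hypothetical zone config. A nil field means
// "inherit this value from the parent".
type zoneConfig struct {
	NumReplicas  *int32
	GCTTLSeconds *int32
	Parent       *zoneConfig
}

// spanConfig is the flattened counterpart: every field carries a concrete
// value, so readers never have to walk a hierarchy.
type spanConfig struct {
	NumReplicas  int32
	GCTTLSeconds int32
}

// flatten resolves each unset field by walking up the parent chain and
// taking the first value it finds.
func flatten(z *zoneConfig) spanConfig {
	var out spanConfig
	var haveReplicas, haveTTL bool
	for cur := z; cur != nil; cur = cur.Parent {
		if !haveReplicas && cur.NumReplicas != nil {
			out.NumReplicas, haveReplicas = *cur.NumReplicas, true
		}
		if !haveTTL && cur.GCTTLSeconds != nil {
			out.GCTTLSeconds, haveTTL = *cur.GCTTLSeconds, true
		}
	}
	return out
}

// i32 is a small helper for taking the address of a literal.
func i32(v int32) *int32 { return &v }

func main() {
	def := &zoneConfig{NumReplicas: i32(3), GCTTLSeconds: i32(90000)}
	table := &zoneConfig{NumReplicas: i32(5), Parent: def}
	// The table overrides the replica count and inherits the GC TTL.
	fmt.Printf("%+v\n", flatten(table)) // {NumReplicas:5 GCTTLSeconds:90000}
}
```

In this sketch the inheritance is resolved once, on the SQL side, so whatever stores or serves span configs only ever sees complete values.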
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 18, 2021
This patch introduces the span config subsystem and its orchestrator,
the spanconfigmanager.Manager. The span config manager owns the process
of reconciling SQL zone configs to KV span configs. This includes
managing the span config reconciliation job and providing it access to
dependencies it needs to perform reconciliation.

The span config reconciliation job is intended to be a per tenant,
forever running job that is non-cancellable. At any point there should
be one (and only one) of these jobs running. The manager helps ensure
these semantics.

Even though we expect this job to be forever running (and therefore
never transition to a failed state), the manager on every sql pod
will periodically ensure that the job is indeed running. It will
restart a new job in case it finds that the job is not running. The
interval at which this happens is dictated by the
`sql.span_config_reconciliation_job.idempotent_start_interval` cluster
setting. It defaults to 10 minutes and is private for now.

This patch only adds the skeleton to encapsulate and plumb dependencies
to the job. The specific components themselves will come in subsequent
patches.

References cockroachdb#67679

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 18, 2021
This patch introduces the span config subsystem and its orchestrator,
the spanconfigmanager.Manager. The span config manager owns the process
of reconciling SQL zone configs to KV span configs. This includes
managing the span config reconciliation job and providing it access to
dependencies it needs to perform reconciliation.

The span config reconciliation job is intended to be a per tenant,
forever running job that is non-cancellable. At any point there should
be one (and only one) of these jobs running. The manager helps ensure
these semantics.

Even though we expect this job to be forever running (and therefore
never transition to a failed state), the manager on every sql pod
will periodically ensure that the job is indeed running. It will
restart a new job in case it finds that the job is not running. The
interval at which this happens is dictated by the
`spanconfig.reconciliation_job.start_interval` cluster
setting. It defaults to 10 minutes and is private for now.

This patch only adds the skeleton to encapsulate and plumb dependencies
to the job. The specific components themselves will come in subsequent
patches.

References cockroachdb#67679

Release note: None
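The "periodically ensure the job is running" behavior described in these commit messages can be sketched roughly as follows. This is only an illustration under stated assumptions: `jobRegistry`, `createIfNotRunning`, `markFailed`, and `manager` are hypothetical names, and an in-memory flag stands in for the real jobs subsystem. The point is that every SQL pod calls the same idempotent hook on a timer, and only the first caller (or the first caller after a failure) actually creates a job.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// jobRegistry is a hypothetical stand-in for the jobs subsystem. It only
// tracks whether the tenant's reconciliation job is currently running.
type jobRegistry struct {
	mu      sync.Mutex
	running bool
	started int // number of jobs ever created, kept for illustration
}

// createIfNotRunning creates the job only when none is running, so repeated
// or concurrent calls create at most one job at a time.
func (r *jobRegistry) createIfNotRunning() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.running {
		return false
	}
	r.running = true
	r.started++
	return true
}

// markFailed simulates the job leaving the running state.
func (r *jobRegistry) markFailed() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.running = false
}

// manager invokes the idempotent hook on startup and then at every interval.
// The interval field stands in for the cluster setting.
type manager struct {
	registry *jobRegistry
	interval time.Duration
}

// run keeps checking until stop is closed.
func (m *manager) run(stop <-chan struct{}) {
	ticker := time.NewTicker(m.interval)
	defer ticker.Stop()
	m.registry.createIfNotRunning()
	for {
		select {
		case <-ticker.C:
			m.registry.createIfNotRunning()
		case <-stop:
			return
		}
	}
}

func main() {
	reg := &jobRegistry{}
	stop := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // three simulated SQL pods
		wg.Add(1)
		go func() {
			defer wg.Done()
			(&manager{registry: reg, interval: 5 * time.Millisecond}).run(stop)
		}()
	}
	time.Sleep(30 * time.Millisecond)
	close(stop)
	wg.Wait()
	fmt.Println("jobs created:", reg.started) // jobs created: 1
}
```

A short interval is used here only to keep the example fast; the commit message states a 10 minute default.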
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 19, 2021
Part of cockroachdb#67679. We'll hide config.SystemConfig behind the
spanconfig.StoreReader interface, and use that instead in the various
queues that need access to the system config span. In future PRs we'll
introduce a data structure that maintains a mapping between spans and
configs that implements this same interface. This will be powered by a
view of `system.span_configurations`, following the ideas described in
cockroachdb#66348.

When we do make that switch, i.e. have KV consult the new thing for
splits, merges, GC and replication, instead of the gossip backed system
config span, ideally it'd be as easy as swapping the source. This PR
helps pave the way for just that.

In cockroachdb#66348 we described how zonepb.ZoneConfigs going forward were going
to be an exclusively SQL-level construct. Consequently we purge[*] all
usages of it in KV, storing on each replica a roachpb.SpanConfig
instead.

[*]: The only remaining use is what powers our replication reports,
which does not extend well to multi-tenancy and needs replacing.

Release note: None
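The idea of hiding the config source behind a reader interface can be sketched as below. This is a deliberately simplified, hypothetical version: keys are plain strings, `spanConfig` has a single field, and the `storeReader` method set and the `spanConfigStore` type are assumptions made for this example rather than the real `spanconfig.StoreReader` definition. It shows a span-to-config mapping answering the two questions a queue typically asks: which config applies to a key, and whether a range straddles a config boundary.

```go
package main

import (
	"fmt"
	"sort"
)

// spanConfig is a simplified, hypothetical config with a single field.
type spanConfig struct {
	NumReplicas int32
}

// storeReader is the hypothetical read-only view that queues depend on.
// Any source of configs can implement it.
type storeReader interface {
	GetSpanConfigForKey(key string) spanConfig
	NeedsSplit(start, end string) bool
}

// spanConfigStore maps spans to configs. Entry i covers the keys from
// starts[i] up to (but excluding) starts[i+1]; starts is sorted.
type spanConfigStore struct {
	starts   []string
	configs  []spanConfig
	fallback spanConfig // used for keys before the first span
}

// GetSpanConfigForKey finds the last span whose start key is <= key.
func (s *spanConfigStore) GetSpanConfigForKey(key string) spanConfig {
	i := sort.Search(len(s.starts), func(i int) bool { return s.starts[i] > key })
	if i == 0 {
		return s.fallback
	}
	return s.configs[i-1]
}

// NeedsSplit reports whether a span boundary lies strictly inside
// (start, end), i.e. the range covers more than one config.
func (s *spanConfigStore) NeedsSplit(start, end string) bool {
	i := sort.Search(len(s.starts), func(i int) bool { return s.starts[i] > start })
	return i < len(s.starts) && s.starts[i] < end
}

// replicasFor is what a queue might call; it only knows the interface.
func replicasFor(r storeReader, key string) int32 {
	return r.GetSpanConfigForKey(key).NumReplicas
}

func main() {
	var r storeReader = &spanConfigStore{
		starts:   []string{"a", "m"},
		configs:  []spanConfig{{NumReplicas: 3}, {NumReplicas: 5}},
		fallback: spanConfig{NumReplicas: 3},
	}
	fmt.Println(replicasFor(r, "c"), replicasFor(r, "x"))       // 3 5
	fmt.Println(r.NeedsSplit("b", "z"), r.NeedsSplit("b", "k")) // true false
}
```

Because callers depend only on the interface, swapping the gossip-backed source for a new one would not require touching them, which is the property the commit message is after.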
craig bot pushed a commit that referenced this issue Aug 20, 2021
69169: sql: allow secondary tenants to set/show zone configurations r=irfansharif,ajwerner a=arulajmani

Part of #67679. The zone configurations themselves have no effect right now. See individual commits for details. 



69185: cli,sql/sem/builtins: emit_defaults to false for decode-proto, pb_to_json r=dt a=ajwerner

When `cockroach debug decode-proto` and `crdb_internal.pb_to_json` were first
added, they emitted the default values in the produced json. This turns out to
not be desirable; setting emit defaults to false produces json which round-
trips back to the same proto.

Release note (cli change): `cockroach debug decode-proto` now does not emit
default values by default.

Release note (sql change): `crdb_internal.pb_to_json` now does not emit
default values by default.

Co-authored-by: arulajmani <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Nov 8, 2021
This patch introduces the SQLWatcher, which is intended to incrementally
watch for updates to system.zones and system.descriptors. It does so by
establishing rangefeeds at a given timestamp.

The SQLWatcher invokes a callback from time to time with a list of
updates that have been observed in the window
(previous checkpointTS, checkpointTS]. The checkpointTS is also
provided to the callback.

Internally, the SQLWatcher uses a buffer to keep track of events
generated by the SQLWatcher's rangefeeds. It also tracks the individual
frontier timestamps of both the rangefeeds. This helps to maintain the
notion of the combined frontier timestamp, which is computed as the
minimum of the two. This combined frontier timestamp serves as the
checkpoint to the SQLWatcher's callback function.

This interface isn't hooked up to anything yet. It'll be used by the
spanconfig.Reconciler soon to perform partial reconciliation once
full reconciliation is done. It is intended that the IDs from the
updates produced by the SQLWatcher will be fed into the SQLTranslator.

References cockroachdb#67679
Carved from cockroachdb#69661

Release note: None
craig bot pushed a commit that referenced this issue Nov 11, 2021
71968: spanconfig: introduce the spanconfig.SQLWatcher r=irfansharif,ajwerner a=arulajmani

This patch introduces the SQLWatcher, which is intended to incrementally
watch for updates to system.zones and system.descriptors. It does so by
establishing rangefeeds at a given timestamp.

The SQLWatcher periodically invokes a callback with a list of updates
that have been observed in the window
(previous checkpointTS, checkpointTS]. The checkpointTS is also
provided to the callback.

Internally, the SQLWatcher uses a buffer to keep track of events
generated by the SQLWatcher's rangefeeds. It also tracks the individual
frontier timestamps of both the rangefeeds. This helps to maintain the
notion of the combined frontier timestamp, which is computed as the
minimum of the two. This combined frontier timestamp serves as the
checkpoint to the SQLWatcher's callback function.

This interface isn't hooked up to anything yet. It'll be used by the
spanconfig.Reconciler soon to perform partial reconciliation once
full reconciliation is done. It is intended that the IDs from the
updates produced by the SQLWatcher will be fed into the SQLTranslator.

References #67679
Carved from #69661

Release note: None

Co-authored-by: arulajmani <[email protected]>
@irfansharif
Contributor Author

irfansharif commented Dec 15, 2021

With the spanconfig.Reconciler (#71994) we've finished the feature work phase of this project, shifting now into hardening mode. Going to track this in #73874.

@irfansharif irfansharif self-assigned this Aug 12, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 7, 2023
In an internal support issue[1] we observed that in a mixed-version
cluster straddling 21.2 and 22.1 (where span configs were first
introduced) with a 5x replication factor, the 22.1 nodes were
incorrectly down-replicating ranges towards 3x replication. This
happened because we were missing requisite version gates in KV code
when deciding what "source" to use for span configs (choosing between
the gossiped system config span or the span configs infrastructure
introduced in cockroachdb#67679). The 22.1 nodes were using a fallback,
statically hard-coded config with 3x replication factor.

Release note (bug fix): In mixed version clusters running 21.2 and 22.1
nodes, it was possible for CockroachDB to not respect zone configs. This
manifested in a few ways:
- If num_replicas was set to something other than 3, we would still
  add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and
    non-voting replicas, it would be ignored. CockroachDB could possible
    remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and
  512 MiB respectively, we would instead try to size ranges to be within
  [128 MiB, 512 MiB]. This could appear as an excess amount of range
  splits or merges, as visible in the Replication Dashboard under "Range
  Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we
  would still GC data only older than 90000s/25h. If the GC TTL was set
  to something larger than 25h, AOST queries going further back may now
  start failing. For GC TTLs less than the 25h default, clusters would
  observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they
  would be ignored. Range data and leases would possibly be moved
  outside where prescribed.
These issues only lasted during the mixed-version state, where the
cluster was not finalized. Finalization of the 22.1 version happens
automatically when all nodes in the cluster are running 22.1. The only
exception to this is if users have set
cluster.preserve_downgrade_option, where auto-finalization is allowed to
proceed once they issue a 'RESET CLUSTER SETTING
cluster.preserve_downgrade_option', or 'SET CLUSTER SETTING version =
22.1' explicitly. Cluster version finalization is better documented on
https://www.cockroachlabs.com/docs/v22.1/upgrade-cockroach-version.

[1]: https://github.com/cockroachlabs/support/issues/2104
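The failure mode described in this commit message can be illustrated with a small sketch. Everything here is hypothetical (the `version` type, `configSource`, `confFor`, and the single-field `spanConfig`), and it is not the actual fix: it only shows why choosing the config source without checking the active cluster version ends up on the hard-coded 3x fallback while the new infrastructure has no data yet.

```go
package main

import "fmt"

// spanConfig is a simplified, hypothetical config with a single field.
type spanConfig struct {
	NumReplicas int32
}

// version is a hypothetical cluster version.
type version struct{ Major, Minor int }

// atLeast reports whether v is the same as or newer than o.
func (v version) atLeast(o version) bool {
	return v.Major > o.Major || (v.Major == o.Major && v.Minor >= o.Minor)
}

// spanConfigsVersion is the version at which span configs become the source.
var spanConfigsVersion = version{Major: 22, Minor: 1}

// staticFallback is the hard-coded config used when a source has no answer.
var staticFallback = spanConfig{NumReplicas: 3}

// configSource looks up the config for a key and reports whether it had one.
type configSource func(key string) (spanConfig, bool)

// confFor consults the gossiped source until the cluster version has been
// finalized, and the span configs source only after that.
func confFor(active version, key string, gossiped, spanConfigs configSource) spanConfig {
	src := gossiped
	if active.atLeast(spanConfigsVersion) {
		src = spanConfigs
	}
	if conf, ok := src(key); ok {
		return conf
	}
	return staticFallback
}

func main() {
	// The gossiped source knows about the 5x zone config.
	gossiped := func(string) (spanConfig, bool) { return spanConfig{NumReplicas: 5}, true }
	// The span configs source has not been populated in the mixed-version state.
	empty := func(string) (spanConfig, bool) { return spanConfig{}, false }

	mixed := version{Major: 21, Minor: 2}
	// With the gate, the mixed-version cluster keeps honoring 5x.
	fmt.Println(confFor(mixed, "k", gossiped, empty).NumReplicas) // 5
	// Without the gate (always reading the new source), it falls back to 3x.
	fmt.Println(confFor(spanConfigsVersion, "k", gossiped, empty).NumReplicas) // 3
}
```

The second call models a 22.1 node that reads the new source unconditionally, which matches the down-replication towards 3x described above.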