71994: spanconfig: introduce spanconfig.Reconciler r=irfansharif a=irfansharif

#### spanconfig: introduce spanconfig.Reconciler

Reconciler is responsible for reconciling a tenant's zone configs (SQL construct) with the cluster's span configs (KV construct). It's the central engine for the span configs infrastructure; a single Reconciler instance is active for every tenant in the system.

```go
type Reconciler interface {
	// Reconcile starts the incremental reconciliation process from
	// the given checkpoint. If it does not find MVCC history going
	// far back enough[1], it falls back to a scan of all
	// descriptors and zone configs before being able to do more
	// incremental work. The provided callback is invoked with
	// timestamps that can be safely checkpointed. A future
	// Reconciliation attempt can make use of this timestamp to
	// reduce the amount of necessary work (provided the MVCC
	// history is still available).
	//
	// [1]: It's possible for system.{zones,descriptor} to have been
	//      GC-ed away; think suspended tenants.
	Reconcile(
		ctx context.Context,
		checkpoint hlc.Timestamp,
		callback func(checkpoint hlc.Timestamp) error,
	) error
}
```

Let's walk through what it does. At a high level, we maintain an in-memory data structure that's up-to-date with the contents of the KV (at least the subset of spans we have access to, i.e. the keyspace carved out for our tenant ID). We watch for changes to SQL state (descriptors, zone configs), translate the SQL updates to the flattened span+config form, and "diff" the updates against our data structure to see if there are any changes we need to inform KV of. If so, we do, and ensure that our data structure is kept up-to-date. We continue watching for future updates and repeat as necessary.

There's only a single instance of the Reconciler running for a given tenant at a given point in time (mutual exclusion/leasing is provided by the jobs subsystem).
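As a rough illustration of the diff step in the walkthrough above, here is a minimal Go sketch. It assumes spans only ever match exactly (real span configs can overlap and split, which this ignores) and uses a toy config with two fields. The `Span`, `Config`, `Update`, and `Store` types and the `Diff`/`Apply` methods are hypothetical names made up for this example; they are not the spanconfig package's actual API.

```go
package main

import "fmt"

// Span is a simplified, hypothetical stand-in for a key span [Start, End).
type Span struct{ Start, End string }

// Config is a hypothetical, flattened stand-in for a span config.
type Config struct {
	NumReplicas int
	GCTTLSecs   int
}

// Update is one translated SQL change: install Config over Span, or, if
// Delete is set, remove whatever config is recorded for Span.
type Update struct {
	Span   Span
	Config Config
	Delete bool
}

// Store is the in-memory mirror of the span configs KV holds for this
// tenant's keyspace. Assumption: spans are compared by exact equality.
type Store map[Span]Config

// Diff returns the subset of translated updates that would actually change
// KV state, preserving input order. No-op upserts (same config already
// present) and deletes of unknown spans are filtered out.
func (s Store) Diff(translated []Update) []Update {
	var out []Update
	for _, u := range translated {
		have, ok := s[u.Span]
		switch {
		case u.Delete && ok:
			out = append(out, u)
		case !u.Delete && (!ok || have != u.Config):
			out = append(out, u)
		}
	}
	return out
}

// Apply folds updates that were sent to KV back into the store so that it
// stays up to date with KV.
func (s Store) Apply(updates []Update) {
	for _, u := range updates {
		if u.Delete {
			delete(s, u.Span)
			continue
		}
		s[u.Span] = u.Config
	}
}

func main() {
	store := Store{
		Span{"/t/1", "/t/2"}: {NumReplicas: 3, GCTTLSecs: 3600},
		Span{"/t/2", "/t/3"}: {NumReplicas: 3, GCTTLSecs: 3600},
		Span{"/t/4", "/t/5"}: {NumReplicas: 3, GCTTLSecs: 3600},
	}
	// Updates as they might come out of translating observed SQL changes.
	translated := []Update{
		{Span: Span{"/t/1", "/t/2"}, Config: Config{NumReplicas: 3, GCTTLSecs: 3600}}, // unchanged
		{Span: Span{"/t/2", "/t/3"}, Config: Config{NumReplicas: 5, GCTTLSecs: 3600}}, // changed
		{Span: Span{"/t/3", "/t/4"}, Config: Config{NumReplicas: 3, GCTTLSecs: 600}},  // new
		{Span: Span{"/t/4", "/t/5"}, Delete: true},                                    // dropped
	}
	updates := store.Diff(translated)
	for _, u := range updates {
		fmt.Printf("span=[%s,%s) delete=%t config=%+v\n",
			u.Span.Start, u.Span.End, u.Delete, u.Config)
	}
	// In the real system the updates would be issued to KV here; the sketch
	// only updates the in-memory mirror.
	store.Apply(updates)
	fmt.Println("entries in store after apply:", len(store))
}
```

Filtering no-ops before talking to KV is the point of keeping the in-memory mirror: only the changed, new, and dropped spans in `translated` produce an update, and `Apply` keeps the mirror in step with what was sent.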
We needn't worry about contending writers, or the KV state being changed from underneath us. What we do have to worry about, however, is suspended tenants not being reconciled while suspended. It's possible for a suspended tenant's SQL state to be GC-ed away at older MVCC timestamps; when watching for changes, we could fail to observe tables/indexes/partitions getting deleted. Left as is, this would result in us never issuing the corresponding deletion requests for the dropped span configs -- we'd be leaving orphaned span configs lying around (taking up storage space and creating pointless empty ranges).

A "full reconciliation pass" is our attempt to find all these extraneous entries in KV and to delete them. We can use our span config data structure here too, one that's pre-populated with the contents of KV. We translate the entire SQL state into constituent spans and configs, and diff against our data structure to generate KV updates that we then apply. We follow this by clearing out all these spans in our data structure, leaving behind only the extraneous entries found in KV -- entries we can then simply issue deletes for.

----

#### server,kvaccessor: record span configs during tenant creation/gc

For newly created tenants, we want to ensure hard splits on tenant boundaries. The source of truth for split points in the span configs subsystem is the contents of `system.span_configurations`. To ensure hard splits, we insert a single key record at the tenant prefix. In a future commit we'll introduce the spanconfig.Reconciler process, which runs per tenant and governs all config entries under each tenant's purview. This has implications for this initial record we're talking about (this record might get cleaned up, for example); we'll explore it in tests for the Reconciler itself.

Creating a single key record is easy enough -- we could've written directly to `system.span_configurations`.
When a tenant is GC-ed however, we need to clear out all tenant state transactionally. To that end we plumb in a txn-scoped KVAccessor into the planner where `crdb_internal.destroy_tenant` is executed. This lets us easily delete all abandoned tenant span config records.

Note: We get rid of `spanconfig.experimental_kvaccessor.enabled`. Access to spanconfigs infrastructure is already sufficiently gated through COCKROACH_EXPERIMENTAL_SPAN_CONFIGS. Now that crdb_internal.create_tenant attempts to write through the KVAccessor, it's cumbersome to have to enable the setting manually in every multi-tenant test (increasingly the default) enabling some part of the span configs infrastructure.

This commit introduces the need for a migration -- for existing clusters with secondary tenants, when upgrading we need to install this initial record at the tenant prefix for all extant tenants (and make sure to continue doing so for subsequent newly created tenants). This is to preserve the hard-split-on-tenant-boundary invariant we wish to provide. It's possible for an upgraded multi-tenant cluster to have dormant sql pods that have never reconciled. If KV switches over to the span configs subsystem, splitting only on the contents of `system.span_configurations`, we'll fail to split on all tenant boundaries. We'll introduce this migration in a future PR (before enabling span configs by default).

Release note: None

---

Only the last two commits here are of interest (first one is from #73531).

73674: scplan: break out stage building into scstage package and invert rule directions r=postamar a=postamar

These 4 commits introduce a new scstage package. This was motivated by the more immediate goal of inverting dependency edge directions, to express them as precedence constraints instead of succession constraints.

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marius Posta <[email protected]>