
kv,spanconfig: multi-tenant replication reports #89987

Closed
1 of 3 tasks
irfansharif opened this issue Oct 14, 2022 · 3 comments
Labels
A-multitenancy Related to multi-tenancy A-zone-configs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@irfansharif
Contributor

irfansharif commented Oct 14, 2022

Is your feature request related to a problem? Please describe.

Replication reports don't work for secondary tenant keyspans and would not be accessible to secondary tenants even if they did. This is a tracking issue for work to replace today's single-tenant replication reports (https://www.cockroachlabs.com/docs/stable/query-replication-reports.html) with something that works in the multi-tenant context. Rough notes/structure (cribbed from this internal document):

  • KV generates reports off of system.span_configurations state. Scans meta2 in transactional batches and joins against the span_configurations table (perhaps using the local in-memory copy maintained in spanconfig.KVSubscriber); gets locality information from the in-memory store pool and in-memory node liveness.
  • Think of this as allocator observability; it needs to be efficient enough to be used by the allocator, even if it isn't right away. The only RPC used is the one to scan meta2 – we could also use a cached version of range descriptors in the future.
    • Don’t pre-generate reports — do it on the fly but do it fast. Benchmark against 1M ranges, omitting uninteresting data if need be.
    • Sanity-check what the current allocator uses for its view of the world.
    • Could it be used to do pre-flight checks for decommissioning (#70486)?
  • The KV-level API generates a protobuf that’s wholly populated. Whether it’s later persisted in some table, or exposed through JSON, is irrelevant at this level/pass.
  • Should be able to get reports for a list of keyspans. Can be used to power reporting for specific tables, indexes, and databases, and allows for pagination. Results can be chunked back to the client with resumption keys if needed.
  • Don’t report on lease preferences in the first pass (we don't today though it was described in the original RFC), don’t report on non-voter conformance (do we do this today? #75821?), don’t make it accessible through SQL in the first pass (some lengthy UX questions to be had, for ex. #84090). Don’t try to improve reporting coverage in the face of splits/merges – this is the status quo (best effort reporting).
    • Paper over inconsistency if using multiple txns to sift through meta2.
    • Look at what this comment talks about.
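The scan-join-and-paginate shape described above can be sketched in miniature. Everything here is illustrative: the type and function names (`RangeDescriptor`, `SpanConfig`, `generateReport`) are simplified stand-ins, keys are plain strings, and the only conformance check shown is under-replication. The real implementation would operate on meta2 scans and `spanconfig.KVSubscriber` state.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for meta2 range descriptors and
// span configs; these are not CockroachDB's actual types.
type RangeDescriptor struct {
	StartKey, EndKey string
	Replicas         int // current voter count
}

type SpanConfig struct {
	StartKey, EndKey string
	NumReplicas      int // desired replication factor
}

type ConformanceReport struct {
	UnderReplicated []RangeDescriptor
	ResumeKey       string // non-empty if the caller should request the next page
}

// generateReport joins one page of range descriptors against span config
// state and reports under-replicated ranges. It processes at most pageSize
// descriptors per call, returning a resumption key so results can be
// chunked back to the client rather than pre-generated.
func generateReport(descs []RangeDescriptor, configs []SpanConfig, startKey string, pageSize int) ConformanceReport {
	var rep ConformanceReport
	processed := 0
	for _, d := range descs { // descs assumed sorted by StartKey, as in meta2
		if d.StartKey < startKey {
			continue // already covered by a previous page
		}
		if processed == pageSize {
			rep.ResumeKey = d.StartKey
			break
		}
		processed++
		cfg := lookupConfig(configs, d.StartKey)
		if d.Replicas < cfg.NumReplicas {
			rep.UnderReplicated = append(rep.UnderReplicated, d)
		}
	}
	return rep
}

// lookupConfig finds the span config overlapping the given key, falling
// back to a default (replication factor three) if none applies.
func lookupConfig(configs []SpanConfig, key string) SpanConfig {
	for _, c := range configs {
		if key >= c.StartKey && key < c.EndKey {
			return c
		}
	}
	return SpanConfig{NumReplicas: 3}
}

func main() {
	descs := []RangeDescriptor{{"a", "b", 3}, {"b", "c", 3}, {"c", "d", 3}}
	configs := []SpanConfig{{"b", "d", 5}} // a RANGE DEFAULT-like override: RF=5
	rep := generateReport(descs, configs, "", 2)
	fmt.Println(len(rep.UnderReplicated), rep.ResumeKey) // prints "1 c"
}
```

The resumption key falls out naturally from iterating keys in order, which is what makes per-table or per-index scoping and pagination cheap.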

End state: tenants are able to view replication status for their own keyspaces. This is the portion that’s stitched together with their own SQL state. But similar to how only the system tenant (or rather a tenant with system privileges: #85954) is able to configure reserved keyspans or set protected timestamps affecting other tenants, they’ll be able to view status for tenants they’re hosting. All access will be through the Connector interface.

Later on if we want to integrate this more closely with SQL state, above KV we'd have to:

  • Validate that span_configurations is at least as up to date as some recent SQL state. Or wait until it is. We could observe a spanconfig.Reconciler job checkpoint to make sure. Or observe the timestamp in the spanconfig.KVSubscriber too, or pass it down.
  • Span orientation should be such that it’s consumable in SQL to surface data relevant to specific indexes/tables/etc.
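The first bullet — waiting until span config state has caught up to some observed SQL state — reduces to polling a reconciler checkpoint until it reaches the SQL read timestamp. A minimal sketch, with timestamps simplified to integers and the checkpoint source abstracted to a function (both assumptions; the real signal would come from the spanconfig.Reconciler job checkpoint or the spanconfig.KVSubscriber timestamp):

```go
package main

import (
	"errors"
	"fmt"
)

// waitForReconciliation polls a checkpoint-reading function until the span
// config state is known to be at least as up to date as the SQL state
// observed at asOf. It gives up after maxAttempts polls; a real
// implementation would block on a context deadline instead.
func waitForReconciliation(readCheckpoint func() int64, asOf int64, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		if readCheckpoint() >= asOf {
			return nil
		}
	}
	return errors.New("span config state did not catch up to SQL state")
}

func main() {
	ckpt := int64(5)
	read := func() int64 { ckpt++; return ckpt } // checkpoint advancing over time
	fmt.Println(waitForReconciliation(read, 10, 100))
}
```

Alternatively, the SQL timestamp could be passed down so the KV-side report is generated at (or above) that timestamp directly, rather than waiting.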

Jira issue: CRDB-20534

Epic CRDB-10630

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-zone-configs A-multitenancy Related to multi-tenancy labels Oct 14, 2022
@irfansharif irfansharif self-assigned this Oct 14, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Oct 14, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Oct 14, 2022
Informs cockroachdb#87503; pure code-movement. Going to use it in future commits as
part of multi-tenant replication reports (cockroachdb#89987) where we'll need to
iterate over the set of range descriptors.

Release note: None
craig bot pushed a commit that referenced this issue Oct 14, 2022
87938: bazci: move the logic of posting github issues to bazci r=srosenberg a=healthy-pod

This code change moves the logic of filtering tests JSON output files and posting github issues to bazci.

Release note: None
Epic CRDB-15060

89257: upgrades: remove unused struct field r=stevendanna a=stevendanna

Epic: None

Release note: None

89873: cliccl: add `encryption-registry-list` command r=jbowenns a=nicktrav

The existing `enc_util` package contains a tool that could be used to dump the files in an encryption registry. This command has been broken since the file registry format was updated.

Add the `(*PebbleFileRegistry).List` function, that returns a map of files in the registry. Adapt existing test cases.

Add a `debug encryption-registry-list` command that will print all files contained in the registry of an encrypted store. This is useful for debugging which store / data key was used to encrypt each file, replacing the equivalent functionality in `enc_util`.

Touches: #89095.
Epic: None.

Release note (ops change): Adds a new command that can be used by an operator to list the files present in the Encryption-At-Rest file registry.

89988: rangedesciter: carve out library for range desc iteration r=irfansharif a=irfansharif

Informs #87503; pure code-movement. Going to use it in future commits as part of multi-tenant replication reports (#89987) where we'll need to iterate over the set of range descriptors.

Release note: None

Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
kvoli pushed a commit to kvoli/cockroach that referenced this issue Oct 17, 2022
Informs cockroachdb#87503; pure code-movement. Going to use it in future commits as
part of multi-tenant replication reports (cockroachdb#89987) where we'll need to
iterate over the set of range descriptors.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Oct 23, 2022
This is KV-side API for multi-tenant replication reports (cockroachdb#89987)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 5, 2022
Informs cockroachdb#87503; adds the ability to only paginate through range
descriptors in the system that overlap with some given span. We're going
to use it in future commits as part of multi-tenant replication reports
(cockroachdb#89987) where we'll need to iterate over the set of range descriptors
that pertain only to some span of interest (for a given index,
partition, tenant, etc.)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 7, 2022
This is KV-side API for multi-tenant replication reports (cockroachdb#89987)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 23, 2022
Informs cockroachdb#87503; adds the ability to only paginate through range
descriptors in the system that overlap with some given span. We're going
to use it in future commits as part of multi-tenant replication reports
(cockroachdb#89987) where we'll need to iterate over the set of range descriptors
that pertain only to some span of interest (for a given index,
partition, tenant, etc.)

Release note: None
craig bot pushed a commit that referenced this issue Nov 23, 2022
91330: rangedesciter: support scoped iteration r=irfansharif a=irfansharif

Informs #87503; adds the ability to only paginate through range descriptors in the system that overlap with some given span. We're going to use it in future commits as part of multi-tenant replication reports (#89987) where we'll need to iterate over the set of range descriptors that pertain only to some span of interest (for a given index, partition, tenant, etc.)

Release note: None.

92253: sql/sem: add support for "pg_blocking_pids" builtin r=rafiss a=girishv-tw

Added support for the pg_blocking_pids builtin, needed by pgAdmin.

Closes #91068

Release note (sql change): add support for pg_blocking_pids builtin function. It is hardcoded to return an empty array because CockroachDB has no equivalent concept of PIDs as in Postgres.

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: GirishV <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 24, 2022
This is KV-side API for multi-tenant replication reports (cockroachdb#89987)

Release note: None
craig bot pushed a commit that referenced this issue Nov 25, 2022
90016: spanconfig: introduce spanconfig.Reporter r=irfansharif a=irfansharif

This is KV-side API for multi-tenant replication reports (#89987).

Co-authored-by: irfan sharif <[email protected]>
@irfansharif
Contributor Author

#90016 merged and completes the KV-side API for multi-tenant replication reports. The next step is to work on some “front-end” for this API, either just a vanilla HTTP endpoint or some dedicated SQL syntax to present the results of this API in a more readable way (SHOW REPLICATION STATUS FOR [EVERYTHING | DB <db name> | TABLE <tbl name> | INDEX <idx name>], for example). Unassigning myself.

@irfansharif irfansharif removed their assignment Nov 25, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 25, 2022
For this setting type:
- the protoutil.Message is held in memory,
- the byte representation is stored in system.settings, and
- the json representation is used when accepting input and rendering
  state (through SHOW CLUSTER SETTING <setting-name>; the raw form is
  visible when looking directly at system.settings)

We also use this setting type to power a
spanconfig.store.fallback_config_override, which overrides the fallback
config used for ranges with no explicit span configs set. Previously we
used a hardcoded value -- this makes it a bit more configurable. This is
a partial and backportable workaround (read: hack) for cockroachdb#91238 and
cockroachdb#91239. In an internal escalation we observed "orphaned" ranges from
dropped tables that were not being referenced by span configs (by
virtue of them originating from now-dropped tables/configs). Typically
ranges of this sort are short-lived, they're emptied out through GC and
merged into LHS neighbors. But if the neighboring ranges are large
enough, or load just high enough, the merge does not occur. For such
orphaned ranges we were using a hardcoded "fallback config", with a
replication factor of three. This made for confusing semantics where if
RANGE DEFAULT was configured to have a replication factor of five, our
replication reports indicated there were under-replicated ranges. This
is partly because replication reports today are powered by zone configs
(thus looking at RANGE DEFAULT) -- this will change shortly as part of
cockroachdb#89987 where we'll instead consider span config data.
In any case, we were warning users of under-replicated ranges but within
KV we were not taking any action to upreplicate them -- KV was
respecting the hard-coded fallback config. The issues above describe
that we should apply each tenant's RANGE DEFAULT config to all such
orphaned ranges, which is probably the right fix. This was alluded to in
an earlier TODO but is still left for future work.

  // TODO(irfansharif): We're using a static[1] fallback span config
  // here, we could instead have this directly track the host tenant's
  // RANGE DEFAULT config, or go a step further and use the tenant's own
  // RANGE DEFAULT instead if the key is within the tenant's keyspace.
  // We'd have to thread that through the KVAccessor interface by
  // reserving special keys for these default configs.
  //
  // [1]: Modulo the private spanconfig.store.fallback_config_override, which
  //      applies globally.

So this PR instead takes a shortcut -- it makes the static config
configurable through a cluster setting. We can now do the following
which alters what fallback config is applied to orphaned ranges, and in
our example above, force such ranges to also have a replication factor
of five.

  SET CLUSTER SETTING spanconfig.store.fallback_config_override = '
    {
      "gcPolicy": {"ttlSeconds": 3600},
      "numReplicas": 5,
      "rangeMaxBytes": "536870912",
      "rangeMinBytes": "134217728"
    }';

Release note: None
craig bot pushed a commit that referenced this issue Nov 28, 2022
92466: settings,spanconfig: introduce a protobuf setting type r=irfansharif a=irfansharif

For this setting type:
- the `protoutil.Message` is held in memory,
- the byte representation is stored in `system.settings`, and
- the json representation is used when accepting input and rendering state (through `SHOW CLUSTER SETTING <setting-name>`; the raw form is visible when looking directly at `system.settings`)

We also use this setting type to power a `spanconfig.store.fallback_config_override`, which overrides the fallback config used for ranges with no explicit span configs set. Previously we used a hardcoded value -- this makes it a bit more configurable. This is a partial and backportable workaround (read: hack) for #91238 and #91239. In an internal escalation we observed "orphaned" ranges from dropped tables that were not being referenced by span configs (by virtue of them originating from now-dropped tables/configs). Typically ranges of this sort are short-lived; they're emptied out through GC and merged into LHS neighbors. But if the neighboring ranges are large enough, or load just high enough, the merge does not occur. For such orphaned ranges we were using a hardcoded "fallback config", with a replication factor of three. This made for confusing semantics where if `RANGE DEFAULT` was configured to have a replication factor of five, our replication reports indicated there were under-replicated ranges. This is partly because replication reports today are powered by zone configs (thus looking at `RANGE DEFAULT`) -- this will change shortly as part of #89987 where we'll instead consider span config data. In any case, we were warning users of under-replicated ranges but within KV we were not taking any action to upreplicate them -- KV was respecting the hard-coded fallback config. The issues above describe that we should apply each tenant's `RANGE DEFAULT` config to all such orphaned ranges, which is probably the right fix. This was alluded to in an earlier TODO but is still left for future work.

```go
  // TODO(irfansharif): We're using a static[1] fallback span config
  // here, we could instead have this directly track the host tenant's
  // RANGE DEFAULT config, or go a step further and use the tenant's own
  // RANGE DEFAULT instead if the key is within the tenant's keyspace.
  // We'd have to thread that through the KVAccessor interface by
  // reserving special keys for these default configs.
  //
  // [1]: Modulo the private spanconfig.store.fallback_config_override, which
  //      applies globally.
```

So this PR instead takes a shortcut -- it makes the static config configurable through a cluster setting. We can now do the following which alters what fallback config is applied to orphaned ranges, and in our example above, force such ranges to also have a replication factor of five.

```sql
  SET CLUSTER SETTING spanconfig.store.fallback_config_override = '
    {
      "gcPolicy": {"ttlSeconds": 3600},
      "numReplicas": 5,
      "rangeMaxBytes": "536870912",
      "rangeMinBytes": "134217728"
    }';
```

Release note: None


92585: roachtest: reduce frequency of index-overload test r=irfansharif a=irfansharif

Release note: None

92586: jobs: refresh job details before removing protected timestamps r=fqazi a=fqazi

Fixes: #92493

Previously, we added the protected timestamps manager into the jobs framework, which made it easier to automatically add and remove protected timestamps for jobs. Unfortunately, the Unprotect API, when invoked from a cleanup function, never properly refreshed the job, which could lead to a race condition when trying to remove protected timestamps for schema changes. To address this, the Unprotect function will take advantage of the job update function to confirm that a refreshed copy does have protected timestamps set.

Release note: None

92593: logictest: avoid a lint failure r=andreimatei a=knz

Fixes #92592

The go compiler treats calls to `bazel.BuiltWithBazel()` as a
compile-time constant. Therefore,

```go
if !bazel.BuiltWithBazel() {
   skip.XXX()
}
```

is considered to always terminate execution (because `skip` does its
job by raising a panic), and any code coming after that is treated as
dead/unreachable.

92599: multitenant: merge serverpb.RegionsServer into serverpb.TenantStatusServer r=knz a=ecwall

Fixes #92598

Merge these two interfaces in anticipation of future work where more tenant functionality will be added to TenantStatusServer instead of creating an interface per function.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
@irfansharif
Contributor Author

The next step is to work on some “front-end” for this API, either just a vanilla HTTP endpoint or some dedicated SQL syntax to present the results of this API in a more readable way (SHOW REPLICATION STATUS FOR [EVERYTHING | DB <db name> | TABLE <tbl name> | INDEX <idx name>], for example).

Going to leave this, and building parity for critical localities, as things left to do under https://cockroachlabs.atlassian.net/browse/CRDB-10630. I'll close this issue specifically.

@knz
Contributor

knz commented Mar 30, 2023

Moving the convo over to here: #100004
