Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow ALTER DATABASE PRIMARY REGION to work on system tenants #119662

Conversation

jasminejsun
Copy link
Contributor

Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE PRIMARY REGION to work on system tenants.

Fixes: #63365
Epic: CRDB-33032

Release note (sql change): Previously, we added support for settings reegion on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jasminejsun jasminejsun force-pushed the allow_alter_database_primary_region branch 2 times, most recently from 6bcab57 to ba64ab4 Compare February 27, 2024 16:12
@jasminejsun jasminejsun force-pushed the allow_alter_database_primary_region branch 2 times, most recently from 7f9b7c6 to 6647c0f Compare March 5, 2024 16:23
@jasminejsun jasminejsun force-pushed the allow_alter_database_primary_region branch 2 times, most recently from 8a9c115 to ce04b6e Compare March 18, 2024 19:21
lyang24 and others added 23 commits March 18, 2024 15:29
This commit replaces Memstats collection with go runtime metrics. The latter
approach does not require stw to collect memory stats.

Subtle changes to be noted: statsProfiler still call runtime.ReadMemStats on
spot when it determines memory stats should be saved. The previous behavior was
to save with the Memstats that was refreshed periodically. GoStatsStaleness is
deprecated in health events.

Fixes: cockroachdb#119461

Release note: None
The fields within the RangeCache are all independent today, but it
often interacts with a RangeInfo. Being able to directly convert to one
is useful.

Epic: none

Release note: None
The setup steps for this test sometimes take a while and end up hitting
the timeout. This patch increases the timeout to avoid the setup
slowness from causing a test failure.

Fixes cockroachdb#119842.

Release note: None
* Skip heavy `TestBatchJobsCreation` tests under `deadlock` as well
* Run `security` tests under `large` for `deadlock`/`race`

Epic: CRDB-8308
Release note: None
This patch adds an ImportEpoch field to an MVCCValue's MVCCValueHeader,
which allows KV clients (namely the sst_batcher in an IMPORT INTO) to write
the importing table's ImportEpoch to the metadata of each ingesting MVCCValue.

Unlike the MVCCValueHeader.LocalTimestamp field, the ImportEpoch field
should be exported to other clusters (e.g. via ExportRequests from
BACKUP/RESTORE and streaming). Consequently, this PR relaxes the
invariant that the MVCCValueHeader field must be stripped in an
ExportRequest and must be empty in an AddSSTable Request. Now,
ExportRequest emits MVCCValueHeaders with ImportEpoch set if it was
set in the original value and AddSSTable only requires the
LocalTimestamp to be empty.

Epic: none
Release note: none

Co-authored-by: Steven Danna <[email protected]>
This patch makes IMPORT INTO on a non-empty table write the table's
ImportEpoch to each ingested MVCC Value, via the SSTBatcher. In a
future PR, the ImportEpoch will be used rollback an IMPORT INTO in
some cases.

This additional information will allow IMPORTing tables to be backed
up and restored.

As part of this change we now also assume we might see an MVCCValue
during restore.

* Version Gating

Previously, callers could (and did) assume that the values present in
the SSTs returned by export request could be interpreted directly as
roachpb.Value objects using code like:

    roachpb.Value{RawBytes: valBytes}

For MVCCValueHeaders to be exported by ExportRequest all callers need
to be updated:

1. ExportRequest on system.descriptors in sql/catalog/lease
2. ExportRequest on system.descriptors in ccl/changefeedccl/schemafeed
3. ExportRequest used by `FINGERPRINT`
4. ExportRequest used by old binaries in a mixed-version cluster.

(1) and (2) will be easy to update and likely don't matter in practice
moment as those tables do not include values with exportable value
headers at the moment.

(3) will be easy to update, but we still need an option to exclude
value headers (a) until value headers are included in rangefeeds
and (b) so long as we want to compare fingerprints with 23.2 versions.

(4) is impossible to update so if we want BACKUP/RESTORE to round-trip
in mixed version cluster we must version gate including them in
backups until the cluster is on a single version.

To account for this we only increment ImportEpoch during IMPORTs that
start on 24.1 or greater and we only request MVCCValueHeaders on
BACKUPs that start on 24.1 or greater.

The first condition is important to ensure that we can later detect a
table that can be fully rolled back using the new rollback method.

Note that this also marks a hard backward incompatibility for backup
artifacts. Backups for 24.1 cannot be restored on 23.2 or older. This
was already the case by policy. 23.2 backups should still work fine on
24.1 since all roachpb.Value's should properly decode as MVCCValue's.

Informs cockroachdb#76722

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Release note (enterprise change): default_target_cluster can now be set to any tenant name by default, including a tenant yet to be created or have service started.
Epic: none.
The mean over slice function is useful for other raochtests as well.

Release note: None
Similar to cockroachdb#114446, we now
take the mean over the last two minutes for determining high L0 sublevel
count.

Fixes cockroachdb#119838.

Release note: None
We recently fixed an issue where we forgot to stop the index backfill
merger monitor, but we had a minor bug in that fix - we captured the
context that contains the tracing span that is finished before the
account is closed leading to "use after finish" assertions. This is now
fixed.

Release note: None
The `default` pool seems to be too small to perform linking efficiently.
This should speed things up.

Epic: CRDB-8308

Release note: None
These tests specifically are prone to failing/timing out under `race`.

Epic: CRDB-8308
Release note: None
…j=cpu

Fixes cockroachdb#120163.

Avoids rare test flakes.

Release note: None
Changes:

 * [`51faab0a`](cockroachdb/pebble@51faab0a) tool: add DirectoryLock option
 * [`ec69e9a2`](cockroachdb/pebble@ec69e9a2) ingest test: fix merge skew
 * [`635c6003`](cockroachdb/pebble@635c6003) manifest: add VersionEdit tests with virtual tables
 * [`cb660884`](cockroachdb/pebble@cb660884) manifest: improve VersionEdit stringification
 * [`31b37248`](cockroachdb/pebble@31b37248) ingest_test: support reopening in ingest tests
 * [`64ebec94`](cockroachdb/pebble@64ebec94) ingest_test: set correct sizes for external ingests
 * [`4bf09d5e`](cockroachdb/pebble@4bf09d5e) manifest: improve VersionEdit tests
 * [`a034560d`](cockroachdb/pebble@a034560d) manifest: add a helper for DebugString parsing
 * [`623524f1`](cockroachdb/pebble@623524f1) manifest: use Lx for levels in version String/DebugString

Release note: none.
Epic: none.
This was previously disabled in
06eaaa0.

Now that the bug in cockroachdb#116649
has been resolved, we can enable it again.

Release note: None
Previously, the test was failing because when we were checking if the cache entry existed the second time, we were not assigning the potentially existing entry to cacheEntry.
Thus, it was possible for cacheEntry to be nil, causing runtime error: invalid memory address or nil pointer dereference.

Epic: none
Fixes: cockroachdb#120173

Release note: None
Previously, all of the DRT scripts were either stored in private gists, or on
the workload node of the cluster. This commit properly stores the collection of
scripts in the repo.

Release note: None

Epic: none
Release note: none.
Epic: none.
Changes:

 * [`d938cdc6`](cockroachdb/pebble@d938cdc6) tool: replace DirectoryLock with OpenOptions

Release note: none.
Epic: none.
This has been nonsensical since 5ba5801.

Epic: None
Release note: None
This commit switches interactions with the lastUpdateTimes map from using
`timeutil.Now` directly to using `hlc.Clock.PhysicalTime`. This allows the
lastUpdateTimes interactions to properly respond to mocked clocks in tests.

Epic: None
Release note: None
Fixes cockroachdb#120375.
Fixes cockroachdb#82059.

This commit deflakes TestWedgedReplicaDetection by using a manual clock.
This prevents slowdowns in the test environment from causing the test to
fail.

Epic: None
Release note: None
rafiss and others added 27 commits March 18, 2024 15:29
…rpk-bank

The change in 79219f7 added an upper
bound to automatic retries performed in the db.Txn runner.

This test was affected. It runs an ALTER PRIMARY KEY while DML is
happenning on the table concurrently. Increasing the retry count makes
it pass again.

Another approach may have been to treat the "Terminating retry loop due
to too many retries" error as itself retryable at the job runner level.
But that seems a bit harder to reason about, and it would be a larger
behavior change that affects all schema change jobs.

Release note: None
Productionize the EngFlow `race` and `deadlock` nightlies and turn off
the TeamCity trigger.

Epic: CRDB-8308
Release note: None
This change adds a queue implementation. This queue is implemented as a linked list.
This linked list uses chunking to amortize the overhead of linked list
elements.

This implementation is partially copied from `pkg/ccl/changefeedccl/kvevent/chunked_event_queue.go`.
A TODO is left to migrate that queue to use the generic implementation. Tests are copied
as well. The original implementation has been around for over 1 year and has been tested
at high scales (millions of elements, throughput of 100k items / sec, handling 100s of MBs
of data flowing through). The implementation is efficient in terms of memory and CPU usage.

The purpose of this change is to introduce a datastructure which scales and resizes
efficiently specifically for cockroachdb#109667. There is no existing implementation which does
so except for `chunked_event_queue.go`. Unfortunately, that queue does not support
generic types, so this change makes it generic.

Release note: None
Epic: None
Previously, `roachprod` cluster ssh keys were managed via the cloud provider.
This was recently updated, and now it is required to have an ssh key available
that will be placed on the cluster via its start-up script.

See: cockroachdb#119106

Epic: None
Release Note: None
Changes:

 * [`83f89c67`](cockroachdb/pebble@83f89c67) db: allow Options.Lock's dirname to be equivalent path

Release note: none.
Epic: none.
Decouple the initialization of the virtual filesystem (including
encryption-at-rest, disk-health monitoring, etc) from the opening of the
storage engine. This refactor is in preparation for adding configuration
settings for WAL failover. When configuring multi-store WAL failover, we'll
need access to more filesystems than just the storage engine's primary
filesystem (we need the secondary's FS as well). Decoupling initialization will
allow us to first initialize all the filesystems and ensure we have all FSes
available by the time we open engines.

Informs: cockroachdb#119418
Epic: CRDB-35401

Release note: none
This commit reduces the size of the `ScanFlags` struct by grouping
boolean fields together. This reduces the extra padding the Go compiler
must add to keep fields properly byte-aligned.

Release note: None
Uses of DistributionTypeSystemTenantOnly represent places where UA/MT
performance may not match single-tenant performance.

This replaces all uses of DistributionTypeSystemTenantOnly with
DistributionTypeAlways.

Epic: none

Fixes cockroachdb#116534
This commit does the following renaming to better represent the options:
- `DistributionTypeNone` -> `LocalDistribution`
- `DistributionTypeAlways` -> `FullDistribution`.

Release note: None
Closes cockroachdb#115164.

This commit introduces a new `--go-gc-percent` flag to CLI `start` command which
controls the garbage collection target percentage on the Go runtime. The default
value for the flag if neither the --go-gc-percent flag nor the GOGC env var is
set is 300%.

To avoid introducing new OOMs, we only switch to this new default (up from 100%)
if a soft memory limit is set, either manually or automatically.

Release note (cli change): A new flag `--go-gc-percent` is introduced to `start`
command. It controls the garbage collection target percentage on the Go runtime,
mirroring the existing GOGC environment variable. A garbage collection is
triggered when the ratio of freshly allocated data to live data remaining after
the previous collection reaches this percentage. If left unspecified and a Go soft
memory limit is configured (i.e. not explicitly disabled via --max-go-memory nor
GOMEMLIMIT), the GC target percentage defaults to 300%. If set to a
negative value, disables the target percentage garbage collection heuristic,
leaving only the soft memory limit heuristic to trigger garbage collection.
There is 1 test in this package, so having 16 shards makes no sense.

Epic: CRDB-8308
Release note: None
This commit hardens the backup's retry loop by resetting the retry counter on
meaningful progress. In details, The retry loop will be reseted if the backup
job encountered a retriable error and it have made at least 1% more progress
than the previous run.

Fixes: cockroachdb#119691

Release note: None
This commit improves TestBackupJobRetryReset's reliability and coverage area.

Epic: None

Release note: None
This required also making sure the tests that use different isolation
levels can pass the license check.

Release note (enterprise change): The READ COMMITTED isolation level now
requires the cluster to have a valid enterprise license.
This PR adds and exports prometheus success and error
counters for the schemachange workload. This will enable
us to monitor for success/errors while running the workload
on long running clusters with the tolerate-errors flag set
to true.

Epic: none
Fixes: cockroachdb#117701
Release note: None
This commit moves out some procedure related parameter tests from
`udf_params` into `procedure_params` in anticipation of adding more
test cases. This is purely a mechanical change.

Release note: None
This commit extracts the logic of setting different ScalarAncestors bits
when type checking the FuncExpr to be done only when type checking the
sub-expressions. I believe this was the original intention, but more
concretely this allows us to produce the correct error message when no
suitable overload is found for a procedure (previously, we would already
have `FuncExprAncestor` set when calling `procedureDoesNotExistErr` when
we got zero overloads).

Additionally, it moves the check for zero overloads found up, to
a slightly more appropriate place, as well as adjusts the error message
for zero overloads to be closer to postgres.

Release note: None
This commit removes one copy of the parameter naming code that handles
the default names. We need to have this code in three places:
- in the optbuilder when handling CREATE FUNCTION
- in the new schemachanger (because it re-parses the stmt)
- in the test catalog.

The extra copy was when creating the overload and is redundant because
the correct return type is already stored in the descriptor on disk.

Additionally, this commit adjusts the schemachanger logic to come up
with output-parameter based return type for procedures (exercised in the
following commit).

Release note: None
…al SSD

In [1], we introduced falling back to `c6a` (AMD Milan) in
`SelectAWSMachineType`, when requested number of vCPUs > 80.
However, that family type doesn't support local SSDs.

Thus, when `shouldSupportLocalSSD=true` is requested,
we now ignore it. We also bump `EstimatedMaxGCE`
and `EstimatedMaxAWS` (both empirically derived)
for `tpccbench/nodes=9/cpu=4/multi-region` in order
to reduce the number of steps during the line search.
Otherwise, the test has been seen timing out, owing
largely in part due to being executed on Ice Lake vs.
Cascade Lake (prior to [1]).

[1] cockroachdb#117852

Epic: none

Release note: None
In statistics builder, we have a buildTopK but were missing a
colStatTopK. Add one based on colStatLimit. We currently only add TopK
expressions during exploration, so this is never called anyway, but if
we ever add TopK during optbuild or normalization this would be
called. Seems better to have it rather than hit the assertion in
colStat.

Epic: None

Release note: None
Stress tests of the tenant-backup nemesis have shown that there
is a problem when writing ImportEpochs.

While we debug this problem, disabling the writing of ImportEpochs
should stop the test from failing.

Epic: none
Release note: none
Release note: none.
Epic: none.
EncodeMVCCValueToBuf has a fast-path that directly returns RawBytes if
the MVCCValueHeader is empty. This fast path is rather important.

However, the API encouraged callers to retain the returned buffer. But
in the case of returning RawBytes, that buffer may not be safe to
retain.

Fixes cockroachdb#120442

Release note: None
Epic: None
Release Note: None
Previously, on a multi-region setup the system database could not be modified to be multi-region and it was blocked from being made multi-region aware. To address this, we are now allowing ALTER DATABASE PRIMARY REGION to work on system tenants.

Fixes: cockroachdb#63365
Epic: CRDB-33032

Release note (sql change): Previously, we added support for settings reegion on the system database, which was limited to tenants only. We lifted this limitation to allow ALTER DATABASE PRIMARY REGION to work on system tenants.
@jasminejsun jasminejsun force-pushed the allow_alter_database_primary_region branch from ce04b6e to d9d6a44 Compare March 18, 2024 21:17
Copy link

blathers-crl bot commented Mar 18, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql,*: make some or all system tables LOCALITY GLOBAL